
Abstract

Companies are increasingly using predictive modeling to manage customer churn proactively. While extant customer retention literature centers mainly on propensity models, recent research indicates merits of uplift models for targeting retention efforts toward customers. However, prior research on uplift modeling relies largely on experimental data and tailored uplift algorithms, making it difficult and costly for practitioners and researchers to apply. Thus, we investigate the applicability and competitiveness of an uplift modeling procedure for customer retention management combining propensity score matching with meta-learner approaches (standard machine learning algorithms). Using a semi-synthetic churn dataset with 1980 customers, we affirm the effectiveness of propensity score matching for reducing covariate imbalance in observational data. The empirical experiments show that meta-learner uplift models outperform tailored uplift random forest approaches regarding Qini scores and computation efficiency. Moreover, the results imply that targeting retention efforts based on a meta-learner uplift model reduces churn more effectively than using propensity models.

Keywords

uplift modeling, customer retention, meta-learner, propensity score matching

Introduction

Customer churn, also referred to as customer defection or attrition, is a persistent challenge for many businesses that seek to sustain profitability and long-term customer relationships (Blattberg et al., 2008; Lemmens & Gupta, 2020; Neslin et al., 2006; Pondel et al., 2021). As acquiring new customers is costly, retaining existing ones is essential for long-term success and firm value (Ascarza et al., 2016). Accurately predicting customer behavior is therefore a key element of proactive customer retention management (Ascarza, 2018).

Customer churn prediction and response models help marketing teams segment customers by estimating who is likely to churn or respond to marketing actions (Ascarza et al., 2018; Z.-Y. Chen et al., 2015; Lemmens & Gupta, 2020). These models allow firms to implement targeted retention measures designed to persuade customers to stay before they defect (Blattberg et al., 2008). However, despite their popularity and strong predictive power, traditional propensity models often lead to ineffective campaigns because they ignore the causal nature of customer responses (Ascarza, 2018; Devriendt, Berrevoets, & Verbeke, 2021). Propensity models cannot answer questions such as “Did this customer stay because of the campaign?” or “Would this customer have left without the incentive?” As a result, marketing resources may be wasted on customers who would have stayed anyway or cannot be retained through incentives (Caigny et al., 2021).

To overcome these limitations, recent studies have promoted uplift modeling as a more causal approach to customer retention management (Ascarza, 2018; Guelman et al., 2015; Provost & Fawcett, 2013). Uplift modeling estimates the incremental impact of a treatment (e.g., a retention offer) on a customer’s likelihood of staying (Hansotia & Rukstales, 2002; Li et al., 2018). By estimating the treatment effect directly, uplift models identify persuadable customers who are most likely to respond positively to a campaign, allowing firms to focus efforts where they make the greatest difference (Kane et al., 2014; Rößler et al., 2021).

Despite these advantages, uplift modeling remains difficult to apply in practice. Most approaches depend on randomized controlled trials (RCTs), which are expensive, time-consuming, and often impractical because they require random targeting that conflicts with established marketing processes (Haupt et al., 2019; Radcliffe, 2007). In addition, many uplift models rely on specialized algorithms that are complex, computationally demanding, and difficult for practitioners to interpret (Caigny et al., 2021; Zhang et al., 2022). Recent advances in heterogeneous treatment-effect estimation and information systems research make uplift modeling more accessible for practical use by building on existing implementations of modern algorithms. However, these advances have not yet been fully adopted in the uplift modeling literature or evaluated in the context of customer retention.

This study addresses these challenges by proposing an accessible and efficient uplift modeling procedure that works with observational data. Specifically, we combine propensity score matching (PSM) with meta-learner approaches to make uplift modeling more practical and transparent for customer retention management. Meta-learner approaches refer to modeling frameworks that use standard machine learning algorithms in structured combinations to estimate the causal effect of an intervention on individual outcomes. The study examines whether this approach is competitive with existing uplift modeling techniques and more effective than traditional propensity models. We focus on two key questions: (1) Do meta-learner-based uplift models perform as well as or better than specialized uplift algorithms? and (2) Does targeting based on the proposed approach improve campaign effectiveness compared to conventional churn prediction methods?

Our contribution is twofold. First, we introduce propensity score matching as a causal inference technique for uplift modeling, addressing calls by Gubela et al. (2017) to explore alternatives to randomized controlled trials. This extension enables the use of uplift models when data are not fully randomized. Second, we apply and evaluate different meta-learning approaches to demonstrate that uplift modeling can be implemented using existing machine learning tools without custom algorithms. Together, these contributions advance both theory and practice: theoretically by expanding the methodological foundation of uplift modeling through causal inference, and practically by offering marketers a feasible and interpretable approach to improve retention targeting.

Related Work

General Uplift Modeling Literature

Research on uplift modeling falls into two families. First, tailored algorithms modify standard learning methods, particularly tree-based and ensemble models, to estimate uplift directly. Early work formalized tree splits that account for treatment and control to maximize differential response (Radcliffe & Surry, 1999), with subsequent variants on split criteria and pruning (Rzepakowski & Jaroszewicz, 2012b) and ensemble formulations such as uplift random forests (Guelman et al., 2015; see also Sołtys et al., 2015). Recent studies integrate insights from heterogeneous treatment-effect research to stabilize splitting and improve robustness (Rößler et al., 2021). These approaches build on the general CART tradition (Breiman et al., 2017) but modify objectives to target incremental effects rather than outcomes.

Second, meta-learners use standard predictive models to estimate uplift by decomposing the causal problem into simpler sub-tasks (Curth & van der Schaar, 2021; Künzel et al., 2019; Nie & Wager, 2021). Single-model formulations either transform the outcome to a “treatment-consistent” label for direct classification (Gutierrez & Gérardy, 2017; Jaskowski & Jaroszewicz, 2012; Lai et al., 2006) or add a treatment indicator and interactions so that uplift is recovered from fitted responses (Lo, 2002; sometimes termed the S-learner, Künzel et al., 2019). Two-model approaches estimate separate models for treated and control groups and define uplift as their difference (T-learner; Hansotia & Rukstales, 2002; Radcliffe, 2007). More recent multi-step learners (e.g., X- and R-learners) incorporate nuisance components such as propensities or response functions to construct treatment-effect estimators (Künzel et al., 2019; Nie & Wager, 2021; Okasa, 2022). In practice, meta-learners are attractive because they retain off-the-shelf tooling while accommodating heterogeneous effects, whereas tailored trees optimize a task-specific objective and can yield interpretable segmentation. Together, these streams delineate a design space that balances model specificity (tailored trees) against implementation flexibility (meta-learners).

Uplift Modeling for Customer Retention

In customer retention management, most uplift modeling studies focus on single tailored approaches. For example, Guelman et al. (2012) applied an uplift random forest in an insurance retention case, and Ascarza (2018) demonstrated its effectiveness in field experiments for a wireless provider and a membership organization. Lemmens and Gupta (2020) extended this work by incorporating a profit-based loss function, while Caigny et al. (2021) introduced a tailored uplift logit leaf model for B2B contexts.

Only a few studies compare different modeling paradigms from both the tailored and meta-learner streams. Radcliffe (2007) contrasted an early S-learner formulation (Lo, 2002) with a decision tree uplift model in a mobile phone retention case. Devriendt et al. (2018, 2021) further benchmarked S- and T-learners against tailored uplift models and traditional churn prediction models, confirming the potential of uplift modeling for more profitable targeting.

Despite these advances, most research continues to rely on randomized controlled trials (RCTs) as the causal foundation of uplift modeling. However, RCTs are costly and logistically challenging, as random targeting often conflicts with established marketing practices (Ascarza, 2018; Haupt et al., 2019). These constraints, combined with the technical complexity of uplift algorithms, limit adoption in practice (Rößler et al., 2021). To address these challenges, our study introduces propensity score matching as a causal inference method for uplift modeling and combines it with meta-learners to enhance accessibility and applicability in observational data settings.

Methodology

Dataset Description and Preparation

For the experimental evaluation, we use a semi-synthetic churn dataset generated from the covariates of a real-world telecom dataset, a common practice in causal inference (Fredrik et al., 2016; Hill, 2011; Louizos et al., 2017). We simulate only the treatment indicator and outcomes to maintain realism while preserving the structure of actual customer data. The original CrowdAnalytix¹ churn dataset contains 3,333 observations with 19 covariates describing usage patterns (e.g., number of calls) and customer characteristics such as area of living. We remove redundant variables (e.g., state, already captured by area), drop the churn indicator, and encode categorical variables via one-hot encoding. The final dataset includes 5 binary and 15 continuous covariates.

Outcomes and treatment indicators are generated using a modified version of the uplift data generation algorithm implemented in the pylift Python library (Yi & Frost, 2018). The simulation steps are as follows:

1. Standardize the real-world covariates using z-standardization, where the sample mean is subtracted from every observation and then divided by the sample’s standard deviation.

2. Generate a coefficient $β_{j}$ for each covariate $j$ and a coefficient $β_{0}$ as intercept in the range of [-10, 10].

3. Create an error term $ε_{i}$ for each observation $i$ , drawn from a zero-centered normal distribution with a standard deviation of 3.

4. Generate a binary treatment indicator $t_{i}$ for each observation $i$ , drawn randomly from a binomial distribution with a probability of 0.5 to simulate a random treatment assignment.

5. Define a vector $τ_{i}$ for each observation $i$ , which governs the impact of the treatment on the outcome. $τ_{i}$ is calculated as the sum of a randomly drawn value from a normal distribution with a center of 10 and a standard deviation of 1 and the dot product of each covariate $j$ and its corresponding coefficient $β_{j}$ , which is multiplied by a constant feature effect of 0.5.

6. Calculate the outcome $Y_{i}$ as the sum of $β_{0}$, the dot product of each covariate $j$ and its corresponding coefficient $β_{j}$, the error term $ε_{i}$, and the binary treatment indicator $t_{i}$ multiplied by the treatment vector $τ_{i}$. Transform the continuous outcome to a binary flag with 1 (no churn) if $Y_{i}$ > 0 and 0 (churn) if $Y_{i}$ ≤ 0.

7. Append outcomes and treatment indicator to the covariate dataset.
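The seven steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the pylift implementation used in the study. The function name `simulate_outcomes`, the uniform draw for the coefficients (the text gives only the range), the reading that the feature effect of 0.5 scales the dot product in step 5, and the dictionary layout of the returned rows are all assumptions made for illustration.

```python
import random
import statistics


def simulate_outcomes(covariates, seed=0, feature_effect=0.5):
    """Sketch of steps 1-7 for a list of covariate rows (lists of floats)."""
    rng = random.Random(seed)
    n_cols = len(covariates[0])

    # Step 1: z-standardize each covariate column (sample standard deviation).
    columns = list(zip(*covariates))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) or 1.0 for c in columns]  # guard constant columns
    z = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in covariates]

    # Step 2: intercept and one coefficient per covariate in [-10, 10].
    # Assumption: uniform sampling; the text only states the range.
    beta0 = rng.uniform(-10, 10)
    beta = [rng.uniform(-10, 10) for _ in range(n_cols)]

    rows = []
    for x in z:
        linear = sum(b * v for b, v in zip(beta, x))
        eps = rng.gauss(0, 3)                             # Step 3: error term
        t = 1 if rng.random() < 0.5 else 0                # Step 4: random treatment
        tau = rng.gauss(10, 1) + feature_effect * linear  # Step 5: treatment impact
        y_cont = beta0 + linear + eps + t * tau           # Step 6: continuous outcome
        y = 1 if y_cont > 0 else 0                        # 1 = no churn, 0 = churn
        rows.append({"x": x, "treatment": t, "outcome": y})  # Step 7: append
    return rows
```

Fixing the seed makes the simulated treatment indicators and outcomes reproducible across runs, which is convenient when comparing models on the same semi-synthetic sample.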

The resulting dataset initially reflects an experimental setup due to random treatment assignment. To simulate an observational setting, we introduce selection bias following Hill (2011) by discarding nonrandom portions of the treatment group. Specifically, we remove customers with zero service calls, account age under 20 days, or more than 25 voicemail messages—representing customers unlikely to receive retention offers. This yields 990 treated and 1,711 control observations, closely mirroring realistic campaign targeting scenarios.

Experimental Setup and Procedure

Our experiment comprises three parts. First, we apply propensity score matching (PSM) to observational data for uplift modeling. Second, we compare meta-learner uplift models with a widely used tailored approach. Third, we benchmark the best meta-learner uplift model against the top-performing churn prediction and customer response models. We use nested cross-validation, with an inner loop for hyperparameter tuning and an outer loop for performance assessment (Gattermann-Itschert & Thonemann, 2021). Effectiveness is measured with model-specific metrics, and the impact of alternative targeting practices is simulated under different campaign scenarios.

In observational data, treatment and control groups often differ in baseline characteristics, biasing treatment effect estimates (Stuart, 2010). We therefore use standardized mean differences (SMDs) to assess covariate balance (Austin, 2009; Rosenbaum & Rubin, 1985). SMDs express mean differences in pooled standard-deviation units, making balance comparable across covariates (Austin, 2011).
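As a brief illustration of the balance diagnostic, the following sketch computes an SMD as the difference in group means divided by the square root of the average of the two group variances. This is a common formulation and is assumed here rather than taken from the study's code; the function names are hypothetical. For binary covariates, the variance term is often written as p(1 − p), which the sample variance used below only approximates.

```python
import math
import statistics


def smd_from_summary(mean_t, sd_t, mean_c, sd_c):
    """SMD from group means and standard deviations (treatment minus control)."""
    pooled_sd = math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)
    return 0.0 if pooled_sd == 0 else (mean_t - mean_c) / pooled_sd


def smd(treated, control):
    """SMD computed from the raw covariate values of both groups."""
    return smd_from_summary(
        statistics.mean(treated), statistics.stdev(treated),
        statistics.mean(control), statistics.stdev(control),
    )


def is_balanced(value, threshold=0.1):
    """Apply the |SMD| < 0.1 rule of thumb used in this study."""
    return abs(value) < threshold
```

Plugging in the account length summary statistics reported in Table 1, `smd_from_summary(104.59, 38.76, 104.28, 37.92)`, gives roughly 0.008, which is consistent with the rounded value of 0.01 shown there.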

We employ propensity score matching to mitigate covariate imbalance in uplift estimation. Among causal inference techniques, such as inverse probability weighting or instrumental variables, PSM is widely used for observational studies (Caliendo & Kopeinig, 2008). Because it separates confounder adjustment from treatment-effect estimation, PSM integrates easily into uplift modeling. The method balances covariates by pairing treated and untreated observations with similar propensity scores, defined as the probability of treatment conditional on observed covariates, $P(T_{i} = 1 \mid X_{i})$ (Rosenbaum & Rubin, 1983). Matching is conducted in R (MatchIt package; Ho et al., 2011) using generalized boosted models (GBM) for propensity estimation (Lee et al., 2010; McCaffrey et al., 2004; Setoguchi et al., 2008) and one-to-one nearest-neighbor matching (Rubin, 1973; Thoemmes & Kim, 2011).
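To make the matching step concrete, the sketch below pairs each treated observation with its closest unmatched control observation on the propensity score. It is a simplified Python illustration, not the MatchIt routine used in the study: the propensity scores are assumed to be given (here they would come from the GBM), matching is greedy and without replacement, and treated units are processed in descending score order. These choices and the function name `match_one_to_one` are assumptions.

```python
def match_one_to_one(scores, treatment):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    scores    -- estimated propensity score per observation
    treatment -- 1 for treated, 0 for control
    Returns a list of (treated_index, control_index) pairs.
    """
    treated = [i for i, t in enumerate(treatment) if t == 1]
    available = {i for i, t in enumerate(treatment) if t == 0}
    pairs = []
    # Assumption: treated units with the largest scores are matched first.
    for i in sorted(treated, key=lambda i: scores[i], reverse=True):
        if not available:
            break  # no controls left to match
        # Closest remaining control; ties are broken by the lower index.
        j = min(available, key=lambda j: (abs(scores[i] - scores[j]), j))
        pairs.append((i, j))
        available.remove(j)  # matching without replacement
    return pairs
```

Because every control observation is used at most once, the matched sample contains equally many treated and control observations, which is the structure of the matched dataset analyzed in the Results section.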

We then estimate four meta-learners (S-, T-, X-, and R-learners) and one tailored uplift random forest using logistic regression, random forest, and XGBoost as base learners. For the uplift random forest, we test the KL, ED, and CHI split criteria (Rzepakowski & Jaroszewicz, 2012a). The uplift random forest serves as a benchmark due to its strong prior performance (Ascarza, 2018; Devriendt et al., 2018; Guelman et al., 2015). All models are implemented in Python using the CausalML library (H. Chen et al., 2020), with scikit-learn (Pedregosa et al., 2011) and XGBoost (T. Chen & Guestrin, 2016) for base learners.
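The logic of the two simplest meta-learners can be sketched as follows. This is an illustrative stand-in, not the CausalML implementation used in the study: the base learner is a tiny hand-written logistic regression so that the example has no dependencies, and all function names are hypothetical. The S-learner fits one model with the treatment indicator as an extra feature and scores each customer twice; the T-learner fits one model per group and takes the difference of the predictions.

```python
import math


def predict_proba(w, row):
    """Predicted probability of outcome 1; the last weight is the intercept."""
    z = sum(wk * xk for wk, xk in zip(w, row)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Minimal logistic regression via batch gradient descent (toy base learner)."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for row, target in zip(X, y):
            err = predict_proba(w, row) - target
            for k in range(d):
                grad[k] += err * row[k]
            grad[d] += err
        w = [wk - lr * gk / n for wk, gk in zip(w, grad)]
    return w


def s_learner_uplift(X, t, y, X_new):
    """Single model with treatment as a feature; uplift = p(t=1) - p(t=0)."""
    w = fit_logistic([row + [ti] for row, ti in zip(X, t)], y)
    return [predict_proba(w, row + [1]) - predict_proba(w, row + [0]) for row in X_new]


def t_learner_uplift(X, t, y, X_new):
    """Separate models for treated and control; uplift = difference of predictions."""
    w1 = fit_logistic([r for r, ti in zip(X, t) if ti == 1],
                      [yi for yi, ti in zip(y, t) if ti == 1])
    w0 = fit_logistic([r for r, ti in zip(X, t) if ti == 0],
                      [yi for yi, ti in zip(y, t) if ti == 0])
    return [predict_proba(w1, row) - predict_proba(w0, row) for row in X_new]
```

In both cases, the base learner is an ordinary classifier, so swapping in a random forest or XGBoost model only changes the fitting and prediction calls.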

For model training, uplift models use both treatment and control observations, churn prediction models use control-group data only, and response models use treatment-group data, reflecting standard practice (Ascarza, 2018; Devriendt, Berrevoets, & Verbeke, 2021). Hyperparameters are optimized with the Hyperopt library (Bergstra et al., 2013) using the Tree of Parzen Estimators algorithm (Bergstra et al., 2011) over 50 trials. Performance is scored via the Qini coefficient (uplift models) and AUC (propensity models). We also record runtime per trial and compare efficiency using paired t-tests. Detailed hyperparameter grids and settings are available upon request.
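For readers unfamiliar with the Qini coefficient, the sketch below shows one common way to compute it. Several variants exist, and the exact normalization applied by the library used in this study may differ, so the code should be read as an assumption-laden illustration. Customers are ranked by predicted uplift, the cumulative number of incremental positive outcomes is tracked with control outcomes rescaled to the treated group size, and the area between this curve and the straight line of random targeting is reported relative to the number of treated customers. The function name is hypothetical.

```python
def qini_coefficient(scores, treatment, outcome):
    """Area between a Qini curve and the random-targeting line (one variant)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    n_t = n_c = r_t = r_c = 0
    curve = []
    for i in order:
        if treatment[i] == 1:
            n_t += 1
            r_t += outcome[i]
        else:
            n_c += 1
            r_c += outcome[i]
        # Incremental positives among the customers ranked so far; control
        # positives are rescaled to the treated group size. Assumption: the
        # control term is set to 0 while no control customer has been seen.
        scaled_control = r_c * n_t / n_c if n_c else 0.0
        curve.append(r_t - scaled_control)
    # Random targeting: straight line from 0 to the final curve value.
    area = sum(g - curve[-1] * (k + 1) / n for k, g in enumerate(curve)) / n
    return area / n_t if n_t else 0.0
```

A model that ranks customers well accumulates incremental positives early and therefore yields a positive value, while a ranking no better than random yields a value near zero.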

Analysis

We conduct different analyses to evaluate the effectiveness of targeting a retention campaign based on uplift and propensity models. We start by predicting the metrics used for the targeting decision for each of the considered models. The target metric for the uplift model is the expected incremental effect of the campaign (uplift). For the customer response model, it is the response propensity, and for the churn prediction model, it is the churn propensity. We predict the metrics for each observation in the outer validation folds. The metrics are defined as follows:

$$\text{uplift} = P(Y_{j} = 1 \mid T_{j} = 1, X_{j} = x_{j}) - P(Y_{j} = 1 \mid T_{j} = 0, X_{j} = x_{j}) \tag{1}$$

$$\text{response propensity} = P(Y_{j} = 1 \mid T_{j} = 1, X_{j} = x_{j}) \tag{2}$$

$$\text{churn propensity} = P(Y_{j} = 1 \mid X_{j} = x_{j}) \tag{3}$$

where $x_{j}$ denotes the covariates of observation $j$ in the validation folds, and $T_{j}$ indicates the treatment condition. Note that, for better interpretability, we recoded the outcome labels for the churn prediction model to 1 = churn and 0 = no churn, whereas for the uplift and response models the outcome label 1 represents no churn.

To assess the effectiveness of the different modeling approaches, we analyze the impact of the retention campaign under three targeting scenarios: (1) targeting is based on a churn prediction model (i.e., the propensity to churn), which is the most commonly used approach in practice; (2) the company selects customers for a retention campaign based on a customer response model (i.e., the response propensity); and (3) the targeting decision is based on an uplift model (i.e., the incremental effect of the campaign). We follow the approach applied by Ascarza (2018) and compare the accumulated average treatment effects (ATE) among subgroups of observations defined by percentiles of the predicted metrics (i.e., uplift, response propensity, and churn propensity). We compare different targeting thresholds to evaluate the three scenarios. For example, we evaluate the potential impact of the retention campaign when targeting the top 20% of customers according to the propensity to churn, response propensity, or uplift, respectively.

We proceed as follows. We first rank the observations in the validation sample from high to low based on the corresponding metric (i.e., uplift, churn propensity, or response propensity). The first percentile group $P_{10}$ (the top 10%) contains the observations with the highest predicted values, followed by the subsequent percentile groups in descending order ($P_{10} > P_{20} > P_{30} > \dots > P_{100}$). Next, for each threshold $P$ = 10%, 20%, 30%, …, 100%, we build a subgroup containing the top $P$ of observations (Ascarza, 2018); $P$ = 100% corresponds to a targeting decision in which all customers in the sample are targeted. We then calculate the accumulated average treatment effect by subtracting the churn rate in the treatment group from the churn rate in the control group within the corresponding subgroup. For the uplift model and response model, we calculate the ATE as follows:

$$\mathrm{ATE}_{P} = \left(1 - \frac{1}{M_{k}} \sum_{k \in \text{control}} [Y_{k} = 1]\right) - \left(1 - \frac{1}{M_{k'}} \sum_{k' \in \text{treatment}} [Y_{k'} = 1]\right) \tag{4}$$

and for the churn prediction model as follows:

$$\mathrm{ATE}_{P} = \frac{1}{M_{k}} \sum_{k \in \text{control}} [Y_{k} = 1] - \frac{1}{M_{k'}} \sum_{k' \in \text{treatment}} [Y_{k'} = 1] \tag{5}$$

where $M_{k}$ denotes the number of observations in percentile $P$ that belong to the control group, and $M_{k'}$ denotes the number of observations in the same percentile that belong to the treatment group. In equation (4), for the uplift and response models, the outcome label 1 corresponds to no churn (i.e., a positive response). Accordingly, we calculate the churn rates by subtracting the response rates from 1.
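The procedure behind equations (4) and (5) can be sketched as follows. The code is a minimal illustration under stated assumptions: the function name `accumulated_ate`, the rounding of subgroup sizes, and the choice to return `None` when a subgroup lacks treated or control observations are not taken from the original analysis.

```python
def accumulated_ate(scores, treatment, outcome, fractions, outcome_is_churn=False):
    """Accumulated ATE (churn rate control minus churn rate treatment) per top share.

    scores           -- predicted targeting metric (uplift or a propensity)
    treatment        -- 1 for treated, 0 for control
    outcome          -- binary label; 1 = no churn unless outcome_is_churn is True
    fractions        -- targeted shares, e.g. (0.1, 0.2, ..., 1.0)
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    results = {}
    for frac in fractions:
        top = order[: max(1, round(frac * n))]  # top share of ranked customers
        treated = [outcome[i] for i in top if treatment[i] == 1]
        control = [outcome[i] for i in top if treatment[i] == 0]
        if not treated or not control:
            results[frac] = None  # ATE undefined without both groups
            continue
        rate_t = sum(treated) / len(treated)
        rate_c = sum(control) / len(control)
        if outcome_is_churn:
            results[frac] = rate_c - rate_t              # equation (5)
        else:
            results[frac] = (1 - rate_c) - (1 - rate_t)  # equation (4)
    return results
```

Calling the function once per model, with the uplift scores, the response propensities, and the churn propensities (the latter with `outcome_is_churn=True` and recoded labels), produces the three series that are compared across targeting thresholds.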

Results

Covariate Balance

To evaluate the effectiveness of propensity score matching, we examine the balance of the covariate distributions between the treatment and control groups before and after matching. For the covariates account length, voice mail plan, number of voice mail messages, and customer service calls, the magnitude of the standardized mean difference (SMD) exceeds the threshold of 0.1 before propensity score matching. After propensity score matching, the magnitude of the standardized mean difference is less than 0.1 for all covariates (see Table 1).

Table 1.

Covariate Balance After Propensity Score Matching

	Control (n = 990)		Treatment (n = 990)
	M	(SD)	M	(SD)	SMD	|SMD| < 0.1
Area code 408^a	0.25	(−)	0.26	(−)	0.01	Balanced
Area code 415^b	0.49	(−)	0.48	(−)	−0.01	Balanced
Area code 510^c	0.26	(−)	0.26	(−)	−0.01	Balanced
Account length	104.28	(37.92)	104.59	(38.76)	0.01	Balanced
International plan^d	0.09	(−)	0.08	(−)	−0.01	Balanced
Voice mail plan^e	0.09	(−)	0.09	(−)	0.00	Balanced
Number vmail messages	1.91	(6.01)	1.96	(6.17)	0.01	Balanced
Total day minutes	178.98	(55.03)	176.80	(53.68)	−0.04	Balanced
Total day calls	99.86	(20.00)	100.57	(20.46)	0.04	Balanced
Total day charge	30.43	(9.35)	30.06	(9.13)	−0.04	Balanced
Total eve minutes	200.67	(50.49)	200.62	(50.54)	−0.01	Balanced
Total eve calls	100.23	(19.62)	100.54	(20.13)	0.02	Balanced
Total eve charge	17.06	(4.29)	17.05	(4.30)	−0.01	Balanced
Total night minutes	202.08	(48.07)	199.25	(50.76)	−0.05	Balanced
Total night calls	100.79	(18.88)	100.24	(19.13)	−0.03	Balanced
Total night charge	9.09	(2.16)	8.96	(2.28)	−0.05	Balanced
Total intl minutes	10.16	(2.80)	10.23	(2.84)	0.02	Balanced
Total intl calls	4.50	(2.47)	4.47	(2.50)	−0.01	Balanced
Total intl charge	2.25	(0.79)	2.30	(0.80)	0.07	Balanced
Customer service calls	1.98	(1.43)	1.97	(1.17)	−0.01	Balanced

After matching, the dataset consists of 1980 observations, equally distributed across the treatment and control group as matched pairs. Detailed information about the characteristics of the matched dataset is provided in Table 2.

Table 2.

Matched Dataset Characteristics

	Number of customers	Share (%)	Number of churners	Churn rate (%)	Uplift (%)
Treatment	990	50	277	27.9	16.4
Control	990	50	438	44.3	16.4
Total	1980	100	715	36.1	-

Performance Comparison of Meta-Learner and Tailored Uplift Models

To examine the competitiveness of meta-learner uplift modeling approaches against tailored uplift modeling approaches, we compare the predictive performance of the four meta-learners with that of the tailored uplift random forest algorithm. Table 4 details the predictive performance in terms of the Qini coefficient for each meta-learner paired with every base learner, and for the uplift random forest paired with each splitting criterion considered. The reported Qini coefficients are median values over the 10 out-of-sample validation folds. The p-values are based on pairwise Wilcoxon signed-rank tests. We use boldface to indicate the best-performing combination of each meta-learner approach and base learner, and of the uplift random forest and splitting criterion, respectively.

Table 3.

Computation Time per Trial of Uplift Random Forest and Meta-Learners

Approach	Model	Computation time in sec/trial		Uplift RF-KL	Uplift RF-ED	Uplift RF-CHI
Approach	Model	M	(SD)	T	T	T
Uplift RF	KL	1177.06	(771.88)	-	-	-
	ED	1105.08	(747.21)	1.72	-	-
	CHI	1130.54	(745.63)	0.99	0.57	-
S-learner	XGB	3.08	(1.52)	33.97***	32.93***	33.77***
	RF	3.15	(1.60)	33.98***	32.94***	33.77***
	LR	0.58	(0.37)	34.05***	33.02***	33.85***
T-learner	XGB	3.20	(1.66)	33.97***	32.95***	33.77***
	RF	4.19	(2.27)	33.94***	32.91***	33.75***
	LR	0.53	(0.22)	34.05***	33.02***	33.85***
X-learner	XGB	10.05	(1.78)	33.77***	32.74***	33.57***
	RF	18.88	(2.35)	33.53***	32.47***	33.31***
	LR	7.23	(1.09)	33.85***	32.83***	33.65***
R-learner	XGB	16.07	(5.43)	33.58***	32.55***	33.38***
	RF	23.28	(3.86)	33.40***	32.35***	33.18***
	LR	6.66	(1.14)	33.87***	32.83***	33.67***

As shown in Table 4, which compares performance in terms of absolute Qini coefficients, the overall best-performing model is the S-learner paired with XGBoost as base learner, with a Qini coefficient of 0.0926. Moreover, the T-, X-, and R-learners perform best with logistic regression as base learner. Among the uplift random forest versions, the configuration with the KL splitting criterion performs best, with a Qini coefficient of 0.0278. The pairwise Wilcoxon signed-rank tests show that the S-learner paired with XGBoost or logistic regression, as well as the T-learner with logistic regression as base learner, outperform all configurations of the uplift random forest. The predictive performance of the remaining combinations of meta-learner strategies and base learners is statistically on par with the uplift random forest configurations. Figure 1 plots the Qini curves for the best-performing uplift models per algorithm category together with a baseline depicting random targeting.

Table 4.

Qini Performance of Uplift Random Forest and Meta-Learners

Approach	Model	Qini coefficient	Uplift RF-KL	Uplift RF-ED	Uplift RF-CHI
Approach	Model	Mdn	W	W	W
Uplift RF	KL	0.0278	-	-	-
	ED	0.0223	24	-	-
	CHI	0.0118	25	26	-
S-learner	XGB	0.0926	2***	2***	1***
	RF	0.0349	26	24	24
	LR	0.0750	0***	0***	0***
T-learner	XGB	0.0427	14	13	12
	RF	0.0287	24	22	16
	LR	0.0763	0***	0***	0***
X-learner	XGB	0.0287	23	27	27
	RF	0.0315	16	18	16
	LR	0.0505	22	22	23
R-learner	XGB	0.0233	21	25	25
	RF	0.0145	22	21	24
	LR	0.0480	13	14	15

Figure 1.

Qini curves for best performing uplift model per approach

In Figure 1, the S-learner shows the steepest ascent within the first 10% of targeted customers and performs better than the baseline for all proportions of customers targeted. The Qini curve of the T-learner ascends slightly less steeply for the top 10% of customers targeted but follows a pattern similar to the curve of the S-learner. Both models achieve their optimal uplift value when targeting around 40% to 50% of all customers. However, for the last 10% of customers targeted, the T-learner performs worse than the baseline. Compared to the S- and T-learner, the uplift random forest shows a considerably shallower progression of the uplift curve. Similar to the X-learner, the uplift random forest performs better than the baseline when targeting around 70% to 80% of all customers. The R-learner performs better than the baseline, except when targeting the top 10% of total customers, which results in a negative Qini value.

To provide a comprehensive assessment of the competitiveness of the considered uplift modeling techniques, we additionally examine their computational efficiency as an indicator of practicability. Table 3 details the average computing time for one trial within the hyperparameter optimization process. The reported runtimes in seconds are averaged over the 50 trials conducted in the inner cross-validation loop per iteration of the outer 10-fold cross-validation loop. The p-values are based on pairwise comparisons using dependent-sample t-tests.

Table 5.

AUC Performance of Churn Prediction and Response Models

Approach	Model	AUC score
		Mdn
Churn prediction	XGB	0.956
	RF	0.940
	LR	0.969
Customer response	XGB	0.953
	RF	0.930
	LR	0.972

Table 3 highlights the computational efficiency of the meta-learning techniques. All meta-learners significantly outperform the three configurations of the uplift random forest in terms of computation time within the hyperparameter optimization process. The best-performing uplift random forest configuration in terms of predictive performance (i.e., uplift random forest with KL splitting criterion) shows an average runtime of 1177.06 seconds (SD = 771.88) per trial. The meta-learner with the best predictive performance (i.e., S-learner paired with XGBoost) shows an average runtime of 3.08 seconds (SD = 1.52) per trial, which is around 382 times faster. Among the meta-learners, the combinations with logistic regression as base learner show the shortest computation times, followed by the combinations with XGBoost and random forest as base learners. On average, the less complex meta-learners (i.e., S- and T-learner) show faster computation times than the more complex meta-learners (i.e., X- and R-learner).

To identify the best-performing classifier for the propensity models (i.e., churn and response model) used for further analyses, we examine their predictive performance using the AUC score. We report the AUC score as the median value over the 10 out-of-sample validation folds. The best-performing classifier for each modeling strategy is highlighted in bold.

Table 5 reveals that all classifiers are substantially better at predicting churn or customers’ response than the baseline of a random model achieving an AUC of 0.5. For both modeling strategies, logistic regression achieves the highest absolute AUC score and ranks before the XGBoost and random forest classifiers.

Comparison of Targeting Effectiveness Using Uplift and Propensity Models

We analyze whether targeting based on uplift or propensity models leads to more effective retention campaigns. For this examination, we compare the best meta-learner uplift model (the S-learner paired with XGBoost) with the best propensity models (the churn prediction model and the customer response model, each based on logistic regression). We evaluate the differences in the treatment effect for each modeling approach by comparing the churn rates between treatment and control groups for customers with different levels of the model-specific target metric (i.e., uplift, churn propensity, response propensity). Figure 2 illustrates the effect of the treatment for each of the targeting approaches, assuming different proportions of targeted customers. The blue circles, orange squares, and green triangles represent the accumulated average treatment effects for the corresponding approach over the 10 out-of-sample validation folds. The bars illustrate the standard deviation for each approach’s reported mean values².

Figure 2 reveals that, for up to 40% of customers targeted, targeting based on uplift yields a substantially larger accumulated treatment effect than targeting based on churn propensity or response propensity. For instance, when targeting 30% of the customers based on the response propensity, the treatment (i.e., retention campaign) has no effect in terms of reducing churn. When the same proportion of customers is targeted based on their predicted churn propensity, churn is reduced by 14.3%. However, when the same proportion of customers is selected based on their predicted uplift, the same treatment reduces churn by 36.9%. Thus, the difference in churn reduction between using uplift and the response propensity as a targeting metric is 36.9 percentage points, and between uplift and the churn propensity, 22.6 percentage points. This difference between the three targeting approaches diminishes as the proportion of targeted customers increases.

Figure 2.

Accumulated treatment effects per model

Discussion and Theoretical Implications

In recent years, predictive models have become essential tools for marketers seeking to anticipate customer behavior and improve retention decisions (Z.-Y. Chen et al., 2015; Hair & Sarstedt, 2021). Extant customer retention literature has focused primarily on propensity models such as churn prediction and response models (Caigny et al., 2018), while uplift modeling has emerged as a promising alternative that directly estimates the incremental impact of marketing actions (Ascarza, 2018; Devriendt, Berrevoets, & Verbeke, 2021). Although prior research has explored the effectiveness of uplift models, few studies have examined their actionability or applicability in practice. Our study addresses this gap by proposing and testing an accessible uplift modeling procedure that combines propensity score matching (PSM) with meta-learners to enable causal inference from observational data.

Theoretically, our findings extend the understanding of uplift modeling in three main ways.

First, we demonstrate that propensity score matching effectively reduces covariate imbalance in observational data and can serve as a viable alternative to randomized controlled trials for uplift estimation. This finding aligns with evidence from statistics and econometrics (Imbens, 2004; Rubin, 2012) and extends these insights to marketing analytics. By validating PSM for uplift modeling, we broaden the methodological foundation of customer retention research and provide a bridge between causal inference and predictive analytics.
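Covariate imbalance is commonly summarized with the standardized mean difference (see Austin, 2009, in the reference list). The following Python sketch shows that calculation for a single continuous covariate. It is an illustration under assumptions: the function name and the sample values are hypothetical, and this section does not state which balance statistic or threshold the study applied.

```python
from math import copysign, sqrt
from statistics import mean, variance


def standardized_mean_difference(treated_values, control_values):
    """Standardized mean difference of one covariate between two groups.

    Uses the pooled standard deviation sqrt((var_t + var_c) / 2).
    Each group needs at least two observations for the sample variance.
    """
    pooled_sd = sqrt((variance(treated_values) + variance(control_values)) / 2)
    diff = mean(treated_values) - mean(control_values)
    if pooled_sd == 0:
        # Both groups are constant: balanced only if the constants coincide.
        return 0.0 if diff == 0 else copysign(float("inf"), diff)
    return diff / pooled_sd


# Hypothetical covariate values (e.g., tenure in years) before matching.
tenure_treated = [1, 2, 3]
tenure_control = [2, 3, 4]
print(standardized_mean_difference(tenure_treated, tenure_control))
```

Computing this value for each covariate before and after matching shows whether matching moved the groups closer together. An absolute value below 0.1 is a frequently used rule of thumb for acceptable balance, although that cutoff is a convention and not a result of this study.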

Second, our results show that meta-learner-based uplift models can perform as well as or better than specialized algorithms such as the uplift random forest while requiring less computational effort. This confirms recent findings by Zhao and Harinen (2019) and Zhang et al. (2022), highlighting that modern meta-learning techniques offer both accuracy and scalability. These results suggest that uplift modeling can be implemented using standard machine learning tools, making causal targeting methods more practical for marketing researchers and practitioners.
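To make the idea of a meta-learner concrete, the sketch below implements an S-learner in Python with NumPy: a single outcome model is trained with the treatment indicator as an additional feature, and uplift is the difference between its predictions with the treatment switched off and on. The base learner here is a least-squares linear probability model with treatment interactions, chosen only to keep the example dependency-light. It is a stand-in and not the XGBoost or logistic regression models evaluated in the study, and all function names are hypothetical.

```python
import numpy as np


def fit_linear(features, outcome):
    """Fit a least-squares linear model with intercept (stand-in base learner)."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


def predict_linear(coef, features):
    """Predict with the linear model returned by fit_linear."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


def s_learner_uplift(X, t, y):
    """Estimated churn reduction from treating each customer.

    X -- 2D array of covariates, t -- 0/1 treatment flags, y -- churn outcomes.
    """
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    # One model on covariates, the treatment flag, and their interactions.
    # The interactions let this linear stand-in express customer-specific
    # effects; a tree-based learner could learn them from [X, t] alone.
    train = np.column_stack([X, t, X * t[:, None]])
    coef = fit_linear(train, y)

    # Score every customer twice: as if treated and as if not treated.
    as_treated = np.column_stack([X, np.ones(len(X)), X])
    as_control = np.column_stack([X, np.zeros(len(X)), np.zeros_like(X)])
    return predict_linear(coef, as_control) - predict_linear(coef, as_treated)


# Hypothetical noise-free data: the campaign lowers churn only when x = 1.
X_toy = [[0], [0], [1], [1]]
t_toy = [0, 1, 0, 1]
y_toy = [0.5, 0.5, 0.5, 0.2]
print(s_learner_uplift(X_toy, t_toy, y_toy))
```

Swapping the two helper functions for any classifier that outputs churn probabilities keeps the rest of the procedure unchanged, which is the sense in which meta-learners rely on standard machine learning tools.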

Third, our analysis reveals that the performance of meta-learners depends on the choice of base algorithm. For instance, XGBoost performs particularly well in combination with the S-learner, whereas logistic regression provides robust results across multiple learner types. This underscores that no single algorithm universally outperforms others. Instead, performance depends on data characteristics and campaign objectives. Practically, analytics teams should conduct small-scale validation experiments to identify the optimal combination of meta- and base learners for their specific datasets.

In summary, our discussion emphasizes the key theoretical implications: Uplift modeling can be effectively extended to observational data using PSM, implemented efficiently with meta-learners, and tailored flexibly to different marketing contexts. These insights advance the theoretical integration of causal inference into marketing analytics and support more evidence-based decision-making in customer retention management.

Practical Implications

In addition to our theoretical contributions, this study offers clear and actionable implications for marketing managers, data scientists, and commercial research teams responsible for customer retention and campaign optimization.

First, our results demonstrate that propensity score matching (PSM) provides a practical and cost-efficient alternative to fully randomized field experiments. Instead of running controlled trials, which are often difficult, costly, or inconsistent with established targeting procedures, marketing teams can use existing customer data from CRM or loyalty systems and apply PSM to approximate the benefits of randomization. This enables firms to evaluate campaign effects more accurately and make causal inferences from everyday business data (Ascarza, 2018; Rossi & Allenby, 2003; Zhang et al., 2022). In commercial settings where access to real-time experimentation is limited, this approach can significantly reduce analytical costs while improving evidence-based decision-making.
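For readers who want to see the matching step itself, the following Python sketch pairs each treated customer with the closest unmatched control customer on the propensity score. It assumes the scores have already been estimated (for example, by a model of campaign assignment). The greedy order, matching without replacement, and the caliper of 0.05 are illustrative assumptions and not settings reported in this section.

```python
def match_one_to_one(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score,
    without replacement.

    Returns a list of (treated_index, control_index) pairs. Treated customers
    without a control within `caliper` stay unmatched and are dropped.
    """
    available = list(range(len(control_scores)))
    pairs = []
    for i, score in enumerate(treated_scores):
        if not available:
            break  # no controls left to match
        # Closest remaining control customer for this treated customer.
        j = min(available, key=lambda c: abs(control_scores[c] - score))
        if abs(control_scores[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical propensity scores for three treated and four control customers.
treated_ps = [0.30, 0.62, 0.90]
control_ps = [0.28, 0.60, 0.33, 0.10]
print(match_one_to_one(treated_ps, control_ps))
```

In this toy example the third treated customer has no control within the caliper and is left out, which illustrates how one-to-one matching can shrink the usable sample.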

Second, the findings highlight that meta-learner uplift models can be implemented with standard analytics tools already used in many firms, such as Python or R, thereby lowering the entry barrier for advanced causal analytics. Compared with tailored uplift algorithms, meta-learners achieve similar or superior predictive performance with far less computational and maintenance effort. This means that analytics teams can integrate uplift modeling into existing marketing automation or customer intelligence workflows without major system overhauls. The results therefore encourage organizations to expand their analytics toolbox and upskill marketing analysts in the application of causal machine learning approaches (Wedel & Kannan, 2016).

Third, from a strategic marketing perspective, our findings show that targeting retention campaigns based on uplift scores rather than churn or response propensities substantially improves campaign efficiency and ROI. For example, in our analysis, targeting only 30% of customers based on uplift scores reduced churn by 22–37 percentage points more than using conventional models. This implies that marketing budgets can achieve higher impact with fewer contacts, directly addressing the long-standing challenge of maximizing the return on marketing investment (Ascarza, 2018; Rust et al., 2004). Managerially, this means firms should prioritize “persuadable” customers (those whose churn likelihood decreases when targeted) rather than “sure things” and “lost causes,” who either would not churn or cannot be retained despite intervention. By adopting uplift modeling as the decision criterion for campaign selection, managers can allocate resources more effectively, improve customer lifetime value, and minimize unnecessary retention spending. Especially when budgets are limited or campaigns must comply with data protection and contact frequency restrictions, uplift-based targeting provides a data-driven framework for precision retention marketing.

Overall, our results underscore that uplift modeling is not only a methodological innovation but a commercially viable instrument for improving customer retention management. We recommend that firms integrate uplift modeling into their marketing analytics pipelines, test it alongside traditional scoring models, and monitor the incremental financial gains in pilot campaigns.

Limitations and Avenues for Future Research

As with other research, our study is not free of limitations. First, the analysis is based on a semi-synthetic churn dataset with 1,980 customers, where real-world covariates were combined with simulated outcomes. While this approach allows for controlled experimentation and causal benchmarking, the use of simulated outcomes may introduce biases that do not fully reflect real-world behavioral dynamics, such as unobserved heterogeneity or non-linear responses to marketing interventions. Ideally, the results should be validated with a real-world dataset. However, obtaining access to actual customer data is challenging, as firms are often reluctant to share such strategically sensitive information (Caigny et al., 2021). Our simulation procedure mitigates some of these issues by grounding the data in realistic covariates from an existing telecom dataset, ensuring structural similarity to real retention contexts. Nevertheless, future research should replicate the results using proprietary or field data to strengthen external validity.

Second, we employ cross-sectional data in this study. However, real-world retention campaigns often consist of multiple actions performed over a longer time horizon accompanied by changes in treatment exposure (Ascarza, 2018). Extending the framework to a longitudinal setting could reveal how repeated campaigns affect customer responsiveness and model stability over time.

Third, our study focuses on a mobile operator case with a churn rate of 36%. We acknowledge that the advantage of implementing our proposed procedure could be less pronounced in industries with lower churn rates. Hence, future work could examine settings such as subscription-based or B2B markets to assess generalizability across contexts.

Beyond these limitations, several research avenues remain. Future studies could explore the interaction between meta-learner types and base algorithms in greater depth, as our results suggest that their performance is context-dependent. Moreover, we used one-to-one propensity score matching to enable the use of observational data for uplift modeling, which may discard unmatched cases and reduce statistical power. Future research could therefore evaluate alternative balancing methods, such as full matching (Rosenbaum, 1991; Stuart & Green, 2008) or inverse probability of treatment weighting (Lunceford & Davidian, 2004; Robins et al., 2000), which use all available data.
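As an illustration of one of these alternatives, the Python sketch below applies inverse probability of treatment weighting. Each customer is weighted by the inverse of the probability of the treatment status actually received, so no observation is discarded. The function names and toy numbers are hypothetical, the propensity scores are assumed to be given, and the normalized (ratio) form of the estimator is one common choice among several.

```python
def iptw_weights(treated, propensity):
    """Weight 1/e for treated customers and 1/(1 - e) for control customers.

    Propensity scores must lie strictly between 0 and 1.
    """
    return [1 / e if t == 1 else 1 / (1 - e) for t, e in zip(treated, propensity)]


def weighted_churn_reduction(treated, churned, propensity):
    """Weighted control churn rate minus weighted treated churn rate.

    Both groups must be non-empty; otherwise the rates are undefined.
    """
    weights = iptw_weights(treated, propensity)

    def weighted_rate(group):
        # Normalized weighted mean of the churn outcome within one group.
        total = sum(w for w, t in zip(weights, treated) if t == group)
        churn = sum(w * y for w, t, y in zip(weights, treated, churned) if t == group)
        return churn / total

    return weighted_rate(0) - weighted_rate(1)


# Hypothetical data: four customers, each with a propensity score of 0.5.
toy_treated = [1, 1, 0, 0]
toy_churned = [0, 1, 1, 1]
toy_propensity = [0.5, 0.5, 0.5, 0.5]
print(weighted_churn_reduction(toy_treated, toy_churned, toy_propensity))
```

Because extreme propensity scores produce very large weights, applications of this approach often inspect or trim the weight distribution before estimating effects.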

Footnotes

ORCID iD

Matthias Handrich

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Notes

References

Ascarza (2018). Retention futility: Targeting high-risk customers might be ineffective. Journal of Marketing Research, 55(1), 80–98. https://doi.org/10.1509/jmr.16.0163

Ascarza, Iyengar, & Schleicher (2016). The perils of proactive churn prevention using plan recommendations: Evidence from a field experiment. Journal of Marketing Research, 53(1), 46–60. https://doi.org/10.1509/jmr.13.0483

Ascarza, Netzer, & Hardie, B. G. S. (2018). Some customers would rather leave without saying goodbye. Marketing Science, 37(1), 54–77. https://doi.org/10.1287/mksc.2017.1057

Austin, P. C. (2009). Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research. Communications in Statistics - Simulation and Computation, 38(6), 1228–1234. https://doi.org/10.1080/03610910902859574

Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399–424. https://doi.org/10.1080/00273171.2011.568786

Bergstra, Bardenet, Bengio, & Kégl (2011). Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems (24). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf

Bergstra, Yamins, & Cox (2013). Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. Proceedings of the 30th International Conference on Machine Learning, 28, 115–123. https://proceedings.mlr.press/v28/bergstra13.html

Blattberg, R. C., Kim, B.-D., & Neslin, S. A. (2008). Database marketing: Analyzing and managing customers. International series in quantitative marketing. Springer. https://swbplus.bsz-bw.de/bsz265010640cov.htm

Breiman, Friedman, J. H., Olshen, R. A., & Stone, C. J. (2017). Classification and regression trees. Routledge. https://doi.org/10.1201/9781315139470

Caigny, Coussement, & Bock (2018). A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees. European Journal of Operational Research, 269(2), 760–772. https://doi.org/10.1016/j.ejor.2018.02.009

Caigny, Coussement, Verbeke, Idbenjra, & Phan (2021). Uplift modeling and its implications for B2B customer churn prediction: A segmentation-based modeling approach. Industrial Marketing Management, 99, 28–39. https://doi.org/10.1016/j.indmarman.2021.10.001

Caliendo & Kopeinig (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22(1), 31–72. https://doi.org/10.1111/j.1467-6419.2007.00527.x

Chen, Harinen, Lee, J.-Y., Yung, & Zhao (2020). CausalML: Python package for causal machine learning. https://doi.org/10.48550/arXiv.2002.11631

Chen & Guestrin (2016). XGBoost. In Krishnapuram, Shah, Smola, Aggarwal, Shen, & Rastogi (Eds.), Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 785–794). ACM. https://doi.org/10.1145/2939672.2939785

Chen, Z.-Y., Fan, Z.-P., & Sun (2015). Behavior-aware user response modeling in social media: Learning from diverse heterogeneous data. European Journal of Operational Research, 241(2), 422–434. https://doi.org/10.1016/j.ejor.2014.09.008

Curth & van der Schaar (2021). Nonparametric estimation of heterogeneous treatment effects: From theory to learning algorithms. In Banerjee & Fukumizu (Eds.), Proceedings of machine learning research, proceedings of the 24th international conference on artificial intelligence and statistics (pp. 1810–1818). PMLR. https://proceedings.mlr.press/v130/curth21a.html

Devriendt, Berrevoets, & Verbeke (2021). Why you should stop predicting customer churn and start using uplift models. Information Sciences, 548, 497–515. https://doi.org/10.1016/j.ins.2019.12.075

Devriendt, Guns, & Verbeke (2021). Learning to rank for uplift modeling. IEEE Transactions on Knowledge and Data Engineering, 1. https://doi.org/10.1109/TKDE.2020.3048510

Devriendt, Moldovan, & Verbeke (2018). A literature survey and experimental evaluation of the state-of-the-art in uplift modeling: A stepping stone toward the development of prescriptive analytics. Big Data, 6(1), 13–41. https://doi.org/10.1089/big.2017.0104

Fredrik, Uri, & David (2016). Learning representations for counterfactual inference. Proceedings of The 33rd International Conference on Machine Learning, 48, 3020–3029. https://proceedings.mlr.press/v48/johansson16.html

Gubela, Lessmann, Haupt, Baumann, Radmer, & Gebert (2017). Revenue uplift modeling. In 2017 proceedings of the 38th international conference on information systems (ICIS). ICIS.

Guelman, Guillén, & Pérez-Marín, A. M. (2012). Random forests for uplift modeling: An insurance customer retention case. In van der Aalst, Mylopoulos, Rosemann, Shaw, M. J., Szyperski, Engemann, K. J., Gil-Lafuente, A. M., & Merigó, J. M. (Eds.), Lecture Notes in Business Information Processing. Modeling and Simulation in Engineering, Economics and Management (115, pp. 123–133). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-30433-0_13

Guelman, Guillén, & Pérez-Marín, A. M. (2015). Uplift random forests. Cybernetics & Systems, 46(3-4), 230–248. https://doi.org/10.1080/01969722.2015.1012892

Gutierrez & Gérardy, J.-Y. (2017). Causal inference and uplift modelling: A review of the literature. In International conference on predictive applications and APIs (pp. 1–13). PMLR.

Hair, J. F., & Sarstedt (2021). Data, measurement, and causal inferences in machine learning: Opportunities and challenges for marketing. Journal of Marketing Theory and Practice, 29(1), 65–77. https://doi.org/10.1080/10696679.2020.1860683

Hansotia & Rukstales (2002). Incremental value modeling. Journal of Interactive Marketing, 16(3), 35–46. https://doi.org/10.1002/dir.10035

Haupt, Jacob, Gubela, & Lessmann (2019). Affordable uplift: Supervised randomization in controlled experiments. In ICIS 2019 international conference on information system (ICIS). AIS. https://aisel.aisnet.org/icis2019/data_science/data_science/24

Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational & Graphical Statistics, 20(1), 217–240. https://doi.org/10.1198/jcgs.2010.08162

D. E., Imai, King, & Stuart, E. A. (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. Journal of Statistical Software, 42(8). https://doi.org/10.18637/jss.v042.i08

Imbens, G. W. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. The Review of Economics and Statistics, 86(1), 4–29. https://doi.org/10.1162/003465304323023651

Jaskowski & Jaroszewicz (2012). Uplift modeling for clinical trial data. In ICML workshop on clinical data analysis: Edinburgh.

Kane, V. S., & Zheng (2014). Mining for the truly responsive customers and prospects using true-lift modeling: Comparison of new and existing methods. Journal of Marketing Analytics, 2(4), 218–238. https://doi.org/10.1057/jma.2014.18

Künzel, S. R., Sekhon, J. S., & Bickel, P. J. (2019). Meta-learners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences, 116(10), 4156–4165. https://doi.org/10.1073/pnas.1804597116

Lai, Wang, Ling, Shi, & Zhang (2006). Direct marketing when there are voluntary buyers. In Sixth international conference on data mining (ICDM'06) (pp. 922–927). IEEE. https://doi.org/10.1109/ICDM.2006.54

Lee, B. K., Lessler, & Stuart, E. A. (2010). Improving propensity score weighting using machine learning. Statistics in Medicine, 29(3), 337–346. https://doi.org/10.1002/sim.3782

Lemmens & Gupta (2020). Managing churn to maximize profits. Marketing Science, 39(5), 956–973. https://doi.org/10.1287/mksc.2020.1229

Yan, Deng, Chu, Qiao, & Xiong (2018). A policy gradient method with variance reduction for uplift modeling. ArXiv Preprint ArXiv:1811.10158.

V. S. Y. (2002). The true lift model. ACM SIGKDD Explorations Newsletter, 4(2), 78–86. https://doi.org/10.1145/772862.772872

Louizos, Shalit, Mooij, J. M., Sontag, Zemel, & Welling (2017). Causal effect inference with deep latent-variable models. Advances in Neural Information Processing Systems (30). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/94b5bde6de888ddf9cde6748ad2523d1-Paper.pdf

Lunceford, J. K., & Davidian (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine, 23(19), 2937–2960. https://doi.org/10.1002/sim.1903

McCaffrey, D. F., Ridgeway, & Morral, A. R. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 9(4), 403–425. https://doi.org/10.1037/1082-989X.9.4.403

Neslin, S. A., Gupta, Kamakura, & Mason, C. H. (2006). Defection detection: Measuring and understanding the predictive accuracy of customer churn models. Journal of Marketing Research, 43(2), 204–211. https://doi.org/10.1509/jmkr.43.2.204

Nie & Wager (2021). Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2), 299–319. https://doi.org/10.1093/biomet/asaa076

Okasa (2022). Meta-learners for estimation of causal effects: Finite sample cross-fit performance. https://doi.org/10.48550/arXiv.2201.12692

Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, & Dubourg (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Pondel, Wuczyński, Gryncewicz, Łysik, Ł., Hernes, Rot, & Kozina (2021). Deep learning for customer churn prediction in E-Commerce decision support. Business Information Systems, 3–12. https://doi.org/10.52825/bis.v1i.42

Provost & Fawcett (2013). Data science for business: What you need to know about data mining and data-analytic thinking. O'Reilly Media, Inc.

Radcliffe (2007). Using control groups to target on predicted lift: Building and assessing uplift model. Direct Marketing Analytics Journal, 14–21. https://www.research.ed.ac.uk/en/publications/using-control-groups-to-target-on-predicted-lift-building-and-ass

Radcliffe & Surry (1999). Differential response analysis: Modeling true responses by isolating the effect of a single action. Credit Scoring and Credit Control, IV. https://www.research.ed.ac.uk/en/publications/differential-response-analysis-modeling-true-responses-by-isolati

Robins, J. M., Hernán, M. Á., & Brumback (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550–560. https://journals.lww.com/epidem/Fulltext/2000/09000/Marginal_Structural_Models_and_Causal_Inference_in.11.aspx

Rosenbaum, P. R. (1991). A characterization of optimal designs for observational studies. Journal of the Royal Statistical Society: Series B, 53(3), 597–610. https://doi.org/10.1111/j.2517-6161.1991.tb01848.x

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55. https://doi.org/10.1093/biomet/70.1.41

Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. The American Statistician, 39(1), 33–38. https://doi.org/10.1080/00031305.1985.10479383

Rossi, P. E., & Allenby, G. M. (2003). Bayesian statistics and marketing. Marketing Science, 22(3), 304–328.

Rößler, Roman, & Detlef (2021). To treat, or not to treat: Reducing volatility in uplift modeling through weighted ensembles. In Proceedings of the 54th Hawaii international conference on system sciences.

Rubin, D. B. (1973). Matching to remove bias in observational studies. Biometrics, 29(1), 159. https://doi.org/10.2307/2529684

Rubin, D. B. (2012). Matched sampling for causal effects. Cambridge University Press. https://doi.org/10.1017/CBO9780511810725

Rust, R. T., Lemon, K. N., & Zeithaml, V. A. (2004). Return on marketing: Using customer equity to focus marketing strategy. Journal of Marketing, 68(1), 109–127.

Rzepakowski & Jaroszewicz (2012a). Decision trees for uplift modeling with single and multiple treatments. Knowledge and Information Systems, 32(2), 303–327. https://doi.org/10.1007/s10115-011-0434-0

Rzepakowski & Jaroszewicz (2012b). Uplift modeling in direct marketing. Journal of Telecommunications and Information Technology, (2), 43–50.

Setoguchi, Schneeweiss, Brookhart, M. A., Glynn, R. J., & Cook, E. F. (2008). Evaluating uses of data mining techniques in propensity score estimation: A simulation study. Pharmacoepidemiology and Drug Safety, 17(6), 546–555. https://doi.org/10.1002/pds.1555

Sołtys, Jaroszewicz, & Rzepakowski (2015). Ensemble methods for uplift modeling. Data Mining and Knowledge Discovery, 29(6), 1531–1559. https://doi.org/10.1007/s10618-014-0383-9

Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1–21. https://doi.org/10.1214/09-STS313

Stuart, E. A., & Green, K. M. (2008). Using full matching to estimate causal effects in nonexperimental studies: Examining the relationship between adolescent marijuana use and adult outcomes. Developmental Psychology, 44(2), 395–406. https://doi.org/10.1037/0012-1649.44.2.395

Thoemmes, F. J., & Kim, E. S. (2011). A systematic review of propensity score methods in the social sciences. Multivariate Behavioral Research, 46(1), 90–118. https://doi.org/10.1080/00273171.2011.540475

Wedel & Kannan, P. K. (2016). Marketing analytics for data-rich environments. Journal of Marketing, 80(6), 97–121.

Frost (2018). Pylift: A fast python package for uplift modeling. https://github.com/wayfair/pylift

Zhang & Liu (2022). A unified survey of treatment effect heterogeneity modelling and uplift modelling. ACM Computing Surveys, 54(8), 1–36. https://doi.org/10.1145/3466818

Zhao & Harinen (2019). Uplift modeling for multiple treatments with cost optimization. In 2019 IEEE international conference on data science and advanced analytics (DSAA). IEEE. https://doi.org/10.1109/dsaa.2019.00057

Using Meta-Learners and Propensity Score Matching to Optimize Customer Retention
