
Abstract

Nowadays, logistics service providers (LSPs) increasingly consider using a crowdsourced workforce on the last mile to fulfill customers’ expectations regarding same-day or on-demand delivery at reduced costs. The crowdsourced workforce’s availability is, however, uncertain. Therefore, LSPs often hire additional fixed employees to perform deliveries when the availability of crowdsourced drivers is low. In this context, the reliability versus flexibility trade-off that LSPs face over a longer period, for example, a year, remains unstudied. Against this background, we jointly study a workforce planning problem that considers salaried drivers (SDs) and the temporal development of the crowdsourced driver (CD) fleet over a long-term time horizon. We consider two types of CDs, dedicated gig-drivers (DDs) and opportunistic gig-drivers (ODs). While DDs are not sensitive to the request’s destination and typically exhibit high availability, ODs only serve requests whose origin and destination coincide with their own private route’s origin and destination. Moreover, to account for time horizon-specific dynamics, we consider stochastic turnover for both SDs and CDs as well as stochastic CD fleet growth. We formulate the resulting workforce planning problem as a Markov decision process whose reward function reflects total costs, that is, wages and operational costs arising from serving demand with SDs and CDs, and solve it via approximate dynamic programming (ADP). Applying our approach to an environment based on real-world demand data from GrubHub, we find that in fleets consisting of SDs and CDs, ADP-based hiring policies can outperform myopic hiring policies by up to 19% and lookahead policies with perfect knowledge of future information by up to 10% in total costs. In the studied setting, we observed that DDs reduce the LSP’s total costs more than ODs. When we account for CDs’ increased resignation probability when not being matched with enough requests, the number of required SDs increases.

Keywords

Strategic workforce planning, on-demand delivery, crowdsourced delivery, Markov decision processes, dynamic programming

1. Introduction

In recent years, on-demand home delivery services have experienced significant growth, especially in urban areas. Global e-commerce sales grew by 38% in a year-over-year comparison in 2021 (Forbes, 2021), and consumers increasingly use meal and grocery delivery apps. This rapid growth in demand for delivery services leads to increased customer expectations regarding same-hour or same-day delivery. Herein, logistics service providers (LSPs) face a dilemma as they must provide sufficient delivery capacity to rapidly serve increasing demand while maintaining low costs to remain competitive. To address this challenge, LSPs increasingly consider crowdsourcing: they outsource delivery requests to crowdsourced drivers (CDs), that is, independent contractors who flexibly decide when and where to work and are paid per request and not per hour. CDs can decide to leave the LSP’s fleet at any time. The LSP profits from the CDs as an easily scalable workforce at the price of uncertainty in the CDs’ availability, both from a strategic long-term and an operational short-term perspective. While many LSPs base their business models exclusively on CDs, for example, DoorDash or Grubhub, some companies operate hybrid driver fleets, that is, fleets consisting of CDs and salaried drivers (SDs), who receive compensation regardless of their utilization during the contract period, to reduce the uncertainty arising from the use of CDs. For example, Bringg’s partnership with WorkWhile aims at providing additional delivery capacity through a pool of CDs to LSPs with existing SD fleets (Freightwaves, 2022).

To account for heterogeneous CD behavior, we consider two predominant types of CDs. The first type are dedicated gig-drivers (DDs), whose request acceptance behavior is not sensitive to the request’s destination and who typically exhibit high availability. Example companies relying on this type of workforce are Postmates, Instacart, DoorDash (in the US), or Rappi (in South America). DDs install an app on their phone and receive a notification when a new delivery request arises. They can then either accept the request or wait for another request. When serving the request, they receive compensation, typically proportional to the distance of the request’s route. The second type are opportunistic gig-drivers (ODs), who only serve requests whose origin and destination coincide with their private route’s origin and destination. For example, the company Roadie relies on this type of driver. As such a concept leverages pre-existing routes, it potentially reduces delivery traffic, emissions, and costs.

One major challenge of a mixed fleet of SDs and CDs is to ensure minimum service levels, which the LSP achieves by hiring the right number of SDs based on the expected demand and the uncertain CD supply unfolding throughout the planning horizon. While initially hired SDs might become obsolete if the number of CDs grows, not hiring enough SDs early negatively impacts the LSP’s service level in early time periods. Hence, the main focus of this article is to examine the trade-off between hiring a reliable workforce supply via SDs, who receive compensation regardless of their utilization during the contract period, and the uncertain supply from CDs, whose compensation is proportional to their utilization. Hiring obsolete SDs poses an issue especially if the contract duration for SDs is long. Such long contract durations are particularly prevalent in regions with stringent labor regulations, for example, the European Union. If the contract duration is rather short (cf. Amazon, 2023), the LSP is less committed to SDs and can decide ad hoc whether to prolong the SDs’ contracts or not.

Since the level of required SDs to match the demand depends on operational aspects, for example, request route patterns, we develop a framework that integrates decision-making on two planning levels: the strategic level, where the LSP makes hiring decisions, and the operational level, where the LSP decides on how to route its SDs and which request to outsource to CDs. In the remainder of this section, we relate our work to the existing literature (Section 1.1), state our contribution (Section 1.2), and outline the article’s structure (Section 1.3).

1.1. Related Literature

Three streams of literature relate to our work: vehicle routing problems (VRPs) with CDs, strategic workforce planning problems with conventional employees, and studies combining workforce planning and crowdsourcing for general on-demand service platforms and last-mile delivery companies. We detail these streams in the following.

A large body of literature emerged in the field of VRPs with CDs. First studies consider an LSP that optimizes route plans for deliveries from a single depot for SDs and expected CDs in a static day-ahead manner (Archetti et al., 2016; Gdowska et al., 2018; Torres et al., 2022). Other papers study delivery with CDs in a multi-depot (Sampaio et al., 2019) or many-to-many network (Raviv and Tenzer, 2018; Voigt and Kuhn, 2022). Further works consider a dynamic delivery problem (DDP) in a crowdsourced context (Arslan and Zuidwijk, 2019; Dayarian and Savelsbergh, 2020; Mak, 2020), which is a special class of a dynamic pick-up and delivery problem (Berbeglia et al., 2010), wherein drivers do not change their route or pick-up another request once they began serving the current one. More recently, the meal delivery routing problem (MDRP) led to an increased focus on the DDP with CDs. In the MDRP, requests arise dynamically at random regions and must be delivered instantly to their destinations. In this context, some works consider random CD supply (Reyes et al., 2018), while others assume that CDs’ availability is known to the LSP (Ulmer et al., 2021; Yildiz and Savelsbergh, 2019). Our problem corresponds to a DDP in a many-to-many network using CDs, and we refer to the literature review of crowdsourced delivery in Alnaggar et al. (2021) and Savelsbergh and Ulmer (2022) for a comprehensive overview. So far, studies on the dynamic delivery setting consider relatively small instance sizes to benchmark their order matching and courier routing policies, for example, 24 drivers (Ulmer et al., 2021). Moreover, to the best of our knowledge, all works considering the dynamic delivery problem with CDs envision CDs to behave like DDs and neglect the potential of synchronizing demand with ODs.

In the strategic workforce planning problem with conventional employees, the objective is to minimize costs from hiring, compensating, promoting, and operating a workforce over a certain time horizon. Several studies model employee hiring, training and learning, and turnover dynamics as a sequential decision-making problem formalized as a Markov decision process (MDP). Gans and Zhou (2002) considered the employee hiring problem of a service organization that wants to serve uncertain demand. They model hiring decisions, up-skilling transitions, and employees’ turnover rates, and formulate a total cost minimization objective, including an operational cost element. Similar studies include firing decisions (Ahn et al., 2005), propose heuristics to solve large instances (Song and Huang, 2008), consider worker heterogeneity (Arlotto et al., 2014), account for inter-departmental worker mobility (Dimitriou et al., 2013), model decisions on multiple organizational levels (Guerry and De Feyter, 2012), or focus on a specific application case, for example, healthcare (Hu et al., 2016). Further works use multi-stage stochastic programming combined with linearizations, Benders decomposition, or conic optimization (cf. De Feyter et al., 2017; Jaillet et al., 2022; Zhu and Sherali, 2009). Similar to these studies, we aim at finding total cost minimizing SD hiring policies over a long-term planning horizon. None of these works, however, considers the presence of a partially uncertain workforce whose size cannot be controlled. Incorporating such an uncertain workforce in our long-term SD hiring problem is the focus of our work.

Some studies investigate workforce management in a crowdsourced context. One stream of works analyzes general on-demand platforms controlling the supply of crowdsourced workers indirectly by adjusting the compensation offered for a service. Gurvich et al. (2019) studied such a platform and consider self-scheduling agents that decide to work based on expected compensation and their availabilities. Similar works focus on surge pricing to balance demand and supply (Cachon et al., 2017), on the influence of agents’ independence and customers’ delay sensitivity (Taylor, 2018), and platform commission schemes (Zhou et al., 2019). Similarly to these works, we consider self-scheduling agents as part of our workforce. However, our problem formulation differs significantly from existing works, as we consider them jointly with conventional employees (the SDs) and control our workforce solely through the hiring process of SDs. Finally, studies combining workforce planning and crowdsourced delivery are closest to our work. Dai et al. (2017) studied a problem with in-house drivers (equivalent to permanent employees), part- and full-time CDs, and derive optimal in-house driver and CD staffing levels at different depots and times of one day based on a deterministic demand scenario. Similarly, Behrendt et al. (2022a), Cheng et al. (2023), and Goyal et al. (2023) considered hybrid crowdsourced fleets with joint SD fleet-sizing and operational decision making, respectively focusing on warehouse allocation decisions, robust workforce management, and order pricing. All of these studies are restricted to a time horizon of one day, similar to Ulmer and Savelsbergh (2020) and Behrendt et al. (2022b), who focus on pure CD fleets and consider two types of CDs: scheduled CDs that announce their availability prior to the operational time horizon and unscheduled couriers, that arrive ad-hoc while the LSP already operates. 
They aim to find the optimal set of schedules for one day to minimize fixed costs associated with scheduled CDs and operational costs. While the former employ a classical value function approximation approach, the latter use neural networks to find the optimal set of shifts. Finally, Lei et al. (2020) also considered a one-day planning horizon and an entirely crowdsourced delivery platform and study mechanisms to reduce demand-supply imbalance by outsourcing excess requests to drivers willing to prolong their scheduled shifts. While these works study joint SD acquisition and operational planning, they consider short-term planning horizons. Hence, these works do not account for long-term dynamics, for example, workforce turnover or stochastic CD fleet growth. Moreover, the LSPs’ contractual commitment when hiring SDs reduces itself to one day in the studies above. However, in many legislative systems, contracts for fixed employees must have a minimum duration of a year, even when considering temporary contracts. Changing demand levels or increasing CD supply might make these fixed employees obsolete before their minimum contract duration terminates. Our work will address this untouched issue by considering long-term time horizons.

In conclusion, our work closes three gaps in the literature that combines crowdsourced delivery and workforce management. First, to the best of our knowledge, no work considers joint SD fleet sizing and operational decision-making on a long-term time horizon, so dynamics such as workforce turnover or stochastic CD fleet growth remain neglected. Second, all works considering the dynamic delivery problem with CDs envision CDs to behave like DDs, hence disregarding the potential to synchronize demand with ODs. Third, studies on the dynamic delivery setting with CDs consider relatively small instance sizes. Yet, the instant delivery market, especially in urban areas, is expected to grow significantly, thus calling for studies accounting for large demand scenarios and large delivery fleets.

1.2. Contribution

To close the research gaps outlined above, we develop a novel framework to study the long-term workforce planning problem in the context of hybrid crowdsourced delivery fleets. To account for the interplay between workforce planning and operations, we integrate hiring decisions for a long-term time horizon with operational decisions regarding SD relocation and outsourcing of demand to CDs. Moreover, we consider two CD types, DDs and ODs, which exhibit distinct request acceptance behaviors. While the former is less sensitive to a request’s origin and destination and typically exhibits higher availability, the latter only accepts requests whose origin and destination coincide with their private route’s origin and destination.

Figure 1.

Different temporal entities and their dependency. Here, $K$ denotes the number of sub-horizons of $\bar{T}$ . The variables $t$ and $\bar{t}$ represent the time steps of the strategic and operational level time horizons, respectively.

Specifically, our contribution is threefold. First, we formalize the strategic level planning problem as a novel stochastic workforce planning problem, wherein the LSP needs to decide on how many SDs to hire or fire while taking into account uncertain CD supply. We model the strategic level as a finite-horizon MDP. Here, the objective is to minimize total costs arising from SD wages and operational costs. To obtain the latter term for large fleets within reasonable computation times, we approximate the operational problem with a fluid model. Second, we prove the value function’s convexity along the SD dimension and use this property to develop a look-ahead policy based on piecewise linear value function approximation (PL-VFA), which approximately solves our strategic problem. Third, we conduct numerical studies based on real-world data provided by Grubhub (2018), wherein we benchmark our PL-VFA against a myopic policy and a lookahead policy with perfect knowledge of future information. Furthermore, we evaluate sensitivities of strategic and operational levels’ parameters, for example, joining and resignation rates of CDs. Our main findings are as follows: (i) A hiring policy obtained from PL-VFA can yield up to 19% lower total costs than a myopic hiring policy and up to 10% lower total costs than a lookahead policy with perfect knowledge of future information. It does so by hiring fewer SDs than required in early time steps and by relying on future CD supply. (ii) SDs remain an important driver of total costs in the hybrid fleet, constituting up to 50% of total costs. Thus, developing better SD hiring policies can significantly impact total costs. DDs are the main cost driver among CDs, whereas ODs bear a significant potential when their spatial and temporal patterns are synchronized with request patterns. (iii) When we take into account that CDs leave the LSP’s platform with a higher likelihood if they are matched to a lower number of requests, we observe a lower effective CD supply, which leads to a higher number of required SDs.

1.3. Structure

We structure the remainder of this work as follows. In Section 2, we introduce our problem setting, describing decisions and events on the strategic level and the dynamics of the operational level. In Section 3, we formalize the strategic level as an MDP and introduce a closed queueing network to model the operational level. Moreover, we introduce our PL-VFA for finding the optimal number of SDs and derive a fluid approximation for our operational planning problem. We detail the design of experiments for our numerical study in Section 4 and discuss results in Section 5. We conclude this article with a short synthesis in Section 6.

2. Problem Setting

In the following, we introduce our problem setting. First, we provide a high-level descriptive overview in Section 2.1, before we formalize and detail the problem dynamics and objectives in Section 2.2.

Figure 2.

Sequence of events and decisions in time step $t$ .

2.1. Descriptive Overview

In this article, we focus on an LSP providing on-demand delivery services in an urban area $M$ by operating a mixed fleet of SDs, DDs, and ODs. We decompose the problem setting into two nested levels. First, on a strategic level, we consider the LSP’s problem of composing its mixed delivery fleet by hiring and firing SDs, and by observing the outcomes of stochastic CD joining and resignation processes. We define the strategic level problem on time horizon $T$ spanning multiple months to a couple of years, wherein the time steps represent, for example, weeks. Second, we consider an operational level, which is embedded within one time step of the strategic level problem. Herein, we treat the LSP’s problem of leveraging their mixed fleet to serve on-demand requests arising stochastically in different regions within $M$ . The LSP can either dispatch its SDs or outsource requests to CDs. Moreover, the LSP can relocate idling SDs to more promising regions. We define the operational level problem on time horizon $\bar{T}$ which can consist of multiple sub-horizons representing different time intervals of interest, for example, morning or afternoon time windows. Each of the sub-horizons can be discretized, for example, into minutes or hours. We visualize the dependency between the different temporal entities and their corresponding structure in Figure 1. We assume that SDs who were hired and CDs who joined in one time step $t \in T$ are available throughout the respective $\bar{T}$ . While SDs are hired as fixed employees, CDs register on the LSP’s crowdsourced delivery platform, for example, via an app, through which they receive notifications about potential delivery requests. They can de-register from the platform in any time step $t \in T$ . Thus, the LSP neither controls the CDs’ joining nor their resignation processes. The LSP’s objective is total cost minimization. 
On the operational level, costs consist of the variable SD dispatching and relocation costs, of the payments to CDs, and of penalties for undelivered requests. We base payments to DDs on the delivery distance. Such distance-based compensation schemes are similar to compensation schemes employed by on-demand delivery companies, for example, GrubHub. In line with existing works, we assume that the ODs’ compensation is independent of the request distance (cf. Archetti et al., 2016). Such a compensation implies that ODs accept requests up to a certain detour, which depends on their value of time and the offered constant compensation. On the strategic level, we account for the fixed personnel costs of SDs and the severance payments for firing SDs. In the next section, we provide a detailed mathematical formalization of the problem setting.

2.2. Mathematical Formalization

We start by describing the sequence of events and decisions during one time step $t \in T$ of the strategic level, see Figure 2. Let $n_{t}^{SD}$ , $n_{t}^{DD}$ , and $n_{t}^{OD}$ describe the number of SDs, DDs, and ODs, respectively, available to the LSP in time step $t \in T$ . Firstly, the LSP decides on the net number $a_{t}$ of newly hired ( $a_{t} > 0$ ) or laid-off ( $a_{t} < 0$ ) SDs. Secondly, the LSP serves requests on the operational level by matching them to SDs and CDs and by relocating SDs to ensure high service levels. Thirdly, at the end of time step $t$ , some SDs and CDs resign, while some new CDs decide to join the LSP’s platform. Let ${\tilde{x}}^{α}$ denote the random variable describing the number of drivers resigning at the end of $t$ with $α \in \{SD, DD, OD\}$ . We let ${\tilde{x}}^{α}$ follow a probability distribution $X^{α}$ , which we will detail in Section 4. Analogously, we model the number of newly joining CDs, ${\tilde{y}}^{α}$ with $α \in \{DD, OD\}$ , to follow a probability distribution $Y^{α}$ . We assume the distributions $X^{α}$ and $Y^{α}$ to be independent. This is plausible since they represent different types of workforce exhibiting distinct behaviors and motivations to work for the LSP. The evolution of the number of drivers available to the LSP, from one time step $t$ to $t + 1$ , then reads

\begin{aligned} n_{t + 1}^{SD} & = n_{t}^{SD} + a_{t} - {\tilde{x}}^{SD}; n_{t + 1}^{DD} = n_{t}^{DD} + {\tilde{y}}^{DD} - {\tilde{x}}^{DD}; \\ n_{t + 1}^{OD} & = n_{t}^{OD} + {\tilde{y}}^{OD} - {\tilde{x}}^{OD} . \end{aligned}

(2.1)
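As an illustration of the transition in Equation (2.1), the following sketch advances the fleet by one strategic time step. The distributions $X^{α}$ and $Y^{α}$ are only specified in Section 4, so the per-driver resignation probability and the binomial approximation of Poisson joiners used here are purely illustrative assumptions, as are all function names, parameter names, and numbers.

```python
import random

def step_fleet(n, a, resigned, joined):
    """Apply Equation (2.1) for realized resignations and joiners.

    n, resigned: dicts keyed by "SD", "DD", "OD"; joined: dict keyed by "DD", "OD".
    a: net number of hired (a > 0) or laid-off (a < 0) SDs.
    """
    return {
        "SD": n["SD"] + a - resigned["SD"],
        "DD": n["DD"] + joined["DD"] - resigned["DD"],
        "OD": n["OD"] + joined["OD"] - resigned["OD"],
    }

def sample_resignations(n, a, p_resign, rng):
    """Assumed X^alpha: each driver resigns independently with probability p_resign[alpha]."""
    pool = {"SD": n["SD"] + a, "DD": n["DD"], "OD": n["OD"]}
    return {k: sum(rng.random() < p_resign[k] for _ in range(pool[k])) for k in pool}

def sample_joiners(mean_join, rng, trials=1000):
    """Assumed Y^alpha: binomial approximation of a Poisson count with the given mean."""
    return {k: sum(rng.random() < m / trials for _ in range(trials))
            for k, m in mean_join.items()}

rng = random.Random(0)                      # fixed seed for reproducibility
n = {"SD": 20, "DD": 50, "OD": 30}          # hypothetical fleet in time step t
x = sample_resignations(n, 5, {"SD": 0.02, "DD": 0.10, "OD": 0.10}, rng)
y = sample_joiners({"DD": 6.0, "OD": 3.0}, rng)
n_next = step_fleet(n, 5, x, y)             # fleet in time step t + 1
```

Separating the deterministic update from the samplers keeps the transition itself independent of whichever distributions are eventually chosen.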

Now we describe the costs that the LSP incurs at each time step $t$ . Let the constant $C^{fix}$ denote the wage per SD and time step $t$ . Moreover, let $C^{sev}$ denote the severance payment per laid-off SD and time step $t$ . Finally, let the function $C_{t}^{ops} (n_{t}^{SD}, n_{t}^{DD}, n_{t}^{OD}, R^{t})$ denote costs from serving requests on the operational level. Here, $R^{t}$ represents the demand to be served. We assume a deterministic demand curve over the strategic time horizon $T$ and no shortages in SD supply. Thus, the LSP can hire enough SDs to cover the entire demand in each time step $t$ .

To obtain $C_{t}^{ops} (n_{t}^{SD}, n_{t}^{DD}, n_{t}^{OD}, R^{t})$ , we now describe the operational level problem in detail. Requests arise dynamically in some region $i \in M$ and need to be delivered instantaneously to region $j \in M$ . We model request arrivals in region $i$ with a Poisson process with arrival rates $λ_{i t}^{R} \in R^{t}$ . We describe requests’ destinations by a request origin-destination matrix $P_{i j}^{R}$ . Requests not being served in time step $\bar{t} \in \bar{T}$ disappear from the system and result in a penalty. Likewise, we let CD arrivals follow a Poisson process. We link arrival rates at regions $i$ , denoted by $λ_{i t}^{DD}$ and $λ_{i t}^{OD}$ , to the number of CDs currently active on the LSP’s platform, to $ζ^{α}$ , and to demand- and area-specific availability patterns $I_{i}^{DD}$ and $I_{i}^{OD}$ for DDs and ODs, respectively. Area-specific availability patterns describe the share of CD arrivals in region $i$ , that is, $I_{i}^{α} = \frac{\text{Arrivals of CDs of type } α \text{ in } i \text{ for } {\bar{T}}_{k}}{\text{All arrivals of CDs of type } α \text{ in area } M \text{ for } {\bar{T}}_{k}}$ . The constants $ζ^{α}$ quantify the CDs’ share of available drivers of type $α$ within ${\bar{T}}_{k}$ compared to the total number of drivers of type $α$ , that is, $ζ^{α} = \frac{\text{Available CDs of type } α \text{ within } {\bar{T}}_{k}}{\text{Total no. of CDs of type } α}$ . The arrival rates therefore read

λ_{i t}^{DD} = n_{t}^{DD} ζ^{DD} I_{i}^{DD}, λ_{i t}^{OD} = n_{t}^{OD} ζ^{OD} I_{i}^{OD},

(2.2)

where $ζ^{α}$ and $I_{i}^{α}$ depend on the problem instances. While ODs only accept requests whose origins coincide with the origin of their individually planned routes, DDs seek regions with high request density to maximize earnings. Hence, we consider $I_{i}^{DD} = \frac{λ_{i t}^{R}}{\sum_{j}^{| M |} λ_{j t}^{R}}$ . Contrarily, ODs’ availability patterns differ from request patterns as detailed in Section 4. Moreover, we denote request, DD, and OD route patterns by $P_{i j}^{R}$ , $P_{i j}^{DD}$ , and $P_{i j}^{OD}$ , respectively. Route patterns describe the share of requests, DDs, and ODs starting in $i \in M$ and heading towards $j \in M$ . Analogously to availability patterns, we assume $P_{i j}^{R} = P_{i j}^{DD}$ , and $P_{i j}^{OD}$ being independent of request patterns, as detailed in Section 4. We assume that requests and CDs who are not matched to a driver or request within $\bar{t}$ leave the system. This assumption aligns with the on-demand delivery context we study, wherein requests need to be delivered instantaneously, for example, because they consist of perishable goods. The penalty cost $c_{i j}^{\emptyset}$ accounts for both opportunity costs and actual costs of paying an expensive third-party courier to perform the delivery. The assumption regarding CDs is sensible as CDs find better outside options if not being matched because they often register at different delivery platforms simultaneously (Wired, 2018).
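To make the link between fleet size and CD arrival rates in Equation (2.2) concrete, the following sketch computes $λ_{i t}^{DD}$ and $λ_{i t}^{OD}$ for a small instance. All numbers (three regions, request rates, $ζ^{α}$ values, and the OD availability pattern) are hypothetical and chosen only for illustration.

```python
def dd_availability(request_rates):
    """I_i^DD: share of total request arrivals that falls into each region i."""
    total = sum(request_rates)
    return [lam / total for lam in request_rates]

def cd_arrival_rates(n_drivers, zeta, availability):
    """Equation (2.2): lambda_i = n * zeta * I_i for each region i."""
    return [n_drivers * zeta * share for share in availability]

request_rates = [6.0, 3.0, 1.0]            # assumed lambda_i^R for three regions
i_dd = dd_availability(request_rates)      # DD pattern follows demand, shares sum to 1
lam_dd = cd_arrival_rates(50, 0.4, i_dd)   # 50 DDs, 40% of them active in the sub-horizon
i_od = [0.2, 0.2, 0.6]                     # assumed OD pattern, independent of demand
lam_od = cd_arrival_rates(30, 0.2, i_od)   # 30 ODs, 20% of them active
```

Because the availability shares sum to one, the regional rates of each CD type add up to $n_{t}^{α} ζ^{α}$ , the expected number of active drivers of that type.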

We denote the costs of serving a request with delivery option $β$ by $c_{i j}^{β}$ and the penalty costs for not serving a request by $c_{i j}^{\emptyset}$ . We assume that $c_{i j}^{SD} < c_{i j}^{DD}$ and $c_{i j}^{SD} < c_{i j}^{OD}$ . This is plausible since SDs’ variable costs only include mileage costs, whereas CDs’ variable costs need to cover mileage costs and a profit margin that motivates them to serve the request. We model the DDs’ compensation to depend on the distance between a request’s origin and the request’s destination. Hence, $c_{i j}^{DD} = c_{var}^{DD} r_{i j}$ , where $r_{i j}$ is the origin-destination distance matrix, and $i$ and $j$ are the origin and destination of the request. The constant $c_{var}^{DD}$ denotes the per distance-unit costs of DDs. As mentioned in Section 2.1, we consider a constant $c_{var}^{OD}$ for ODs (cf. Archetti et al., 2016). To summarize the behavioral differences between SDs, DDs, and ODs, we highlight their key characteristics across the strategic and operational level in Table 1. Let $R_{i j} (\bar{t})$ denote the number of SDs that the LSP decides to relocate from $i$ to $j$ in time step $\bar{t}$ , and let $A_{i j}^{β} (\bar{t})$ be the number of requests the LSP decides to match to delivery option $β \in \{SD, DD, OD, \emptyset\}$ , with $A_{i j}^{\emptyset} (\bar{t})$ being the number of requests not matched to any driver. The expected operational costs in $t$ then read

\begin{aligned} C_{t}^{ops} (n_{t}^{SD} + a_{t}, n_{t}^{DD}, n_{t}^{OD}, R^{t}) \\ = \sum_{k} (\min_{R_{i j}^{k}, A_{i j}^{SD, k}} E [\sum_{\bar{t} \in {\bar{T}}_{k}} [\sum_{i j} (\sum_{β} [c_{i j}^{β} A_{i j}^{β, k} (\bar{t})] + c_{i j}^{SD} R_{i j}^{k} (\bar{t}))]]) . \end{aligned}

(2.3)
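The innermost term of Equation (2.3) can be evaluated directly once matching and relocation decisions are fixed. The sketch below does so for a single operational time step. It leaves out the expectation and the minimization, and all cost values, region indices, and the key `"none"` for the unserved option $\emptyset$ are illustrative assumptions.

```python
def step_cost(cost, matched, relocated):
    """Inner term of Equation (2.3) for one operational time step.

    cost[beta][(i, j)]: cost of serving (i, j) with option beta; "none" holds the penalty.
    matched[beta][(i, j)]: number of requests A_ij^beta matched to option beta.
    relocated[(i, j)]: number of SDs R_ij relocated from i to j, charged at the SD cost.
    """
    serve = sum(cost[b][od] * q
                for b, by_od in matched.items()
                for od, q in by_od.items())
    move = sum(cost["SD"][od] * q for od, q in relocated.items())
    return serve + move

# Hypothetical two-region example with four requests from region 0 to region 1.
cost = {
    "SD": {(0, 1): 2.0, (1, 0): 2.0},
    "DD": {(0, 1): 3.0},
    "none": {(0, 1): 10.0},
}
matched = {"SD": {(0, 1): 2}, "DD": {(0, 1): 1}, "none": {(0, 1): 1}}
relocated = {(1, 0): 1}                    # one idle SD is moved back to region 0
example_cost = step_cost(cost, matched, relocated)
```

In this example, two SD deliveries cost 4.0, one DD delivery costs 3.0, one unserved request incurs a penalty of 10.0, and the relocation adds 2.0.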

In the remainder of this article, we assume that $\bar{T}$ consists of $K$ sub-horizons ${\bar{T}}_{k}$ , each of which represents a periodically occurring time-window, for example, the same afternoon time-window of each day belonging to the strategic level time step. We assume that arrival patterns for requests and CDs follow the same distribution for each of the periodically occurring ${\bar{T}}_{k}$ . This allows us to drop index $k$ , as the operational costs for each of these different ${\bar{T}}_{k}$ are approximately the same. Hence, we obtain the total operational costs in $t$ by calculating them once and multiplying them by $K$ . We refer the interested reader to Appendix A in the E-Companion for details on how to adapt our methodology to unequal ${\bar{T}}_{k}$ .

Table 1.

Characteristics of SDs, DDs and ODs.

	SDs	DDs	ODs
Fixed costs	$C^{fix}$ and $C^{sev}$ per $t$	None	None
Variable costs	Distance-based	Distance-based	Constant per request
Joining pattern	Deterministic hiring	Stochastic	Stochastic
Resignation pattern	Deterministic firing & stochastic resignation	Stochastic resignations	Stochastic resignations
Availability patterns $I_{i}^{β}$	n/a	$\frac{λ_{i t}^{R}}{\sum_{j}^{| M |} λ_{j t}^{R}}$	Independent of $λ_{i t}^{R}$
Route patterns $P_{i j}^{β}$	n/a	$P_{i j}^{DD} = P_{i j}^{R}$	Independent of $P_{i j}^{R}$

SDs = salaried drivers; DDs = dedicated gig-drivers; ODs = opportunistic gig-drivers.

We now describe the set of constraints we need to fulfill in each time step $\bar{t}$ of the operational level’s problem. Let $O_{i j} (\bar{t})$ denote the number of requests occurring in time step $\bar{t}$ for a specific origin-destination pair $(i, j)$ . Furthermore, we denote the number of SDs in the process of relocating from $i$ to $j$ with $E_{i j} (\bar{t})$ , for $i \neq j$ . For $i = j$ , $E_{i j} (\bar{t})$ describes the number of SDs idling in $i$ . Let $F_{i j} (\bar{t})$ describe the number of SDs in the process of currently serving a request from $i$ to $j$ . We denote by $E_{i j}^{'} (\bar{t})$ and $F_{i j}^{'} (\bar{t})$ the number of SDs completing their relocation and request delivery, respectively. Finally, let $X_{i j}^{β} (\bar{t})$ , $β \in \{DD, OD\}$ , denote the number of CDs available in time step $\bar{t}$ , which we sample based on $λ_{i t}^{β}$ and $P_{i j}^{β}$ as defined above. The operational costs and the respective constraints for each $\bar{t}$ then read

\begin{aligned} C_{t}^{ops} & = K (\min_{R_{i j}, A_{i j}^{SD}} E [\sum_{\bar{t} \in \bar{T}} [\sum_{i j} (\sum_{β} [c_{i j}^{β} A_{i j}^{β} (\bar{t})] + c_{i j}^{SD} R_{i j} (\bar{t}))]]) \end{aligned}

(2.4a)

\begin{aligned} \text{s.t.} \quad & F_{i j} (\bar{t} - 1) + A_{i j}^{SD} (\bar{t}) - F_{i j}^{'} (\bar{t}) = F_{i j} (\bar{t}) & \forall i, j \in M, \forall \bar{t} \in \bar{T}, \end{aligned}

(2.4b)

\begin{aligned} A_{i j}^{β} (\bar{t}) \leq X_{i j}^{β} (\bar{t}) & \forall i, j \in M, β \in \{DD, OD\}, \forall \bar{t} \in \bar{T}, \end{aligned}

(2.4c)

\begin{aligned} E_{i j} (\bar{t} - 1) + R_{i j} (\bar{t}) - E_{i j}^{'} (\bar{t}) = E_{i j} (\bar{t}) & \forall i, j \in M, i \neq j, \forall \bar{t} \in \bar{T}, \end{aligned}

(2.4d)

\begin{aligned} E_{i i} (\bar{t} - 1) + \sum_{j, j \neq i} E_{j i}^{'} (\bar{t}) + \sum_{j} F_{j i}^{'} (\bar{t}) - (\sum_{j, j \neq i} R_{i j} (\bar{t}) + \sum_{j} A_{i j}^{SD} (\bar{t})) = E_{i i} (\bar{t}) & \forall i \in M, \forall \bar{t} \in \bar{T}, \end{aligned}

(2.4e)

\begin{aligned} \sum_{β} A_{i j}^{β} (\bar{t}) = O_{i j} (\bar{t}) & \forall i, j \in M, \forall \bar{t} \in \bar{T}, \end{aligned}

(2.4f)

\begin{aligned} R_{i j} (\bar{t}) \geq 0, R_{i i} (\bar{t}) = 0, A_{i j}^{β} (\bar{t}) \geq 0 & \forall i, j \in M, β \in \{SD, DD, OD, \emptyset\}, \forall \bar{t} \in \bar{T}, \end{aligned}

(2.4g)

\begin{aligned} \sum_{i j} (E_{i j} (\bar{t}) + F_{i j} (\bar{t})) = n_{t}^{SD} & \forall \bar{t} \in \bar{T} . \end{aligned}

(2.4h)

Constraints (2.4b) ensure that $F_{i j} (\bar{t})$ corresponds to the number of SDs in the last time step plus the number of SDs starting a delivery from $i$ to $j$ minus the number of SDs completing their delivery. Constraints (2.4c) guarantee that the number of requests matched to CDs does not surpass the number of CDs available in time step $\bar{t}$ . Constraints (2.4d) ensure the same conditions for $E_{i j} (\bar{t})$ as Constraints (2.4b) do for $F_{i j} (\bar{t})$ . Constraints (2.4e) guarantee that the number of idling SDs equals the number of SDs idling in the last time step plus the sum of SDs arriving in $i$ as they complete their relocation or their request delivery minus the sum of SDs leaving $i$ as they are relocated or matched to a request. Finally, Constraints (2.4f), (2.4g), and (2.4h), respectively, ensure that all requests are matched to one delivery option, the non-negativity of $R_{i j} (\bar{t})$ and $A_{i j}^{β} (\bar{t})$ , and that all drivers are considered. We note that we do not consider relocations from $i$ to $i$ , ensured by $R_{i i} (\bar{t}) = 0$ (cf. Constraints (2.4g)), as this does not yield any benefit to the LSP.
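As a small illustration of the matching constraints, the following sketch checks Constraints (2.4c), (2.4f), and the non-negativity part of (2.4g) for a single origin-destination pair and time step. The function name, the dictionary layout, and the key `"none"` for the unserved option $\emptyset$ are assumptions made for this example. The SD flow balance constraints (2.4b), (2.4d), (2.4e), and (2.4h) are not covered.

```python
def is_feasible_matching(matched, available_cd, requests):
    """Check (2.4c), (2.4f) and non-negativity from (2.4g) for one (i, j) pair.

    matched: requests per delivery option, keys "SD", "DD", "OD", "none".
    available_cd: sampled CDs X^beta for this pair, keys "DD", "OD".
    requests: number of requests O_ij arising in this time step.
    """
    nonneg = all(q >= 0 for q in matched.values())          # (2.4g)
    within_supply = all(matched[b] <= available_cd[b]       # (2.4c)
                        for b in ("DD", "OD"))
    all_assigned = sum(matched.values()) == requests        # (2.4f)
    return nonneg and within_supply and all_assigned

ok = is_feasible_matching({"SD": 2, "DD": 1, "OD": 0, "none": 1},
                          {"DD": 1, "OD": 0}, requests=4)
too_many_dd = is_feasible_matching({"SD": 1, "DD": 2, "OD": 0, "none": 1},
                                   {"DD": 1, "OD": 0}, requests=4)
```

The first matching is feasible, whereas the second assigns two requests to DDs although only one DD is available.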

Let us for now assume that we obtain some approximation for $C_{t}^{ops} (n_{t}^{SD} + a_{t}, n_{t}^{DD}, n_{t}^{OD}, R^{t})$. Then, total costs $C_{t}^{tot}$ in time step $t$ amount to

\begin{aligned} C_{t}^{tot} (n_{t}^{SD} + a_{t}, n_{t}^{DD}, n_{t}^{OD}, R^{t}) = C_{t}^{ops} (n_{t}^{SD} + a_{t}, n_{t}^{DD}, n_{t}^{OD}, R^{t}) \\ + (C^{fix} \cdot (n_{t}^{SD} + a_{t}) + C^{sev} | \min (0, a_{t}) |) . \end{aligned}

(2.5)
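As a minimal sketch of equation (2.5), the per-step total cost can be computed as follows. The function and argument names are illustrative assumptions, and the operational cost is treated as a given number rather than as the output of the operational model. The explicit branch avoids evaluating $\infty \cdot 0$ when $C^{sev} = \infty$ and no SD is fired.

```python
def total_cost(c_ops, n_sd, a, c_fix, c_sev):
    """Per-step total cost following equation (2.5).

    c_ops: (approximated) operational cost, taken as given here
    n_sd:  number of SDs employed before the decision
    a:     hiring (a > 0) or firing (a < 0) decision
    c_fix: fixed cost per SD and time step
    c_sev: severance payment per fired SD
    """
    fired = abs(min(0, a))
    # Skip the severance term when nobody is fired, so that an infinite
    # severance payment does not produce inf * 0 = nan.
    severance = c_sev * fired if fired > 0 else 0.0
    return c_ops + c_fix * (n_sd + a) + severance
```

With an infinite severance payment, any firing decision yields an infinite cost, which mirrors a setting in which firing is impossible.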

The LSP aims to minimize its expected total costs over the time horizon $T$. Due to the stochastic nature of the problem, this objective formally reads

\begin{aligned} \min E [\sum_{t = 0}^{| T |} C_{t}^{tot}] = \min E [\sum_{t = 0}^{| T |} (C_{t}^{ops} (n_{t}^{SD} + a_{t}, \cdot) \\ + (C^{fix} (n_{t}^{SD} + a_{t}) + C^{sev} | \min (0, a_{t}) |))] . \end{aligned}

(2.6)

Some comments on our modeling choices and assumptions are in order. First, we do not account for uncertainty in the parameters governing the evolution of future demand levels, that is, the set of Poisson rates $R^{t}$ in each time step $t$ of the strategic level’s problem, as we want to exclusively analyze the effect of uncertain CD supply. Therefore, we assume that robust forecasts concerning these parameters can be carried out on the strategic level’s planning horizon. This is plausible because LSPs indirectly control future demand levels and distributions through the contracts they set up with demand sources, for example, restaurants or supermarkets, before the strategic level’s planning horizon starts.

Second, we assume that there are no SD supply shortages, as we restrict our problem to urban areas that typically have an abundant workforce supply, especially in the gig economy sector.

Third, severance payments implicitly prescribe the minimum duration for which the LSP must hire SDs. If $C^{fix} (T - t) < C^{sev}$ , the LSP will not fire SDs as it is cheaper to pay them until the end of the time horizon. This allows us to model both short-term (low $C^{sev}$ ) and long-term (high $C^{sev}$ ) hiring decisions. We note that in real-world applications short-term hiring decisions are typically achieved by issuing short-term contracts, which do not require paying a severance fee when the contract expires and the LSP does not extend it.

Finally, we consider a finite time horizon on the strategic level since LSPs’ strategic workforce planning process relies on finite horizons, for which they can leverage a robust forecast. This is in line with works in the strategic workforce planning literature (cf. Gans and Zhou, 2002).

3. Methodology

This section formalizes the problem setting presented in Section 2. We model the strategic level’s problem as an MDP (Section 3.1) and the operational level’s problem as a closed queueing network (Section 3.2). Finally, we present an approximate dynamic programming approach to solve large instance sizes in Section 3.3.

3.1. Strategic Level Workforce Planning

In this section, we formalize the LSP’s workforce planning problem, outlined in Section 2, as an MDP. In the following, we will successively describe the state, the feasible actions, the state transition, the policy, and the objective function.

Pre-decision state: We denote pre-decision states that represent the fleet composition in time step $t$ by $s_{t} = (n_{t}^{SD}, n_{t}^{DD}, n_{t}^{OD}) \in \mathbb{N}_{0}^{3}$ with $s \in S = \{0, \dots, N^{SD}\} \times \{0, \dots, N^{DD}\} \times \{0, \dots, N^{OD}\} \times \{0, \dots, T\}$, where $S$ denotes the state space. Pre-decision states describe the fleet composition at the beginning of time step $t$ before making any decisions. Variables $N^{SD}$, $N^{DD}$, and $N^{OD}$ represent the maximum number of drivers attainable for each driver type, for example, the maximum number of individuals with the intention to work for the LSP within urban area $M$. Similarly, we denote the state space in time step $t$ by $S_{t} = \{0, \dots, N^{SD}\} \times \{0, \dots, N^{DD}\} \times \{0, \dots, N^{OD}\}$.

Feasible actions and post-decision state: The LSP decides on the number of SDs to hire or fire. The action space $A_{t} = \{- n_{t}^{SD}, \dots, N^{SD}\} \subset \mathbb{Z}$ describes the possible hiring decisions that the LSP can make. The maximum number of SDs that can be fired depends on the current time step, as the LSP cannot fire more SDs than currently employed. When the LSP decides on an action $a_{t} \in A_{t}$, we reach the post-decision state $s_{t}^{a} = (n_{t}^{SD} + a_{t}, n_{t}^{DD}, n_{t}^{OD})$ and evaluate the operational problem to approximate the operational costs $C_{t}^{ops} (s_{t}^{a}, R^{t})$.

State transition: We transition to the next time step by following a resignation process for all drivers and a joining process for CDs, described by equation (2.1). Let $P (s_{t + 1} | s_{t}^{a}) \in [0, 1]^{| S | \times | S |}$ denote the transition probability matrix, describing the probability of transitioning to state $s_{t + 1}$ when being in state $s_{t}^{a}$ . Accordingly, as the distributions $X^{α}$ and $Y^{α}$ are assumed to be independent, each entry of $P$ reads

\begin{aligned} P (s_{t + 1} | s_{t}^{a}) & = P ({\tilde{x}}_{t}^{SD} | s_{t}^{a}) \cdot P ({\tilde{x}}_{t}^{DD} | s_{t}^{a}) \cdot P ({\tilde{x}}_{t}^{OD} | s_{t}^{a}) \cdot P ({\tilde{y}}_{t}^{DD} | s_{t}^{a}) \\ \cdot P ({\tilde{y}}_{t}^{OD} | s_{t}^{a}) . \end{aligned}

(3.1)
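The transition can be illustrated by sampling it. The sketch below assumes binomially distributed resignations and joinings (as specified for the base case in Section 4.2) and assumes that equation (2.1) updates each fleet as current size minus resignations plus newly joined CDs. The function names, the dictionary layout, and the SD resignation probability are hypothetical choices made for illustration only.

```python
import random


def sample_binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def sample_transition(post_state, p_resign, q_join, rng):
    """Sample s_{t+1} from a post-decision state (n_sd, n_dd, n_od).

    Assumed update (stand-in for equation (2.1)): n - x + y, where x are
    resignations and y are newly joining CDs; SDs only resign.
    """
    n_sd, n_dd, n_od = post_state
    nxt = [n_sd - sample_binomial(n_sd, p_resign["SD"], rng)]
    for key, n in (("DD", n_dd), ("OD", n_od)):
        x = sample_binomial(n, p_resign[key], rng)  # resignations
        y = sample_binomial(n, q_join[key], rng)    # new CDs, drawn from n
        nxt.append(n - x + y)
    return tuple(nxt)
```

Because the five draws are independent, the probability of a sampled outcome factorizes exactly as in equation (3.1).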

Policy: We denote a deterministic state-dependent hiring and firing policy by $π : S \to A$. It assigns an action $a_{t} \in A (s_{t})$ to each pre-decision state $s_{t} \in S_{t}$. Moreover, we denote the set of all possible policies by $Π$.

Objective: Starting in an initial state $s_{0}$ , the LSP’s objective is to minimize expected future total costs over the time horizon $T$ , formally

\begin{aligned} V_{0} (s_{0}) & = \min_{π \in Π} E [\sum_{t = 0}^{T} γ^{t} \cdot (C_{t}^{ops} (n_{t}^{SD} + a_{t}, \cdot) \\ + K (C^{fix} (n_{t}^{SD} + a_{t}) + C^{sev} | \min (0, a_{t}) |)) | s_{0}], \end{aligned}

(3.2)

wherein $γ$ denotes the discount factor. Equation (3.2) formalizes the LSP’s objective (cf. equation (2.6)) as finding the hiring/firing policy $π$ minimizing the sum of discounted costs over the planning horizon $T$. The following section introduces the model for determining operational costs $C_{t}^{ops}$.

3.2. Formalization of the Operational Level’s Problem

Given the necessity to solve the operational level’s problem in each time step of the strategic level, an efficient approximation is essential to preserve computational tractability. Current research on mobility on demand (MoD), which is similar to the on-demand last-mile delivery problem examined in our study, shows that greedy matching heuristics are only marginally surpassed by lookahead policies based on methods such as model predictive control or deep reinforcement learning (Enders et al., 2023). Therefore, we rely on greedy driver-to-request matching and use a forward-looking SD relocation policy. This approach allows us to employ a fluid approximation model for the operational costs as suggested by Braverman et al. (2019). First, we formalize the operational level’s problem as a closed queueing network. We base our formulation of the queueing network on the model from Zhang and Pavone (2016). Herein, we assume that requests can only be served by drivers in the same region as the request’s origin. This is plausible if the discretization of the area $M$ results in large regions, such that drivers outside of the region cannot reach the request in time. Moreover, we assume that the model is agnostic to the varying distance between different drivers within the region and a request’s origin. Hence, the greedy matching policy results in matching requests to the cheapest available driver located in the region. We describe the number of SDs idling in region $i$ with $| M |$ single server queues, denoted by $E_{i i} (\bar{t})$, as we have $| M |$ regions. Each $E_{i i} (\bar{t})$ has a service rate $λ_{i t}^{R}$. We describe the number of SDs relocating from $i$ to $j$ with $| M |^{2} - | M |$ infinite server queues $E_{i j} (\bar{t})$, each having a service rate $μ_{i j}$. 
Finally, we describe the number of SDs serving a request from $i$ to $j$ with $| M |^{2}$ infinite server queues, denoted by $F_{i j} (\bar{t})$ , also having a service rate $μ_{i j}$ , as we assume that drivers relocate with the same speed as they deliver a request. Here, $1 / μ_{i j}$ denotes the travel time matrix.
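The greedy matching rule described above can be sketched for a single request in a single region as follows. The dictionary-based interface and the function name are assumptions made for illustration; returning `None` stands for the case in which the request stays unserved.

```python
def greedy_match(available, cost):
    """Match one request to the cheapest driver type available in its region.

    available: dict mapping driver type (e.g. "SD", "DD", "OD") to the
               number of drivers of that type currently in the region
    cost:      dict mapping driver type to the cost of serving the request
    Returns the chosen driver type, or None if no driver is available.
    """
    candidates = [k for k, n in available.items() if n > 0]
    if not candidates:
        return None
    best = min(candidates, key=lambda k: cost[k])
    available[best] -= 1  # the matched driver leaves the idle pool
    return best
```

The rule ignores where a driver is located inside the region, in line with the assumption that the model is agnostic to within-region distances.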

We now introduce the fluid approximation of the presented closed queueing network as proposed by Braverman et al. (2019). Herein, we consider steady-state conditions, that is, $\bar{t} \to \infty$ . This holds approximately in on-demand delivery services in urban areas if the operational planning horizon is sufficiently small, for example, 60 min.

The fluid approximation reformulates the closed-queueing system as a network flow problem, whose counterparts to the closed-queueing system’s queue lengths $E_{i j}$ and $F_{i j}$ are network flows from region $i$ to $j$ . We denote these network flows by $e_{i j}$ and $f_{i j}$ . They correspond to single and infinite server queues in steady-state conditions, respectively, and read

e_{i j} = \frac{E_{i j} (\bar{t} \to \infty)}{n_{t}^{SD}}; f_{i j} = \frac{F_{i j} (\bar{t} \to \infty)}{n_{t}^{SD}} .

(3.3)

Moreover, we denote by $a_{i}^{SD}$ the fraction of requests in region $i$ matched to SDs in region $i$ in steady-state conditions. Analogously, let us denote by $a_{i j}^{DD}$ and $a_{i j}^{OD}$ the corresponding fractions of requests matched to DDs and ODs, respectively, on routes $i, j$. Consider the following linear program (LP), whose full derivation we detail in Appendix B in the E-Companion. It represents the network flow problem whose objective value is the operational cost over time horizon $\bar{T}$:

\begin{aligned} C_{t}^{ops} = K \min_{e, f, a} \sum_{i \in M} \sum_{j \in M} [λ_{i t}^{R} (\sum_{β \in \{DD, OD, \emptyset\}} (c_{i j}^{β} \cdot a_{i j}^{β}) + c_{i j}^{SD} \cdot P_{i j}^{R} \cdot a_{i}^{SD}) + (1 - δ_{i j}) c_{i j}^{SD} n_{t}^{SD} e_{i j}] \end{aligned}

(3.4a)

\begin{aligned} (λ_{i t}^{R} / n_{t}^{SD}) \cdot a_{i}^{SD} \cdot P_{i j}^{R} = μ_{i j} \cdot f_{i j} \forall i, j \in M, \end{aligned}

(3.4b)

\begin{aligned} λ_{i t}^{R} \cdot a_{i j}^{DD} \leq λ_{i t}^{DD} P_{i j}^{DD} \forall i, j \in M, \end{aligned}

(3.4c)

\begin{aligned} λ_{i t}^{R} \cdot a_{i j}^{OD} \leq λ_{i t}^{OD} P_{i j}^{OD} \forall i, j \in M, \end{aligned}

(3.4d)

\begin{aligned} μ_{i j} e_{i j} \leq \sum_{k} μ_{k i} f_{k i}, i \neq j \forall i, j \in M, \end{aligned}

(3.4e)

\begin{aligned} \sum_{k, k \neq i} μ_{k i} e_{k i} \leq (λ_{i t}^{R} / n_{t}^{SD}) a_{i}^{SD} \leq \sum_{k, k \neq i} μ_{k i} e_{k i} + \sum_{k} μ_{k i} f_{k i} \forall i \in M, \end{aligned}

(3.4f)

\begin{aligned} (λ_{i t}^{R} / n_{t}^{SD}) a_{i}^{SD} + \sum_{j, j \neq i} μ_{i j} e_{i j} = \sum_{k, k \neq i} μ_{k i} e_{k i} + \sum_{k} μ_{k i} f_{k i} \forall i \in M, \end{aligned}

(3.4g)

\begin{aligned} a_{i}^{SD} + \sum_{j} (a_{i j}^{DD} + a_{i j}^{OD} + a_{i j}^{\emptyset}) = 1 \forall i \in M, \end{aligned}

(3.4h)

\begin{aligned} 0 \leq a_{i}^{SD} \leq 1 \forall i, j \in M, \end{aligned}

(3.4i)

\begin{aligned} 0 \leq a_{i j}^{DD} \leq P_{i j}^{R}; 0 \leq a_{i j}^{OD} \leq P_{i j}^{R} \forall i, j \in M, \end{aligned}

(3.4j)

\begin{aligned} 0 \leq a_{i j}^{\emptyset} \leq 1 \forall i, j \in M, \end{aligned}

(3.4k)

\begin{aligned} 0 \leq e_{i j} \leq 1, 0 \leq f_{i j} \leq 1, \sum_{i} \sum_{j} (e_{i j} + f_{i j}) = 1 \forall i, j \in M . \end{aligned}

(3.4l)

Objective (3.4a) describes the minimization of operational costs based on costs from serving requests and the empty routing costs, with

δ_{i j} = \begin{cases} 0, & \text{if } i \neq j, \\ 1, & \text{if } i = j, \end{cases}

being an indicator function. Constraints (3.4b) to (3.4d) ensure flow conservation. Constraints (3.4e) to (3.4g) result from a linear relaxation. Constraints (3.4e) reflect the relaxed Little’s law, which states that the outgoing flow of relocating SDs in one direction $j$ at one region $i$ cannot be higher than the incoming flow of SDs that serve requests. Constraints (3.4f) and (3.4g) ensure that the total SD flow leaving region $i$ is equal to the total SD flow entering region $i$. Constraints (3.4h) ensure that a request is either matched or not matched in every region $i$. Constraints (3.4i) to (3.4k) ensure that no more requests are matched to the corresponding delivery option than possible. Finally, Constraints (3.4l) ensure that the SD flows sum to 1. We can readily compute the optimal solution to this LP with commercially available solvers.

Proposition 1

The optimal objective of the LP described by equations (3.4a) to (3.4l) is a lower bound on the operational costs, as $n_{t}^{SD} \to \infty$ and $λ_{i t}^{R} (t) \to \infty$ .

Proof: See Appendix B in the E-Companion.

We use the solution of the LP 3.4 to approximate the operational costs obtained in $\bar{T}$ .

3.3. Dynamic Programming on the Strategic Level

In this section, we present a stochastic dynamic programming approach to solve the workforce planning problem on the strategic level. Section 3.3.1 briefly discusses a standard backward dynamic programming (BDP) procedure to find the optimal policy $π$. As this approach becomes intractable for large fleet sizes, we present a piecewise linear value function approximation (PL-VFA) in Section 3.3.2 to compute a near-optimal $π$.

Figure 3.

Piecewise linear approximation along the salaried driver (SD) dimension for a fixed $W_{t}$ .

3.3.1. Exact Approach: Backward Dynamic Programming

We start by describing a BDP approach, which allows us to determine the optimal workforce planning policy $π$ . Herein, we consider a deterministic policy $π$ and recall that the value function $V_{t}$ of being in pre-decision state $s_{t}$ reads

\begin{aligned} V_{t} (s_{t}) & = \min_{π \in Π} E [\sum_{t^{'} = t}^{T} γ^{t^{'} - t} \cdot C_{t^{'}}^{tot} | s_{t}] \\ = \min_{a_{t}} (C_{t}^{tot} (s_{t}^{a}, R^{t}) + γ E [V_{t + 1} (s_{t + 1}) | s_{t}^{a}]), \end{aligned}

(3.5)

where the value of being in state $s_{t + 1}$ is defined as follows:

V_{t + 1} (s_{t + 1}) = \min_{π \in Π} E [\sum_{t^{'} = t + 1}^{T} γ^{t^{'} - t - 1} \cdot C_{t^{'}}^{tot} | s_{t + 1}] .

(3.6)

Using the transition probabilities as defined in equation (3.1) we can rewrite equation (3.5) as

\begin{aligned} V_{t} (s_{t}) = \min_{a_{t}} (C_{t}^{tot} (s_{t}^{a}, R^{t}) + γ \sum_{s_{t + 1}} P (s_{t + 1} | s_{t}^{a}) V_{t + 1} (s_{t + 1} | s_{t}^{a})) . \end{aligned}

(3.7)
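To make the recursion in equation (3.7) concrete, the following sketch runs backward dynamic programming on a deliberately reduced toy problem with SDs only. The binomial SD resignation process, the toy cost function, and all names are assumptions for illustration; the procedure actually used in the paper is the one in Appendix C of the E-Companion.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability that k out of n drivers resign, each with probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def backward_dp(n_max, horizon, cost, p_resign, gamma=1.0):
    """Backward recursion over SD counts 0..n_max and steps 0..horizon.

    cost(t, m, a): total cost in step t with m SDs after decision a.
    Returns (V, policy), where V[t][n] is the value of pre-decision state n
    and policy[t][n] the minimizing hiring/firing decision.
    """
    # Terminal condition: no costs after the last step (V[horizon + 1] = 0).
    V = [[0.0] * (n_max + 1) for _ in range(horizon + 2)]
    policy = [[0] * (n_max + 1) for _ in range(horizon + 1)]
    for t in range(horizon, -1, -1):
        # Expected next-step value for every post-decision SD count m.
        exp_next = [
            sum(binom_pmf(x, m, p_resign) * V[t + 1][m - x] for x in range(m + 1))
            for m in range(n_max + 1)
        ]
        for n in range(n_max + 1):
            best_val, best_a = float("inf"), 0
            for m in range(n_max + 1):  # m = n + a
                val = cost(t, m, m - n) + gamma * exp_next[m]
                if val < best_val:
                    best_val, best_a = val, m - n
            V[t][n], policy[t][n] = best_val, best_a
    return V, policy
```

Even in this one-dimensional toy, every step evaluates the cost for each state-action pair, which illustrates why the exact approach scales poorly once the DD and OD dimensions are added.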

Since the strategic level’s time horizon $T$ is finite, we can apply BDP, as described in Appendix C in the E-Companion. For LSPs operating with fleets of more than 1,000 drivers per type, BDP would require solving the operational problem more than one billion times per time step and thus becomes intractable. Accordingly, we develop an approximate algorithm to solve (3.7) in the following section, which allows us to study larger fleet sizes in reasonable computation times and to consider unbounded state spaces.

3.3.2. Piecewise Linear Value Function Approximation

In this section, we introduce an algorithm that approximates the value of being in post-decision state $s_{t}^{a_{t}}$, represented by $V_{t}^{a} (s_{t}^{a_{t}})$. Let us denote the set of all possible CD combinations in $t$ by $W_{t}$, and one CD combination in time step $t$ by $w_{t} = (n_{t}^{DD}, n_{t}^{OD})$. For each $w_{t} \in W_{t}$, we seek a piecewise linear approximation of $V_{t}^{a} (s_{t}^{a_{t}})$ along the SD dimension. To efficiently obtain this approximation, we rely on Propositions 2 and 3, which state that $C_{t}^{tot}$ and $V_{t}^{a} (s_{t}^{a_{t}})$ are piecewise-linear convex in $a_{t}$.

Proposition 2
$C_{t}^{t o t} (n_{t}^{SD} + a_{t}, w_{t})$ is piecewise-linear and convex in $a_{t}$ .

Proof: See Appendix D in the E-Companion.
Proposition 3
$V_{t}^{a} (n_{t}^{SD} + a_{t}, w_{t})$ is piecewise-linear and convex in $a_{t}$ .

Proof: See Appendix E in the E-Companion.

We denote the number of SDs in the post-decision state by $n_{t}^{SD, a_{t}} = n_{t}^{SD} + a_{t}$ . Let $v_{t} (w_{t})$ denote the set of slopes describing $V_{t}^{a} (s_{t}^{a_{t}})$ along the SD dimension for fixed $w_{t}$ , and let $v_{t} (n_{t}^{SD, a_{t}}, w_{t})$ be the slope to the “left” of $n_{t}^{SD, a_{t}}$ , as illustrated in Figure 3. The value of the post-decision state $s_{t}^{a_{t}}$ , $V_{t}^{a} (s_{t}^{a_{t}})$ , then reads
$\begin{aligned} V_{t}^{a} (s_{t}^{a_{t}}) = V_{t}^{a} (0, w_{t}) + \sum_{k = 0}^{n_{t}^{SD} + a_{t}} v_{t} (k, w_{t}) . \end{aligned}$
(3.8)
From here on, we omit $V_{t}^{a} (0, w_{t})$ as shifting the value function by a constant does not impact the optimal decision. We obtain the value of being in a pre-decision state as
$V_{t}^{a} (n_{t}^{SD, a_{t}}, w_{t}) = \min_{a_{t}} (C_{t}^{tot} (n_{t}^{SD, a_{t}}, w_{t}) + \sum_{k = 0}^{n_{t}^{SD, a_{t}}} v (k, w_{t}))$
(3.9)
and the optimal number of SDs as
$a_{t}^{*} = \underset{a_{t}}{argmin} (C_{t}^{tot} (n_{t}^{SD, a_{t}}, w_{t}) + \sum_{k = 0}^{n_{t}^{SD, a_{t}}} v (k, w_{t})) .$
(3.10)
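Equations (3.8) to (3.10) can be sketched as a single pass over the candidate post-decision SD counts while accumulating the slopes. The list-based slope storage for one fixed $w_{t}$, the cost callback, and the function name are illustrative assumptions.

```python
def best_decision(n_sd, n_max, cost, slopes):
    """Return a_t minimizing cost plus approximated post-decision value.

    n_sd:   SDs in the pre-decision state
    n_max:  largest admissible post-decision SD count
    cost:   cost(m, a), total cost with m SDs after decision a
    slopes: slopes[k] approximates v(k, w_t) for one fixed CD combination
    """
    best_a, best_val = None, float("inf")
    cumulative = 0.0
    for m in range(n_max + 1):
        cumulative += slopes[m]  # sum_{k=0}^{m} v(k, w_t), cf. (3.8)
        val = cost(m, m - n_sd) + cumulative
        if val < best_val:
            best_val, best_a = val, m - n_sd
    return best_a
```

This sketch scans all candidates for clarity; since both terms are convex in $a_{t}$ (Propositions 2 and 3), a search that stops at the first increase would find the same minimizer.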
We denote the post-decision states to the left and right of $s_{t}^{a}$ by $s_{t}^{a -} = (n_{t}^{SD} + a_{t} - 1, w_{t})$ and $s_{t}^{a +} = (n_{t}^{SD} + a_{t} + 1, w_{t})$ , respectively, see Figure 3. Moreover, we denote by $s_{t + 1}^{-}$ and $s_{t + 1}^{+}$ the pre-decision states in $t + 1$ , to which we transition from $s_{t}^{a -}$ and $s_{t}^{a +}$ respectively. Accordingly, we denote by $n_{t}^{SD, a -}$ and $n_{t}^{SD, a +}$ the number of SDs in $s_{t}^{a -}$ and $s_{t}^{a +}$ , and by $n_{t + 1}^{SD -}$ and $n_{t + 1}^{SD +}$ the number of SDs in $s_{t + 1}^{-}$ and $s_{t + 1}^{+}$ . We use the following relations:
$\begin{aligned} V_{t}^{a} (s_{t}^{a_{t}}) & = E [V_{t + 1} (s_{t + 1})], V_{t}^{a} (s_{t}^{a -}) = E [V_{t + 1} (s_{t + 1}^{-})], \\ V_{t}^{a} (s_{t}^{a +}) & = E [V_{t + 1} (s_{t + 1}^{+})], \end{aligned}$
to obtain an explicit expression of the slopes to the “left” and “right” of the post-decision state $s_{t}^{a_{t}}$
$\begin{aligned} v_{t} (s_{t}^{a -}) & = V_{t}^{a} (s_{t}^{a_{t}}) - V_{t}^{a} (s_{t}^{a -}) = E [V_{t + 1} (s_{t + 1})] - E [V_{t + 1} (s_{t + 1}^{-})], \\ v_{t} (s_{t}^{a +}) & = V_{t}^{a} (s_{t}^{a +}) - V_{t}^{a} (s_{t}^{a_{t}}) = E [V_{t + 1} (s_{t + 1}^{+})] - E [V_{t + 1} (s_{t + 1})] . \end{aligned}$
To approximate the slopes of the optimal value function, we adapt an iterative approach initially proposed by Nascimento and Powell (2009). We denote the approximated slopes by $\bar{v}$ . Moreover, we indicate sample information by $(\hat{\cdot})$ . Algorithm 1 shows the procedure for calculating approximated value function slopes. We initialize the approximated slopes with zeros (l. 1). Then, we sample, for each episode, an initial state (l. 3). Subsequently, we walk through the episode and first obtain a decision, $a_{t}$ , by solving (3.10) (l. 5) and sample the transition to the next state according to our transition function described by the resignation and joining processes of CDs (l. 7).

Then, we observe samples of $s_{t + 1}$ (l. 8), $s_{t + 1}^{-}$ (l. 9), and $s_{t + 1}^{+}$ (l. 10), and use these to calculate $v (s_{t}^{a -})$ (l. 11) and $v (s_{t}^{a +})$ (l. 12). We store the current slope approximations in a temporary vector $z$ (l. 13) and obtain a new slope based on a running mean update (l. 14 and 15), where $α$ denotes the learning rate. These updates can lead to temporary convexity violations. We therefore preserve convexity by correcting the slopes to the “left” and “right” of $n_{t}^{SD, a_{t}}$ as follows:
$Conv (z (n, w)) = \begin{cases} z (n_{t}^{SD, a_{t}}, w_{t}), & \text{if } n < n_{t}^{SD, a_{t}}, w = w_{t} \text{ and } z (n, w) > z (n_{t}^{SD, a_{t}}, w_{t}), \\ z (n_{t}^{SD, a_{t}} + 1, w_{t}), & \text{if } n > n_{t}^{SD, a_{t}} + 1, w = w_{t} \text{ and } z (n, w) < z (n_{t}^{SD, a_{t}} + 1, w_{t}), \\ z (n, w), & \text{else.} \end{cases}$
(3.11)
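A minimal sketch of the correction in equation (3.11) for one fixed $w_{t}$ is given below. It assumes the slopes are stored in a list indexed by the SD count and that the two freshly updated slopes sit at positions `n_star` and `n_star + 1`; these names are hypothetical.

```python
def restore_convexity(z, n_star):
    """Apply the correction of equation (3.11) to a list of slopes.

    z:      slopes along the SD dimension for one fixed CD combination
    n_star: post-decision SD count whose neighboring slopes were updated
    Slopes left of n_star that exceed z[n_star] are lowered to it; slopes
    right of n_star + 1 that fall below z[n_star + 1] are raised to it.
    """
    out = list(z)
    for n in range(len(out)):
        if n < n_star and out[n] > z[n_star]:
            out[n] = z[n_star]
        elif n_star + 1 < len(z) and n > n_star + 1 and out[n] < z[n_star + 1]:
            out[n] = z[n_star + 1]
    return out
```

Provided the two updated slopes are themselves ordered, the corrected slopes never exceed the left updated slope on its left nor fall below the right updated slope on its right.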
Finally, learning the slopes for all $w_{t} \in W_{t}$ requires many iterations to ensure that each CD combination $w_{t}$ is sufficiently sampled. To reduce the number of samples required, we consider a homogeneous aggregation approach, in which we aggregate $W_{t}$ (cf. Powell, 2011: 144). Let us denote the aggregation factors for the DD and OD dimension by $k^{DD}$ and $k^{OD}$ , respectively, and the corresponding aggregated CD fleet sizes by ${\bar{n}}_{t}^{DD} = ⌊ \frac{n_{t}^{DD}}{k^{DD}} ⌋$ and ${\bar{n}}_{t}^{OD} = ⌊ \frac{n_{t}^{OD}}{k^{OD}} ⌋$ , respectively. Moreover, we let ${\bar{w}}_{t} = ({\bar{n}}_{t}^{DD}, {\bar{n}}_{t}^{OD})$ . Then, when considering homogeneous aggregation, the approximation of a slope for some $n_{t}^{SD, a_{t}}$ and $w_{t}$ reads
$\begin{aligned} {\bar{v}}_{t} (n_{t}^{SD, a_{t}}, w_{t}) & = {\bar{v}}_{t} (n_{t}^{SD, a_{t}}, n_{t}^{DD}, n_{t}^{OD}) \approx {\bar{v}}_{t} (n_{t}^{SD, a_{t}}, {\bar{n}}_{t}^{DD}, {\bar{n}}_{t}^{OD}) \\ = {\bar{v}}_{t} (n_{t}^{SD, a_{t}}, {\bar{w}}_{t}) . \end{aligned}$
(3.12)
We obtain the PL-VFA algorithm with homogeneous aggregation by replacing all $w_{t}$ by ${\bar{w}}_{t}$ in Algorithm 1.
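The aggregation in equation (3.12) amounts to an integer division of the CD fleet sizes before looking up a slope. The sketch below assumes slopes are kept in a dictionary keyed by the post-decision SD count and the aggregated CD combination; the storage layout, the default value, and the function names are illustrative assumptions.

```python
def aggregate_cd_state(n_dd, n_od, k_dd, k_od):
    """Map (n_dd, n_od) to the aggregated combination via floor division."""
    return (n_dd // k_dd, n_od // k_od)


def lookup_slope(slopes, n_sd_post, n_dd, n_od, k_dd, k_od, default=0.0):
    """Return the shared slope estimate of the aggregated CD combination.

    slopes: dict keyed by (n_sd_post, (n_dd_bar, n_od_bar))
    """
    w_bar = aggregate_cd_state(n_dd, n_od, k_dd, k_od)
    return slopes.get((n_sd_post, w_bar), default)
```

All CD combinations that fall into the same bucket share one slope estimate, so each sample updates the approximation for a whole group of states.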

4. Design of Experiments

This section describes our experimental setup for a subsequent managerial analysis. In the first part, we present the setup for the operational level’s problem, which is based on a real-world data set describing spatial and temporal order patterns for on-demand food deliveries. In the second part, we discuss the parameter settings on the strategic level and the sensitivities to be analyzed.

Figure 4.

Spatial and temporal request patterns in instance 0o100t75s1p100. (a) Origin and destination pairs of orders; (b) no. of orders between $t = 0$ and $t = 850$ .

4.1. Experimental Setup

To account for a real-world scenario, we consider a data set provided by Grubhub (2018), which describes anonymized food delivery orders. The data set consists of 10 different instances. Each instance represents one US metropolitan area. Each order is characterized by its origin and destination coordinates. Moreover, each order is described by its placement and ready time. The former is when a customer orders through the Grubhub platform, and the latter is when the order is ready to be delivered. To keep the computational complexity of our experimental evaluation manageable, we randomly chose the instance type with initial digits “0o100” and only used the order information contained within it. The “0” encodes the metropolitan area on which the order information is based. The “o100” indicates that 100% of orders are used. The remaining digits encode driver schedules, that is, the times and locations at which drivers start and end their shifts. As we do not model driver shifts on the operational level, we ignore the remaining digits. Figure 4 highlights the instance’s spatial and temporal order distribution. Orders occur from minute $t = 0$ to minute $t = 850$. We use the provided data as a basis for deriving the origin-destination distance matrix $r_{i j}$, the order arrival rates per region $i \in M$, and the request pattern matrix $P_{i j}^{R}$ as follows. First, to account for average zip code surfaces, we discretize the area into squares of $4 {km}^{2}$ (visualized by the gray grid in Figure 4(a)), each representing a region $i \in M$. We obtain $r_{i j}$, for $i \neq j$, as the Euclidean distance between the centers of regions $i$ and $j$. We assume that $r_{i i}$ corresponds to half of the square’s side length, which approximates the average distance between any two points within a square, in this case $1 km$. This results in 18 regions and consequently in an $18 \times 18$ origin-destination matrix. Based on the request distribution and the aggregated regions, we obtain the demand pattern matrix $P_{i j}^{R}$ and demand arrival rates $λ_{i t}^{R}$. We briefly discuss the adequacy of the chosen discretization in Appendix F in the E-Companion. Moreover, since the data set is anonymized, the metropolitan area on which the data is based is unknown. Hence, we sample DD and OD arrivals. We let DD arrivals depend on the arrival patterns of requests, and we set the DD arrival intensity function $I_{i}^{DD}$ to $I_{i}^{D D} = \frac{λ_{i t}^{R}}{\sum_{j} λ_{j t}^{R}}$, and obtain $λ_{i t}^{DD}$ via equation (2.2). This modeling approach reflects the main characteristic of DDs, which financially depend on the work for LSPs and, therefore, try to maximize their earnings by frequenting regions with high demand for deliveries. We randomly generate OD arrivals $I_{i}^{OD}$ and mobility patterns $P_{i j}^{OD}$. We report CD mobility and request patterns in Appendix G in the E-Companion. We consider an average travel speed of $v^{avg} = 19 km / h$ as reported in the Grubhub data set for all driver types.
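The construction of the distance matrix described above can be sketched as follows. The region centers passed in are placeholders rather than the Grubhub coordinates, and reading travel times as distance divided by the average speed is an assumption consistent with $1 / μ_{i j}$ denoting the travel time matrix.

```python
import math


def distance_matrix(centers, side_km=2.0):
    """Origin-destination distances between square regions.

    centers: list of (x, y) region centers in km
    side_km: side length of a square region (2 km for 4 km^2 squares)
    Off-diagonal entries are Euclidean center-to-center distances;
    diagonal entries are set to half the side length.
    """
    n = len(centers)
    r = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(centers):
        for j, (xj, yj) in enumerate(centers):
            r[i][j] = side_km / 2.0 if i == j else math.hypot(xi - xj, yi - yj)
    return r


def travel_time_hours(r, speed_kmh=19.0):
    """Travel times, assumed here to be distance over average speed."""
    return [[d / speed_kmh for d in row] for row in r]
```

For example, two horizontally adjacent regions with centers 2 km apart obtain $r_{i j} = 2$ km, while every region obtains $r_{i i} = 1$ km.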

We set the strategic time horizon to a year and divide it into $T = 26$ two-week segments. We restrict our study on the operational level to the time window between minutes 550 and 600 (cf. Figure 4(b)) of instance 0o100t75s1p100 in the Grubhub data set, wherein we can assume steady-state conditions.

4.2. Description of Base Case Parameters and Variations

We now present the parameter settings required for the strategic level MDP and the fluid approximation on the operational level. We start by describing a base case and then present parameter variations.

We motivate the base case resignation probability and joining rate by a statistical evaluation initially made for Uber drivers between 2012 and 2016 (Hall and Krueger, 2018). On the strategic level, we consider a constant resignation probability of $p^{α} = 0.01$ for the base case. Hence, the number ${\tilde{x}}_{t}^{α}$ out of $n_{t}^{α}$ drivers that decide to leave the platform in time step $t$ follows a binomial distribution. Therefore, the probability of ${\tilde{x}}_{t}^{α}$ drivers leaving reads

P ({\tilde{x}}_{t}^{α}) = \binom{n_{t}^{α}}{{\tilde{x}}_{t}^{α}} {(p^{α})}^{{\tilde{x}}_{t}^{α}} {(1 - p^{α})}^{n_{t}^{α} - {\tilde{x}}_{t}^{α}}, α \in \{DD, OD\} .

(4.1)

In Section 5.5, we consider $p^{α}$ to depend on the CDs’ matching sensitivity, that is, their increased likelihood to leave the LSP’s platform when they do not receive sufficiently many requests. Let us denote the slack variables from equations (3.4c) and (3.4d), which bound the number of CDs being matched to requests on the operational level, by $s_{i j}^{DD}$ and $s_{i j}^{OD}$. Then, we can model $p^{α}$ as

\begin{aligned} p^{α} = p_{high}^{α} \cdot \frac{\sum_{i j} s_{i j}^{α}}{n_{t}^{α}} + p_{low}^{α} \cdot (1 - \frac{\sum_{i j} s_{i j}^{α}}{n_{t}^{α}}), α \in \{DD, OD\} . \end{aligned}

(4.2)

Herein, $p_{high}^{α}$ and $p_{low}^{α}$ are upper and lower resignation probabilities, which we set to 1 (all CDs resign when not being matched at all) and 0.01 (base case resignation probability), respectively. The term $\frac{\sum_{i j} s_{i j}^{α}}{n_{t}^{α}}$ describes the share of unmatched CDs. When this share is high, $p_{high}^{α}$ receives a higher weight and resignations take place at a higher rate. When it is low, the opposite is the case. Note that if CD resignation rates depend on the operational level’s cost function and accordingly non-linearly on the number of CDs, the convexity of the post-decision state’s value function is no longer guaranteed.
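Equation (4.2) is a convex combination of the two resignation probabilities, weighted by the share of unmatched CDs. The sketch below uses the values stated above as defaults; the function name and the handling of an empty fleet are assumptions made for illustration.

```python
def resignation_probability(slack_sum, n_drivers, p_high=1.0, p_low=0.01):
    """Matching-sensitive resignation probability following equation (4.2).

    slack_sum: sum of the slack variables s_ij over all routes, that is,
               the number of unmatched CDs of the given type
    n_drivers: number of CDs of that type in the current time step
    """
    if n_drivers == 0:
        # Assumption: with no drivers the share is undefined; fall back
        # to the lower probability.
        return p_low
    unmatched_share = slack_sum / n_drivers
    return p_high * unmatched_share + p_low * (1.0 - unmatched_share)
```

With all CDs matched the function returns the base case value of 0.01, and with no CD matched it returns 1.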

To model the joining process, we assume the number of newly joining CDs ${\tilde{y}}_{t}^{α}$ to also follow a binomial distribution. Hence, the probability of ${\tilde{y}}_{t}^{α}$ newly joining CDs reads

P ({\tilde{y}}_{t}^{α}) = \binom{n_{t}^{α}}{{\tilde{y}}_{t}^{α}} {(q^{α})}^{{\tilde{y}}_{t}^{α}} {(1 - q^{α})}^{n_{t}^{α} - {\tilde{y}}_{t}^{α}}, α \in \{DD, OD\},

where $n_{t}^{α}$ is the number of CDs currently active on the platform, and $q^{α}$ is the average joining rate of CDs. The number of CDs in the next time step can be calculated based on equation (2.1). Making the number of newly joining CDs dependent on the currently available ones allows us to account for network effects, that is, platforms with more users/workers attract more users/workers. We set $q^{α} = 0.09$ in the base case.

Now, we describe the fraction of CDs being active on the operational level, starting with DDs. As DDs align their working times to the demand distribution, and the demand peaks in the $[550, 600]$ time window, we assume that all DDs are active. Hence, we set $ζ^{DD} = 1$. ODs’ main working times are aligned with their primary occupation and lie mainly in the late afternoon/evening times, for example, when returning home from work (Le and Ukkusuri, 2019; Galkin et al., 2021). Accordingly, we assume OD arrivals to occur uniformly within the $[400, 800]$ time window. This results in a share of OD arrivals within $\bar{T}$ of $ζ^{OD} = \frac{50}{400} = 0.125$.

We consider a homogeneous demand growth rate of roughly $0.6\%$ per strategic time step, leading to a compound annual growth rate of $20\%$. We consider an overall hourly demand comparable to New York City of $\sim$ 24,000 requests (The Washington Post, 2022). Since the area we consider has a surface of $\sim 100 {km}^{2}$ (cf. Figure 4(a)), which is smaller than New York City ($\sim 800 {km}^{2}$), we divide the overall demand by a factor of 8 and obtain a total demand of $\sum_{i} λ_{i T}^{R} = 3{,}000$ requests per hour in the final time step. We assume that the spatial distribution of requests remains constant over $T$. The request arrival rate reads $λ_{i t}^{R} = \frac{3{,}000}{{1.006}^{T - t}} \frac{λ_{i t}^{R}}{\sum_{j}^{| M |} λ_{j t}^{R}}$.

In the base case (see Table 2), we set wages for SDs to $20\ \$/\text{h}$ (Hall and Krueger, 2018) and their variable costs to $0.34\ \$/\text{km}$ (Bösch et al., 2018; Lanzetti et al., 2023). For DDs, we use route-based compensation schemes (see, e.g., Grubhub or Postmates), and for ODs, we assume a constant compensation per request as ODs are only paid for the detour from their private route. Moreover, we assume that CDs’ value of time corresponds to $0.225\ \$/\min$ in the base case (Wadud, 2017). Hence, if we take into account the instance’s average velocity of $v = 19\ \text{km}/\text{h}$, we obtain a payment per km of $c^{DD} = \frac{0.225\ \frac{\$}{\min} \cdot 60\ \frac{\min}{\text{h}}}{19\ \frac{\text{km}}{\text{h}}} \approx 0.7\ \$/\text{km}$ for DDs. We set the OD compensation to $c_{i j}^{OD} = c^{OD} = 5\ \$/\text{request}$ in the base case, as this corresponds to the minimum compensation expectation for ODs when delivering a request, according to a representative study (cf. Le and Ukkusuri, 2019). In the base case, we set $C^{sev} = \infty$ to account for a context wherein firing is impossible. Moreover, we consider a penalty of $10\ \$$ per undelivered request. This ensures that it is always cheaper to outsource requests to CDs or deliver them with SDs than not delivering them in the base case. We set the initially available CDs to 500 for both DDs and ODs in the base case. We perform sensitivity analyses for all parameters according to Table 2.
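The conversion from the CDs’ value of time to a per-kilometer payment is a unit conversion, sketched below with the base case values. The function name is an illustrative assumption.

```python
def per_km_compensation(value_of_time_per_min, speed_kmh):
    """Convert a value of time in $/min into a payment in $/km.

    ($/min) * (60 min/h) / (km/h) = $/km
    """
    return value_of_time_per_min * 60.0 / speed_kmh


# Base case: 0.225 $/min at 19 km/h gives 13.5 / 19, roughly 0.71 $/km,
# which the text rounds to 0.7 $/km.
c_dd = per_km_compensation(0.225, 19.0)
```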

Table 2.

Base case parameters and their variations.

Quantity	Parameter	Base case	Variation range
CD joining rates (cf. equation (4.2))	$q^{α}$	0.09	$[0.01, 0.17]$
Severance payment (cf. equation (2.5))	$C^{sev}$	$\infty$	$[0, 60]$
SD fix costs per hour (cf. equation (2.5))	$C^{fix}$	$20\ \$/\text{h}$	$[4, 34]$
DD per km costs (cf. equation (3.4a))	$c^{DD}$	$0.7\ \$/\text{km}$	$[0.5, 7.5]$
OD costs per request (cf. equation (3.4a))	$c^{OD}$	$5.5\ \$/\text{request}$	$[1, 8]$
Share of CD arrivals within $\bar{T}$	$ζ^{α}$	$ζ^{DD} = 1, ζ^{OD} = 0.13$	$[0.25, 1]$
Number of requests (request density)	$\sum_{i \in M} λ_{i T}$	3,000	$[1{,}000, 5{,}000]$

Table 3.

Deviation from optimal solution, $δ (%)$ , of policies obtained with PL-VFA, MY, and $n$ -step lookahead policy for different initial DD fleet sizes and demand scenarios. The best performing approach is marked in bold.

	Constant					Growth					Peak
$(n_{0}^{SD}, n_{0}^{DD}, n_{0}^{OD})$	MY	$n = 1$	$n = 2$	$n = 3$	PL-VFA	MY	$n = 1$	$n = 2$	$n = 3$	PL-VFA	MY	$n = 1$	$n = 2$	$n = 3$	PL-VFA
(0,6,0)	4.18	4.19	4.19	4.13	0.00	1.63	1.63	1.64	1.64	0.02	4.25	4.42	4.68	3.62	0.02
(0,9,0)	5.79	4.09	3.15	2.68	0.05	2.92	2.96	2.96	2.97	0.53	4.69	4.41	4.44	3.44	0.40
(0,12,0)	3.38	3.42	3.42	3.43	0.15	3.12	3.04	1.77	1.22	0.51	4.17	4.33	4.58	3.63	0.28

PL-VFA = piecewise linear value function approximation; MY = myopic policy; DD = dedicated gig-driver; SD = salaried driver; OD = opportunistic gig-driver.

To assess the results, we evaluate the quotient $h$ (%) between total cumulated costs in the final time step of a mixed fleet (i.e., consisting of SDs and CDs) and an SD-only fleet, and the cost saving $\bar{h} (%)$

\begin{aligned} h (%) & = 100 \frac{\sum_{t = 0}^{T} C_{t}^{tot} (mixed fleet of SDs and CDs)}{\sum_{t = 0}^{T} C_{t}^{tot} (SD-only fleet)}; \\ \bar{h} (%) & = 100 (%) - h (%) . \end{aligned}

(4.3)
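As a reading aid for equation (4.3), the sketch below computes the cost ratio $h$ and the cost saving $\bar{h}$ from two per-time-step cost series. The function name and the example cost values are hypothetical and chosen only for illustration; they are not results from the study.

```python
from typing import Sequence, Tuple


def cost_ratio_and_saving(
    mixed_costs: Sequence[float], sd_only_costs: Sequence[float]
) -> Tuple[float, float]:
    """Return (h, h_bar) in percent, following equation (4.3).

    mixed_costs:   total costs per time step of the mixed fleet (SDs and CDs)
    sd_only_costs: total costs per time step of the SD-only fleet
    """
    total_mixed = sum(mixed_costs)
    total_sd_only = sum(sd_only_costs)
    if total_sd_only <= 0:
        raise ValueError("SD-only total costs must be positive")
    h = 100.0 * total_mixed / total_sd_only
    return h, 100.0 - h


# Hypothetical cost series over three time steps (illustration only).
h, h_bar = cost_ratio_and_saving([30.0, 25.0, 20.0], [100.0, 100.0, 100.0])
print(f"h = {h:.1f} %, h_bar = {h_bar:.1f} %")
```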

5. Results

In the first part of this section, we validate our PL-VFA (Section 5.1) before analyzing the structural properties of a policy derived by PL-VFA in the base case (Section 5.2). In Section 5.3, we study the policies’ and parameter variations’ impact on total costs from an LSP perspective. Finally, we take the CDs’ perspective and compare different behavioral assumptions. We implemented the strategic level’s MDP in Python and used Gurobi 9.1.2 to solve the operational problem. We performed all experiments on a workstation with an i9-9900 CPU at 16 $\times$ 3.10 GHz and 16 GB RAM. If not mentioned otherwise, reported results are average values based on executing the respective policy 50 consecutive times.

5.1. Validation of PL-VFA

To validate the PL-VFA approach, we evaluate PL-VFA on smaller instances and consider only DDs. In these instances, we can compute a solution with BDP. We study three demand scenarios: constant demand, growing demand, and peak demand. Moreover, we vary the initially available numbers of DDs. We benchmark the results obtained by PL-VFA with a myopic policy (MY), which always hires enough SDs to serve the demand in the current time step $t$ , and an $n$ -step lookahead policy that bases its hiring decisions on the assumed perfect information available for $n$ future steps. We set $n = 1, 2, 3$ to simulate perfect forecasts of up to 6 weeks and refer to Appendix H in the E-Companion for details. We compare the results to BDP, which yields the optimal solution. We define the total cumulated costs ${\bar{C}}_{T}$ in the final time step $T$ and the gap to the optimal solution $δ (%)$ as

{\bar{C}}_{T} = \sum_{t = 0}^{T} C_{t}^{tot}; δ (%) = 100 \cdot \frac{{\bar{C}}_{T} - {\bar{C}}_{T}^{BDP}}{{\bar{C}}_{T}^{BDP}} .

We summarize the average gaps $δ$ in Table 3. The PL-VFA algorithm converged after $10 k$ iterations for the constant and growth demand case and after $5 k$ iterations for the peak demand case. We refer the interested reader to Appendix I in the E-Companion for a more detailed comparison of this section’s results and the instance’s characteristics. PL-VFA (almost) matches BDP results in all cases. The remaining approaches show deviations of up to $5.79 %$. Given that PL-VFA’s costs consistently stay within an error margin of a maximum of $0.53 %$ compared to the costs derived from BDP, it can serve as a reliable and effective tool to obtain hiring policies.
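The two headline figures of this validation can be checked directly against Table 3. The short Python sketch below copies the gaps $δ$ from the table and extracts the largest gap per policy; the data layout and names are our own choice for illustration.

```python
# Gaps δ (%) copied from Table 3. Per demand scenario, the five values
# correspond to the policies MY, n=1, n=2, n=3, and PL-VFA (in that order).
POLICIES = ["MY", "n=1", "n=2", "n=3", "PL-VFA"]
TABLE_3 = {
    "(0,6,0)": {
        "Constant": [4.18, 4.19, 4.19, 4.13, 0.00],
        "Growth": [1.63, 1.63, 1.64, 1.64, 0.02],
        "Peak": [4.25, 4.42, 4.68, 3.62, 0.02],
    },
    "(0,9,0)": {
        "Constant": [5.79, 4.09, 3.15, 2.68, 0.05],
        "Growth": [2.92, 2.96, 2.96, 2.97, 0.53],
        "Peak": [4.69, 4.41, 4.44, 3.44, 0.40],
    },
    "(0,12,0)": {
        "Constant": [3.38, 3.42, 3.42, 3.43, 0.15],
        "Growth": [3.12, 3.04, 1.77, 1.22, 0.51],
        "Peak": [4.17, 4.33, 4.58, 3.63, 0.28],
    },
}

# Largest gap per policy over all fleet sizes and demand scenarios.
worst = {policy: 0.0 for policy in POLICIES}
for scenarios in TABLE_3.values():
    for gaps in scenarios.values():
        for policy, gap in zip(POLICIES, gaps):
            worst[policy] = max(worst[policy], gap)

# Largest gap among the benchmark policies (everything except PL-VFA).
worst_benchmark = max(gap for policy, gap in worst.items() if policy != "PL-VFA")

print(worst)
print("largest benchmark gap:", worst_benchmark)
```

The largest PL-VFA gap is 0.53 % and the largest benchmark gap is 5.79 % (myopic policy, constant demand, initial fleet (0,9,0)), matching the values quoted in the text.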

5.2. Hiring Policy Comparison in the Base Case

We begin this section by studying the difference in the number of SDs hired by a policy obtained from PL-VFA and MY, which we denote by $π^{PL - VFA}$ and $π^{MY}$ , respectively. To this end, Figure 5(a) shows the difference in the number of SDs hired between $π^{MY}$ and $π^{PL - VFA}$ in $t = 0$ as a function of initial DDs and ODs. First, we observe that the difference is always positive; hence, $π^{MY}$ always hires more SDs than $π^{PL - VFA}$ . The difference increases with the number of DDs. This is plausible: if CD supply is initially not high, $π^{MY}$ hires enough SDs to minimize total costs in $t$ , whereas $π^{PL - VFA}$ hires fewer SDs than required to minimize total costs in $t$ to prevent an oversupply of SDs in later time steps, when it can no longer fire SDs. When CD supply is initially very low, $π^{PL - VFA}$ hires similarly many SDs as $π^{MY}$ , because the CD fleet will not become large enough over time to contribute to request deliveries. Appendix K in the E-Companion details the number of SDs hired for $π^{PL - VFA}$ and $π^{MY}$ . Next, we study the difference between $π^{PL - VFA}$ and $π^{MY}$ from a temporal point of view. Let $n_{t}^{SD, opt}$ denote the number of SDs required to cover the entire demand in a time step $t$ , given the current CD fleet composition and demand. Figure 5(b) shows the difference between $n_{t}^{SD, opt}$ and $n_{t}^{SD}$ as a function of $t$ for both $π^{PL - VFA}$ and $π^{MY}$ . When using $π^{PL - VFA}$ , the LSP underhires, that is, does not have sufficient drivers to serve all requests, in the first time steps of the time horizon and overhires, that is, has more drivers than it requires, in the second half. When using $π^{MY}$ , the LSP overhires from the first time steps on. The number of overhired SDs is $\sim 20 %$ lower in the final time step $T$ when using $π^{PL - VFA}$ than when using $π^{MY}$ . These results are plausible: because $π^{PL - VFA}$ anticipates future CD supply, it refrains from hiring many SDs that might become obsolete as the number of CDs grows. The myopic policy $π^{MY}$ does not take into account future CD supply and, therefore, always tries to fulfill the demand in the current time step, which causes increased overhiring over time.

Figure 5.

Difference between $π^{PL - VFA}$ and $π^{MY}$ and over- versus underhiring over time. (a) $π^{MY} - π^{PL - VFA}$ in $t = 0$ ; (b) over- versus underhiring over time. PL-VFA = piecewise linear value function approximation; MY = myopic policy.

Result 1

The number of overhired SDs in $T$ is $\sim 20 %$ lower when using $π^{PL - VFA}$ than when using $π^{MY}$ . The myopic nature of $π^{MY}$ causes it to overhire SDs from early time steps on.

5.3. Sensitivity Analysis

Figure 6 shows $\bar{h}$ as a function of different DD and OD joining rates. With increasing CD joining rates, we observe higher cost savings of up to $81 %$ , when increasing $q^{DD}$ , and of up to $78 %$ , when increasing $q^{OD}$ . Moreover, we observe that for increasing $q^{α}$ , $π^{PL - VFA}$ outperforms $π^{MY}$ by up to 5 percentage points when varying $q^{DD}$ (cf. Figure 6(a)), and 2 percentage points when varying $q^{OD}$ (cf. Figure 6(b)). A 5 percentage points higher $\bar{h}$ corresponds to $19 %$ lower total costs when using $π^{PL - VFA}$ . Moreover, $π^{PL - VFA}$ outperforms or equals the performance of $n$ -step lookahead methods in almost every scenario, except for $q^{OD} = 0.01$ . Notably, unlike the $n$ -step lookaheads, which rely on perfect forecasts for $n$ steps, $π^{PL - VFA}$ operates without assuming exact future knowledge. Remarkably, PL-VFA secures up to a 2-percentage point advantage over the highest performing benchmark, translating into $10 %$ lower total costs than the best performing benchmark. The sensitivity of the total cost decrease to variations of $q^{OD}$ is low compared to its sensitivity to $q^{DD}$ . This is plausible, as the ODs’ likelihood of being matched to requests is lower than that of DDs. In contrast to DDs, ODs only accept requests whose origin and destination correspond to their own origin and destination. Moreover, the share of ODs arriving within $\bar{T}$ is lower, that is, $ζ^{OD} ≪ ζ^{DD}$ .
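The step from a percentage-point difference in $\bar{h}$ to a relative difference in total costs follows from equation (4.3): both policies are measured against the same SD-only cost, so their total costs relate as $100 - \bar{h}$. The sketch below illustrates this conversion. The absolute saving levels used in the example (79 % versus 74 %, and 82 % versus 80 %) are assumptions picked for illustration; the text reports only the differences and the resulting relative reductions, not these levels.

```python
def relative_cost_reduction(saving_a_pct: float, saving_b_pct: float) -> float:
    """Relative reduction (%) of policy A's total costs compared to policy B.

    Both arguments are cost savings h_bar (%) with respect to the same
    SD-only fleet, so total costs are proportional to (100 - h_bar).
    """
    cost_a = 100.0 - saving_a_pct
    cost_b = 100.0 - saving_b_pct
    if cost_b <= 0:
        raise ValueError("policy B must have positive total costs")
    return 100.0 * (cost_b - cost_a) / cost_b


# Assumed saving levels (illustration only): a 5 percentage point gap.
print(round(relative_cost_reduction(79.0, 74.0), 1))
# Assumed saving levels (illustration only): a 2 percentage point gap.
print(round(relative_cost_reduction(82.0, 80.0), 1))
```

Under these assumed levels, a 5 percentage point gap yields roughly 19 % lower total costs and a 2 percentage point gap yields 10 %. The same gap translates into a larger relative reduction the higher the overall saving level is, because the remaining cost base shrinks.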

Figure 6.

Variation of CD joining rate. (a) Variation of the DD joining rate $q^{DD}$ ; (b) variation of OD joining rate $q^{OD}$ . CD = crowdsourced driver; DD = dedicated gig-driver; OD = opportunistic gig-driver.

Figure 7.

$n_{t}^{SD} - n_{t}^{SD, opt}$ for each time step $t$ . (a) $q^{DD} = 0.05$ ; (b) $q^{DD} = 0.09$ ; (c) $q^{DD} = 0.17$ . SD = salaried driver; DD = dedicated gig-driver.

Result 2

The advantage of $π^{PL - VFA}$ over $π^{MY}$ grows with increasing $q^{α}$ and can lead to up to $19 %$ lower total costs compared to $π^{MY}$ , while almost always matching or outperforming methods based on perfect information lookahead, yielding up to $10 %$ lower total costs.

To provide a better intuition on how $π^{PL - VFA}$ achieves lower total costs than $π^{MY}$ , we study the temporal development of the number of SDs. To this end, Figure 7 shows the difference between $n_{t}^{SD, opt}$ and $n_{t}^{SD}$ as a function of $t$ for $q^{DD} = 0.05$ , $q^{DD} = 0.09$ (base case), and $q^{DD} = 0.17$ . For $q^{DD} = 0.05$ (cf. Figure 7(a)), the difference between $n_{t}^{SD, opt}$ and $n_{t}^{SD}$ grows over the time horizon. While $π^{PL - VFA}$ underhires in the first time steps, $π^{MY}$ overhires over the entire time horizon. For $q^{DD} = 0.09$ (cf. Figure 7(b)), the behavior is similar to $q^{DD} = 0.05$ , but the number of under- and overhired SDs increases. Moreover, the difference between $π^{MY}$ and $π^{PL - VFA}$ also increases. Figure 7(c) explores under- and overhiring for $q^{DD} = 0.17$ . When using $π^{PL - VFA}$ , we observe the same behavior as for $q^{DD} = 0.05$ and $q^{DD} = 0.09$ , however, with even stronger under- and overhiring as well as a stronger difference between $π^{MY}$ and $π^{PL - VFA}$ . When CD joining rates are low, $π^{PL - VFA}$ cannot leverage potential future CD supply in its decision-making and the need for SDs remains higher over the entire time horizon. Hence, the difference between $π^{MY}$ and $π^{PL - VFA}$ is lower. When CD joining rates are higher, SDs risk becoming obsolete. Hence, $π^{PL - VFA}$ underhires in early time steps to prevent an obsolete SD pool in later time steps. As total cumulated costs are smaller for $q^{DD} = 0.17$ than for $q^{DD} = 0.09$ , we interpret the previous result as follows: by underhiring in early time steps, $π^{PL - VFA}$ hedges against overhiring in later time steps, that is, against having to remunerate a large number of SDs that are no longer required because the number of CDs has increased over time. We provide the impact of underhiring on the LSP’s service level in Appendix L in the E-Companion.

Result 3

$π^{PL - VFA}$ hedges against overhiring in later time steps by hiring up to 50 fewer SDs than required in early time steps and, therefore, accepting penalties for undelivered requests.

We now study the effect of $q^{DD}$ and $q^{OD}$ jointly. To this end, Figure 8(a) shows the cost saving compared to an SD-only fleet, $\bar{h}$ . We observe that increasing joining rates lead to higher cost savings. Moreover, we can see that cost savings are more sensitive to $q^{DD}$ than to $q^{OD}$ . For example, when $q^{DD} = 0.13$ , varying $q^{OD}$ has no impact on cost savings. The LSP achieves the highest cost saving, that is, $78 %$ , when $q^{DD}$ is highest.

Figure 8.

Combined impact of varying joining rates on $\bar{h}$ (left) and $h$ (right) in $t = T$ . (a) Cost saving $\bar{h}$ ; (b) cost ratio $h$ (split into components).

Figure 9.

Variation of CD costs. (a) DDs’ costs per km $c^{DD}$ ; (b) ODs’ costs per request $c^{OD}$ . CD = crowdsourced driver; DD = dedicated gig-driver; OD = opportunistic gig-driver.

Result 4

The increase of $q^{DD}$ reduces costs by up to $78 %$ and, therefore, more than the increase of $q^{OD}$ . Hence, DDs are the main driver for cost savings within the CD fleet.

Figure 8(b) shows the driver and penalty cost split as a percentage of total costs for varying $q^{DD}$ and $q^{OD}$ . We observe that SDs have, overall, the highest cost share, with up to $50 %$ for $q^{DD} = q^{OD} = 0.01$ . DD costs have the overall second highest cost share with up to $30 %$ when $q^{DD} = 0.13$ , while ODs have lower importance in the cost mix. Penalty costs are only high when the DD joining rate is low and highest for $(q^{DD} = 0.01, q^{OD} = 0.01)$ with a share of up to $40 %$ .

Result 5

In mixed fleets, SDs are the main total costs driver with a cost share of up to $50 %$ , followed by DDs with up to $30 %$ .

Figure 10.

Variation of severance payment and SD fix costs. (a) Variation of severance payment $C^{sev}$ ; (b) variation of SDs’ fix costs $C^{fix}$ . SD = salaried driver.

In Figure 9(a), we report DD costs per km. The cost-saving potential decreases with increasing $c^{DD}$ . For low $c^{DD}$ , the advantage of using $π^{PL - VFA}$ is highest and leads to higher cost savings compared to $π^{MY}$ of up to 5 percentage points. In Figure 9(b), we report OD costs per request. We observe that $\bar{h}$ remains constant over the entire range of $c^{OD}$ . The advantage of using $π^{PL - VFA}$ observed for low $c^{DD}$ in Figure 9(a) is plausible since the LSP can increasingly leverage DDs when their costs are low. Finally, $π^{PL - VFA}$ matches or outperforms lookahead policies for all $c^{DD}$ and $c^{OD}$ , and different values of $n$ .

Result 6

The cost saving potential is more sensitive to DD costs than to OD costs. The advantage of $π^{PL - VFA}$ is highest when CD costs are low, for which $π^{PL - VFA}$ yields more than 5 percentage points higher cost savings than $π^{MY}$ . Moreover, $π^{PL - VFA}$ is able to match or outperform the strongest lookahead policy in all cases by up to 0.5 percentage points.

Finally, to understand the impact of firing flexibility in workforce planning, we vary the severance payment $C^{sev}$ according to Table 2 and report the results in Figure 10(a). We also report the base case, wherein $C^{sev} = \infty$ for comparison. We observe that with increasing $C^{sev}$ , the saving potential compared to an SD-only fleet decreases. The cost advantage of using $π^{PL - VFA}$ increases with increasing $C^{sev}$ and remains approximately constant from $C^{sev} = 40 $$ on. When $C^{sev}$ is low, the LSP can lay off SDs at any time without additional costs. Hence, the advantage of using PL-VFA is negligible. When $C^{sev}$ is higher, the value of not hiring SDs is potentially higher as SDs can become obsolete in later time steps. Firing them then results in penalty costs that could have been avoided if they were not hired in the first place. This trade-off is only made by $π^{PL - VFA}$ . Moreover, we note that when $C^{sev}$ is low, PL-VFA holds no discernible advantage compared to the lookahead policies, as the optimal policy simply involves greedy hiring and firing. With moderately high $C^{sev}$ , $π^{PL - VFA}$ falls short of the lookahead methods’ performance. We can attribute this to the PL-VFA’s challenge in identifying the optimal slope set, exacerbated by a larger action space due to extra firing decisions. The $n$ -step lookaheads’ apparent superiority stems from their unrealistic advantage of perfect future predictions, an advantage not available in practical scenarios. As the severance costs $C^{sev}$ increase, approaching the base case where $C^{sev} = \infty$ , firing becomes impossible. Consequently, the action space reduces to only hiring decisions, for which PL-VFA manages to approximate the optimal slope set, thereby outperforming all other methods including the lookahead policies. In Figure 10(b), we explore the impact of $C^{fix}$ on $\bar{h}$ . 
For small $C^{fix}$ , $\bar{h}$ is smaller and the difference between $π^{PL - VFA}$ and $π^{MY}$ is negligible. The cost saving potential increases up to $82.5 %$ for $π^{PL - VFA}$ and $81 %$ for $π^{MY}$ . The difference between $π^{PL - VFA}$ and $π^{MY}$ increases up to 7 percentage points when $C^{fix} = 16 $$ , which corresponds to $19 %$ lower total costs. When $C^{fix}$ increases further, the difference between both policies decreases again. For low fixed SD costs, the decision of hiring or not hiring SDs does not significantly impact total costs. As $C^{fix}$ increases, this decision needs to be traded off more carefully. Interestingly, the advantage of using PL-VFA vanishes for even higher fixed costs, as both policies refrain from hiring costly SDs. When varying $C^{fix}$ , $π^{PL - VFA}$ always outperforms the $n$ -step lookahead policies, with a cost saving advantage of up to $1$ percentage point compared to $n = 3$ when $C^{fix} = 3 $$ , which corresponds to $3 %$ lower costs.

Result 7

Compared to $π^{MY}$ , $π^{PL - VFA}$ achieves 3 percentage points higher cost savings when severance payments become infinite (base case) and 7 percentage points higher cost savings, that is, $19 %$ lower total costs, when SD fix costs are set to $16 $ / h$ . While $π^{PL - VFA}$ cannot outperform the $n$ -step lookahead approaches for varying $C^{sev}$ , it does for varying $C^{fix}$ , with up to $3 %$ lower total costs.

Varying the share of CDs being active within $\bar{T}$ , $ζ^{α}$ , (see Figure 11(a)) reveals that higher $ζ^{α}$ lead to a higher share of CD costs in the cost mix of up to $35 %$ when $ζ^{DD} = ζ^{OD} = 1$ , thereby reducing SDs’ utilization. Additionally, for identical $ζ^{α}$ , DDs incur higher costs, suggesting their preferential use by the LSP over ODs. Finally, varying the request density (cf. Figure 11(b)) shows that with higher request density, SD costs rise to $82 %$ of total costs, while being at $0 %$ when the request density is low. This increase is plausible, given that more requests necessitate a larger delivery supply for maintaining high service levels. With CD supply remaining the same, the additional demand is met by an increase in SD usage.

Figure 11.

Variation of $ζ^{α}$ and request density $\sum_{i \in M} λ_{i T}$ . (a) Variation of $ζ^{α}$ ; (b) variation of $\sum_{i \in M} λ_{i T}$ .

Result 8

When the availability of CDs increases to $ζ^{DD} = ζ^{OD} = 1$ within $\bar{T}$ , the share of CD costs rises to $35 %$ , indicating a higher degree of outsourcing by the LSP. When the request density increases, the LSP increasingly relies on SDs, which yields an increased share in total costs of up to $82 %$ .

5.4. Impact Factors on Delivery Option Attractiveness

To understand the conditions under which an operator prefers certain drivers over others in the delivery process, we perform a scenario analysis: we first construct scenarios that vary by one parameter compared to the base case. Then, we analyze the average share of requests delivered by each driver type over the time horizon $T$ . First, we focus on parameters directly controlled by the LSP, that is, personnel costs for SDs and payments to CDs. For SDs, we focus on low $C^{fix}$ implying a scenario in which SDs’ wage expectation is low, which allows the LSP to hire more SDs. For CDs, we focus on scenarios with low $c^{DD}$ and $c^{OD}$ implying low payment per request expectations of CDs. We vary the payments to CDs within a corridor, in which we assume no impact on the driver supply ( $q^{DD}$ and $q^{OD}$ ). This is plausible since lower payments make CDs more attractive in the request matching process, thereby compensating for the reduced payments per request via an increased amount of total requests assigned. Second, we consider two parameters the LSP has only indirect control over (e.g., via promotional campaigns): low and high CD supply. We summarize the scenarios in the first column of Table 4 and add the base case for reference. For each scenario, we analyze whether a certain driver type dominates the other driver types or if the requests are equally split between multiple driver types.

Table 4.

Scenarios and preferred delivery option.

Scenario	Preferred delivery option^a
Base case (BC): See Table 2	Balanced
Scenario 1 (S1): Low SD wages ( $C^{fix} = 4 $$ )	Balanced
Scenario 2 (S2): Low DD costs ( $c^{DD} = 0.5 $ / km$ )	DD
Scenario 3 (S3): Low OD costs ( $c^{OD} = 1 $ / req .$ )	Balanced
Scenario 4 (S4): Low CD supply ( $q^{α} = 0.01$ )	SD
Scenario 5 (S5): High CD supply ( $q^{α} = 0.13$ )	Balanced

SD = salaried driver; DD = dedicated gig-driver; OD = opportunistic gig-driver; CD = crowdsourced driver.

^aShare of delivered requests of $\geq 60 %$ not considered balanced, but dominated by one driver type
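The classification rule in the footnote of Table 4 can be written down compactly. The following Python sketch applies the $60 %$ threshold to a dictionary of request shares per driver type; the function name and the example shares are hypothetical and do not reproduce the values shown in Figure 12.

```python
from typing import Dict

# Threshold from the footnote of Table 4: a driver type that delivers at
# least 60 % of requests dominates; otherwise the mix counts as balanced.
DOMINANCE_THRESHOLD = 60.0


def preferred_delivery_option(shares: Dict[str, float]) -> str:
    """Return the dominating driver type or 'Balanced'.

    shares maps a driver type ("SD", "DD", "OD") to its average share of
    delivered requests in percent.
    """
    top_type = max(shares, key=shares.get)
    if shares[top_type] >= DOMINANCE_THRESHOLD:
        return top_type
    return "Balanced"


# Hypothetical shares (illustration only, not read from Figure 12).
print(preferred_delivery_option({"SD": 45.0, "DD": 48.0, "OD": 7.0}))
print(preferred_delivery_option({"SD": 25.0, "DD": 68.0, "OD": 7.0}))
```

With these assumed shares, the first mix is classified as balanced and the second as DD-dominated.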

Figure 12 shows the average share of requests served by each driver type over the time horizon $T$ for each of the scenarios in Table 4, and we summarize its results in the second column of Table 4. In BC, S1, S3, and S5, the majority of requests are served by either SDs or DDs. Thus, we consider the preferred delivery option to be balanced. Interestingly, the LSP still relies in almost equal shares on DDs in S1, although SD wages are low. This can be attributed to the sufficient supply of DDs in the base case, which eliminates the need to hire many SDs even though they are cheap. In S5, SDs remain crucial in the delivery mix despite high CD supply, which indicates their importance for certain request routes for which CDs are unsuitable. In S2, DDs serve more than 60% of requests, clearly being the preferred delivery option for the LSP, which is plausible as low DD costs substantially favor DDs. Conversely, SDs are the preferred delivery option for the LSP in S4, which can be explained by the lack of CD supply. We observe that, even if conditions favor ODs (S3), they are only responsible for a small share of delivered requests.

Figure 12.

Average share of requests delivered (in %) over $T$ by different driver types.

Result 9

In the base case, when SD wages are low, and when CD supply is high, the delivery options are balanced. However, when payments to DDs and CD supply are low, the share of requests surpasses $60 %$ for DDs and SDs, respectively.

In the previous analysis, we observed that ODs play a minor role in the request delivery process. Apart from payments to ODs, two more factors influence ODs’ share in request matches: temporal patterns ( $ζ^{OD}$ ) and spatial patterns ( $P_{i j}^{OD}$ ). However, the LSP has no control over these operational factors. To understand their impact, we investigate the share of requests delivered by ODs for different numbers of DDs and ODs for three scenarios: the base case for reference, a varied temporal pattern, and a varied spatial pattern (cf. Figure 13). In Figure 13(b), we change ODs’ temporal patterns, represented by $ζ^{OD}$ , from 0.13 to 0.5, thereby increasing the number of ODs being active within $\bar{T}$ . The share of requests delivered by ODs reaches around $50 %$ when the number of ODs is high, and only a few DDs are available. However, when the number of DDs is high, the share of requests delivered by ODs remains as low as in the base case. In Figure 13(c), we synchronize ODs’ spatial driving patterns, $P_{i j}^{OD}$ , with request route patterns, that is, $P_{i j}^{OD} = P_{i j}^{R}$ . This has a stronger impact on the share of requests delivered by ODs, as it reaches values above $80 %$ when the number of ODs is high and the number of DDs is low. This bears a significant potential for municipalities to foster sustainable logistics. Unlike DDs, ODs do not induce traffic, and municipalities could motivate LSPs to implement measures increasing the share of OD deliveries. LSPs can, for example, provide incentives for customers to buy during OD peak times by offering rebates during these times, or synchronize request and OD routes by pooling requests at micro-hubs which coincide with regions frequently visited by ODs.

Figure 13.

Share of requests delivered by ODs (in %) in $t = 0$ and $n_{0}^{SD} = 0$ for different fleet compositions for the base case, varied temporal patterns, and varied spatial patterns. (a) Base case; (b) increase of $ζ^{OD}$ to 0.5; (c) synchronizing OD and request routes. ODs = opportunistic gig-drivers; SD = salaried driver.

Result 10

ODs’ share in requests delivered increases to more than $80 %$ when synchronizing ODs’ temporal and spatial patterns with those of requests and the number of DDs is low.

So far, we assumed that CDs leave the LSP according to a fixed resignation rate. In the next section, we study the effect of a resignation probability that depends on the number of unmatched CDs.

5.5. Analysis of Unmatched CDs

Figure 14 reports the average percentage of unmatched DDs and ODs for different joining rates over the entire time horizon. We observe that the percentage of unmatched DDs (cf. Figure 14(a)) is higher than the percentage of unmatched ODs, especially when $q^{DD}$ is high. For high $q^{DD}$ , the percentage of unmatched DDs amounts to $60 %$ . The percentage of unmatched ODs is lower and amounts to a constant value of $10 %$ across different joining rates (cf. Figure 14(b)).

Figure 14.

Percentage of unmatched CDs when the resignation probability does not depend on the number of unmatched CDs. The $z$ -axis is logarithmically scaled. (a) Percentage of unmatched DDs; (b) Percentage of unmatched ODs. CDs = crowdsourced drivers; DDs = dedicated gig-drivers; ODs = opportunistic gig-drivers.

As the joining rate grows, CD supply surpasses demand, and the LSP can no longer outsource demand to CDs. Furthermore, the higher quotient of unmatched DDs is plausible since more DDs are active on the operational level than ODs due to their higher $ζ^{DD}$ . Hence, they are more likely to be active when there is no demand.

Figure 15 shows the percentage of unmatched CDs, now assuming that CDs’ resignation probability depends on $s_{i j}^{α}$ (cf. equation (4.2)). Overall, the quotient of unmatched CDs is significantly lower than in Figure 14. Moreover, the difference between DDs and ODs is small. Both have a relatively constant ratio of unmatched drivers across joining rates of around $10 %$ . The lower ratio of unmatched DDs is plausible, as the DDs’ resignation probability depends on the number of unmatched DDs. Hence, it can significantly surpass the base case resignation probability of 0.01 and, therefore, the joining rate. When DDs leave the LSP’s platform at a higher rate than they join it, the LSP accumulates fewer DDs than in the case when the resignation probability has a constant value of 0.01. Fewer DDs imply fewer unmatched DDs. In Figure 16, we show the difference in the number of SDs and CDs in the final time step $t = T$ between the case in which the resignation probability depends on the percentage of unmatched CDs and the case in which it does not. We observe that the LSP hires more than 100 additional SDs and has significantly fewer CDs at its disposal when the resignation probability depends on the number of unmatched CDs. The lower CD supply urges the LSP to hire more SDs to ensure high service levels.

Figure 15.

Percentage of unmatched CDs when CDs’ resignation probability depends on number of unmatched CDs. The $z$ -axis is logarithmically scaled. (a) Percentage of unmatched DDs; (b) percentage of unmatched ODs. CDs = crowdsourced drivers; DDs = dedicated gig-drivers; ODs = opportunistic gig-drivers.

Figure 16.

Difference between the total number of drivers when resignation probability depends on the number of unmatched CDs and when it does not. The $z$ -axis is logarithmically scaled. (a) Difference for SDs; (b) Difference for DDs; (c) Difference for ODs. CDs = crowdsourced drivers; SDs = salaried drivers; DDs = dedicated gig-drivers; ODs = opportunistic gig-drivers.

Result 11

The LSP has to hire $\geq 100$ more SDs, when the resignation probability depends on the number of unmatched CDs, as the effective CD supply is significantly lower.

6. Conclusion

In this article, we studied the strategic workforce planning of an LSP providing on-demand delivery services with a mixed fleet of couriers consisting of SDs and CDs. We integrated long-term strategic SD hiring decisions and short-term operational decisions regarding driver dispatching. We formalized the strategic hiring and firing problem as an MDP and solved it with approximate dynamic programming based on piecewise linear value function approximation, which allows us to study large-scale instances. We incorporated operational costs in the MDP’s cost function using a fluid approximation to account for delivery operations.

We conducted a case study based on a real-world data set from Grubhub for food delivery in a metropolitan area located in the US. Herein, our studies led to several findings, which we synthesize in the following.

Total costs obtained with PL-VFA are either equal to or up to $19 %$ lower than total costs obtained with a myopic hiring policy, while mostly matching or outperforming an up to 3-step lookahead policy with perfect information by up to $10 %$ . PL-VFA achieves this cost saving by hiring fewer SDs than required to serve the demand and consequently accepting lower service levels in early time steps. It does so to hedge against remunerating obsolete SDs in later time steps as the number of CDs grows.

SDs and DDs are the main cost drivers in the total cost mix with up to a 50% and a 30% share in total costs, respectively. The significance of SD costs in the total cost mix stresses the importance of finding good SD hiring policies, which minimize the number of SDs hired. DDs have the second highest contribution to total costs, when $q^{DD}$ is high. The DDs’ contribution is more significant than the one of ODs because ODs only accept requests coinciding with their origin and destination. Moreover, their share of arrivals within $\bar{T}$ is significantly lower than the one of DDs.

The LSP has to hire more than $100$ additional SDs when CDs are matching-sensitive, as an oversupply of CDs, that is, a higher number of CDs than required to serve the demand, no longer arises.

This work opens up a promising new research avenue in the field of crowdsourced deliveries, by combining the study of crowdsourced delivery fleets with long-term workforce planning. Specifically, this work provides a foundation for follow-up studies. Firstly, the SD workforce planning problem could be extended by accounting for SDs with different contract durations or working schedules. Moreover, one could introduce uncertainty in the demand dimension on the strategic workforce planning level. Finally, one could implement behavioral components for CDs, for example, discrete choice models based on real-world data, to more accurately represent the CDs’ behavior, for example, regarding resignation processes.

Supplemental Material

sj-pdf-1-pao-10.1177_10591478241268602 - Supplemental material for Strategic Workforce Planning in Crowdsourced Delivery With Hybrid Driver Fleets

Supplemental material, sj-pdf-1-pao-10.1177_10591478241268602 for Strategic Workforce Planning in Crowdsourced Delivery With Hybrid Driver Fleets by Julius Luy, Gerhard Hiermann and Maximilian Schiffer in Production and Operations Management

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

ORCID iDs

Julius Luy

Maximilian Schiffer

Supplemental Material

Supplemental material for this article is available online.

How to cite this article

Luy J, Hiermann G and Schiffer M (2024) Strategic Workforce Planning in Crowdsourced Delivery With Hybrid Driver Fleets. Production and Operations Management 33(11): 2177–2200.

Scenario	Preferred delivery option^a
Base case (BC): See Table 2	Balanced
Scenario 1 (S1): Low SD wages ($C^{fix} = 4\,\$$)	Balanced
Scenario 2 (S2): Low DD costs ($c^{DD} = 0.5\,\$/\text{km}$)	DD
Scenario 3 (S3): Low OD costs ($c^{OD} = 1\,\$/\text{req.}$)	Balanced
Scenario 4 (S4): Low CD supply ($q^{\alpha} = 0.01$)	SD
Scenario 5 (S5): High CD supply ($q^{\alpha} = 0.13$)	Balanced