
Abstract

Order splitting is one of the key issues in the e-commerce order fulfillment process. It increases operational costs, elevates carbon emissions, and compromises customer satisfaction. This article focuses on determining the product assortments to store within the multi-warehouse logistics network to minimize the total number of split orders subject to cardinality constraints. We show that this minimizing split orders (MSO) problem is NP-hard and demonstrate that even finding an optimal order fulfillment strategy with a given assortment selection is NP-hard. To further analyze the MSO problem, we introduce a concept termed the second-order dominant indexing rule. This indexing rule corresponds to a group of demand distributions, under which we are able to characterize the structure of the optimal assortment selection for various scenarios. In particular, when assortment overlapping is prohibited, the optimal selection can be explicitly derived. When the demand exhibits a total nested structure, an optimal selection is non-overlapping with more popular products allocated to larger warehouses. We also bridge the two-warehouse order splitting minimization problem with the single-warehouse assortment selection problem in the literature. Building upon this connection, we propose an extended marginal choice indexing (MCI) policy, which is proven to achieve optimality when the demand has a second-order dominant MCI. In addition, we propose an Iterative Improvement Heuristic that refines any existing assortment selection. The efficiency of the proposed heuristics is validated by extensive numerical experiments, demonstrating that the extended MCI policy performs near-optimally even when customer demand is not ideal, and both heuristics outperform the best benchmark in existing literature. Additional experiments on real-world data further confirm their effectiveness and scalability. 
Finally, we extend our findings to a two-tier multi-warehouse scenario with a back-end warehouse.

Keywords

Warehouse Assortment Selection, Multi-Purchase Discrete Choice Model, Order Fulfillment, Order Splitting, Sustainability

1. Introduction

Online shopping has become an indispensable part of our daily lives. According to a recent report by Boston Consulting Group (Barthel et al., 2023), the compound annual growth rate (CAGR) for e-commerce is anticipated to be 9% through 2027, at which point e-commerce is expected to account for 41% of global retail sales, a significant rise from just 18% in 2017. This growth reflects a significant shift in consumer preferences toward online shopping. In response, many e-commerce companies are expanding their product assortments to attract a broader customer base, boost sales, and enhance customer satisfaction (Oboloo, 2021). Additionally, companies like JD and Freshippo have implemented free shipping for orders exceeding a certain value, encouraging customers to purchase more in a single transaction, reducing shipping costs per unit, and increasing the perceived value of purchases.

However, the growing diversity of product offerings and the emergence of large orders with distinct items present significant challenges for fulfilling customer orders efficiently. On the one hand, the increasing variety of products requires multiple warehouses to store the products, since storing all the products in a single warehouse is usually too costly, if feasible at all. In reality, each warehouse can store only some of the products, which requires e-commerce companies to operate multiple warehouses. On the other hand, when no single warehouse contains all the products, order splitting occurs. A single order is then split into multiple suborders that are handled by different warehouses. An increase in suborders implies the engagement of more manpower in the fulfillment process, thus raising operational costs. In addition, suborders also result in extra packing materials and additional shipments, higher material and energy consumption, and subsequently, greater carbon emissions (Zhang et al., 2021).

To overcome the ever-growing challenge of split orders, this article aims to minimize the expected split orders by selecting appropriate product assortments at warehouses in a logistic network. By examining this problem, we aim to provide valuable insights for companies in designing warehouse assortments in logistics networks and promoting more efficient, responsive, yet sustainable supply chain solutions. In general, the multi-warehouse assortment selection problem is notoriously challenging due to its combinatorial nature. To focus on the assortment selection problem, we adopt a common approach in the literature that ignores the inventory decisions. In other words, we assume that if a product is allocated to a warehouse, it is always available for fulfillment. For each warehouse, the warehouse capacity in our model is defined by the number of distinct stock-keeping units (SKUs) that can be stored in that warehouse. This constraint is based on the understanding that managing an increased variety of products within the limited space of a warehouse requires substantial organizational effort and affects operational efficiency (Alfaro and Corbett, 2003; Lopienski, 2021).

We have made the following key contributions. First, we present a broader framework for the multi-warehouse assortment selection problem than those in the existing literature. Our objective is to reduce the total number of split orders, which proves to be NP-hard even with just two warehouses. Additionally, we establish that determining an optimal fulfillment strategy to minimize split orders for any specified warehouse assortment selection is already NP-hard. To address these challenges, we present mixed-integer linear programming (MILP) formulations for both problems.

Second, we introduce a novel concept termed second-order dominant indexing rule, which aids in identifying optimal assortment selections. This rule highlights a class of demand distributions with distinct structural properties that inform optimal decisions.

Third, we explore three distinct scenarios where the optimal assortment selection can be directly obtained when a second-order dominant indexing can be found regarding the demand distribution: (i) non-overlapping assortment restrictions, (ii) demands with nested structures, and (iii) systems with two warehouses. In these cases, the optimal strategy involves allocating more popular products to larger warehouses. Specifically, when assortments cannot overlap, the optimal solution is explicitly determined. For nested demand structures, an optimal non-overlapping assortment assigns popular products to larger warehouses. For two-warehouse systems, we propose an extended marginal choice indexing (MCI) policy that achieves optimality. Additionally, we establish a connection between the single-warehouse assortment problem and the split-order minimization problem in two-warehouse systems, leveraging this insight to develop an iterative improvement heuristic (IIH) that refines any initial assortment selection for any demand distribution.

Fourth, through extensive numerical experiments, we demonstrate the efficacy of the proposed heuristics. The results confirm that the extended MCI policy attains optimal solutions when demand is independent and achieves near-optimal performance under general multi-purchase choice models, such as the multi-purchase multinomial logit model. While not explicitly designed as a stand-alone solver, the IIH consistently outperforms existing benchmarks, even when initialized with arbitrary assortment selections, offering robust and computationally efficient solutions. Experiments on real-world data further validate the effectiveness and scalability of the proposed methods.

Lastly, we extend our insights into multi-tier multi-warehouse scenarios, featuring a back-end warehouse that consistently supports multiple front-end warehouses. Inspired by our findings from single-tier cases, we reveal the optimal assortment selection under the non-overlapping constraint when a second-order dominant MCI is identifiable. For scenarios involving two warehouses, we introduce a polynomial-time heuristic whose number of enumerations is less than the smaller of the two warehouse capacities. This heuristic achieves optimality when the demand has a second-order dominant MCI. For general cases involving multiple front-end warehouses, we introduce a greedy algorithm and another iterative heuristic to address the problem.

The remainder of the article is structured as follows. Section 2 reviews relevant literature. Section 3 formulates the problem and introduces the second-order dominant indexing rule. Section 4 examines optimal assortments under specific scenarios. Section 5 extends the analysis to multi-tier networks. Finally, Section 6 synthesizes the findings. Proofs are available in Section EC.2.

2. Literature Review

This study intersects with existing research on warehouse assortment selection and product allocation in supply chain management. It also relates to multi-location assortment optimization in revenue management.

2.1. Warehouse Assortment Selection and Product Allocation

We contribute to the literature on warehouse assortment selection problems. Catalán and Fisher (2012) first formally discuss the multi-warehouse assortment selection problem, the goal of which is to minimize split orders. The authors prove the problem is NP-hard even if there are only two warehouses and provide four heuristics for solving the problem. Our problem setting is similar, with the difference that we model customer demand through a probability distribution function, whereas they consider each customer order individually. This distinction enables a more thorough analysis of the impacts stemming from different customer choice behaviors, thereby making it possible to identify certain structural properties of the optimal selection. In this article, we further demonstrate that, given an assortment selection, determining an optimal order fulfillment policy that minimizes split orders is NP-hard.

Zhu et al. (2021) study a similar problem without allowing assortments to overlap between different warehouses and propose a K-links heuristic clustering algorithm, which is based on the distribution of multi-item orders. Nonetheless, this heuristic lacks a performance guarantee. In contrast, we revisit this special case, delineating the optimal selection when the demand exhibits particular patterns. Söylemez (2021) examines another similar problem, focusing solely on the occurrence of order splitting, instead of the total number of split orders. Bonekamp (2019) develops an optimization model to maximize the sum of cross-selling factors of products while minimizing split shipments to solve the multi-warehouse assortment allocation problem. The model is an extension of the Quadratic Multiple Knapsack Problem (QMKP), which accounts for product overlapping and load balancing among warehouses. To solve the problem, an extended greedy heuristic and a genetic algorithm are proposed, which are shown to find near-optimal solutions in a real-world case study.

Our study is also related to Li et al. (2024), where the authors study the single-warehouse assortment selection problem under a cardinality constraint. They examine two distinct types of cost functions, associated with order splitting cost and spillover fulfillment cost, and focus on minimizing the fulfillment cost with respect to a single warehouse. Although the problem setting is quite different, we connect our problem to theirs in scenarios involving only two warehouses. Leveraging this connection, we present an iterative heuristic to improve any given two-warehouse assortment selection. Moreover, inspired by their introduction of the dominant indexing rule for analyzing the single-warehouse assortment selection problem, we innovatively introduce a concept termed second-order dominant indexing rule that assists us in identifying the structure of the optimal selection in various scenarios for our problem.

From another perspective, Li et al. (2019) investigate product allocation across multiple warehouses in the context of large-scale e-commerce, focusing on allocation density, defined as the ratio of the total number of allocated products to the total number of possible allocations. They find that shipping costs decrease significantly at low allocation densities, whereas at high densities, increases in allocation density result in only marginal cost reductions. Note that they consider product shipment in an aggregated manner rather than at the individual customer order level.

2.2. Multi-Location Assortment Optimization

This work is also related to multi-location assortment optimization. Unlike assortment optimization in the classic single-location setting (see e.g., Gallego and Topaloglu, 2019) and the omni-channel setting (see e.g., Dzyabura and Jagabathula, 2018), in the multi-location context, if a requested product is not available in the assortment of one location, it can be transshipped from other locations, incurring additional shipment costs. If none of the locations stocks the product, the customer may consider a substitutable product. The goal is to maximize expected profit by determining the optimal assortment of products to offer at each location.

To the best of our knowledge, very little research has studied the multi-location assortment optimization problem. Fisher and Vaidyanathan (2014) and Corsten et al. (2018) study similar problems without considering transshipment between different locations. Bebitoglu et al. (2018) and Çömez-Dolgan et al. (2022) study the capacitated multi-location assortment optimization problem with transshipment among different locations allowed, where each location has its own capacity limit, and both show that the problem is NP-complete. Bebitoglu et al. (2018) assume that each location serves a separate geographical region whose customers’ demand is governed by a separate multinomial logit model. The authors propose a conic quadratic mixed integer programming formulation to solve it. Through numerical experiments, they show that their approach outperforms the mixed integer linear programming formulation. Çömez-Dolgan et al. (2022) assume that customers’ demand is exogenous or independent for each location and that, if both the first and second choice of a customer cannot be satisfied by any location, the demand is lost. The authors provide structural properties of the optimal assortments to simplify and speed up the search for the optimal solution. The authors also provide an upper threshold for the probability of customer substitution at each location, which ensures that the location’s capacity is fully utilized at optimal solutions if the substitution probability is below this threshold. Additionally, the authors demonstrate that when all locations have the same capacity, using them to the maximum can decrease expensive transshipment occurrences.

Contrasting with our setting, these studies concentrate solely on single-purchase scenarios where order splitting is not an issue, with the main goal being revenue maximization instead of minimizing split orders.

3. Problem Setting and Second-Order Dominant Indexing Rule

In this section, we begin by providing a formal definition of the assortment selection problem within a multi-warehouse logistics network, focusing particularly on minimizing split orders. Then, we discuss the complexity of this problem. Lastly, we introduce a novel concept pertaining to demand distribution that enables us to identify structural properties of the optimal assortment selection.

3.1. Problem of Minimizing Split Orders and Its Complexity

We consider an e-commerce company offering $N \in \mathbb{N}_{++} = \{1, 2, \dots\}$ distinct products indexed as $1, 2, \dots, N$, and we let $N = \{1, 2, \dots, N\}$ denote the set of products. There are at most $M = 2^{N} - 1$ distinct types of orders. We assume customers’ choices align with a discrete demand distribution $π \in \mathbb{R}^{M}$, where $π_{m} \geq 0$ is the choice probability of order type $m \in [M]$ and $\sum_{m \in [M]} π_{m} = 1$. Here, $[n]$ denotes the set $\{1, 2, \dots, n\}$ for any $n \in \mathbb{N}_{++}$. In subsequent analysis, $T_{1}, T_{2}, \dots, T_{M}$ will denote the $M$ distinct customer orders, and $π (T)$ will represent the probability of a specific set $T \subseteq N$ being selected.

We assume the e-commerce company operates $D \in \mathbb{N}_{++}$ warehouses servicing a single region. Let $S_{d} \subseteq N$ represent the product assortment stored in warehouse $d \in [D]$ and we use ${S} = [S_{1}, \dots, S_{D}]^{⊤}$ for ease of notation. Each warehouse cannot store more than its capacity $K_{d}$, that is, $| S_{d} | \leq K_{d} < N$ for all $d \in [D]$. For simplicity, we assume that the warehouses are indexed in descending order of their capacities, that is, $K_{1} \geq K_{2} \geq \dots \geq K_{D}$. We assume, without loss of generality, that no single warehouse has the capacity to store all products. This assumption is based on the rationale that if a warehouse with such capacity existed, the optimal solution would naturally involve storing all products in that one warehouse. Furthermore, we assume that all orders will be fulfilled. These are more formally expressed in the following assumption:

Assumption 1
All orders are required to be fulfilled, but no individual warehouse can store all products.

Assumption 1 implies that $\cup_{d \in [D]} S_{d} = N$ . Given that all orders must be fulfilled, if the product assortment in each warehouse is not carefully selected, frequent order splitting may occur. Worse still, in some extreme cases, an order could be split into more than two suborders, with each being fulfilled by a different warehouse. Such scenarios can dramatically increase the company’s fulfillment and operational costs, degrade the customer’s shopping experience, and increase carbon emissions.

To mitigate such instances and contribute to environmental sustainability, the company aims to optimize the assortment of each warehouse to minimize the total number of suborders. We present the minimizing split order (MSO) problem as follows:
$\begin{aligned} (\text{MSO}) : \quad \min_{S} \quad & \sum_{m \in [M]} π (T_{m}) \, C (T_{m} | S) \\ \text{s.t.} \quad & | S_{d} | \leq K_{d}, && \forall d \in [D], \\ & \cup_{d \in [D]} S_{d} = N, \\ & S_{d} \subseteq N, && \forall d \in [D], \end{aligned}$
where $C (T | S)$ represents the minimum number of suborders required to fulfill order $T \subseteq N$ under the multi-warehouse assortment $S$. Specifically, for a given $S$, represented by $x \in \{0, 1\}^{D \times N}$ where $x_{n}^{d} = 1$ if $n \in S_{d}$ and $0$ otherwise, we can calculate $C (T | S (x))$ by solving the following mixed-integer linear program:
$\begin{aligned} C (T | S (x)) = \min_{y, z} \quad & \sum_{d \in [D]} z_{d} \\ \text{s.t.} \quad & y_{d, n} \leq z_{d}, && \forall d \in [D], n \in [N], \\ & y_{d, n} \leq x_{n}^{d}, && \forall d \in [D], n \in [N], \\ & \sum_{d \in [D]} y_{d, n} = I (n \in T), && \forall n \in [N], \\ & y_{d, n} \in \{0, 1\}, && \forall d \in [D], n \in [N], \\ & z_{d} \in \{0, 1\}, && \forall d \in [D], \end{aligned}$
(1)
where $y \in \{0, 1\}^{D \times N}$ is a binary variable with $y_{d, n} = 1$ if product $n \in T$ is fulfilled by warehouse $d \in [D]$ and $0$ otherwise, and $z \in \{0, 1\}^{D}$ is another binary variable with $z_{d} = 1$ if parts of $T$ are fulfilled by warehouse $d$ and $0$ otherwise. $I (A)$ denotes the indicator function of event $A$, equaling $1$ if $A$ is true and $0$ otherwise. More broadly, $C (T | S)$ can represent any fulfillment cost for $T$ under a given assortment $S$, extending beyond the number of suborders. This highlights the flexibility of the (MSO) formulation in addressing diverse multi-warehouse assortment selection problems.
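As an illustration of what $C (T | S)$ measures, the following minimal Python sketch computes it for tiny instances by exhaustive search over groups of warehouses rather than by solving MILP (1). The function name `min_suborders` and the set-based data layout are assumptions made for this illustration only; the search is exponential in the number of warehouses, which is consistent with the hardness result stated next.

```python
from itertools import combinations

def min_suborders(order, assortments):
    """Smallest number of warehouses whose assortments jointly cover `order`.

    Brute-force set cover: try every group of 1, 2, ... warehouses.
    Returns 0 for an empty order and None if the order cannot be covered.
    """
    order = set(order)
    if not order:
        return 0
    # Only the part of each assortment that overlaps the order matters.
    useful = [set(s) & order for s in assortments if set(s) & order]
    for size in range(1, len(useful) + 1):
        for group in combinations(useful, size):
            if set().union(*group) >= order:
                return size
    return None

# Hypothetical two-warehouse assortment S = ({1, 2}, {3, 4}).
S = [{1, 2}, {3, 4}]
print(min_suborders({1, 2}, S))  # order served by a single warehouse
print(min_suborders({1, 3}, S))  # order must be split across both warehouses
```

Each returned value corresponds to $\sum_{d} z_{d}$ at an optimal solution of (1) for the same order and assortment.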

For a given $S$ , identifying a fulfillment policy that minimizes the number of suborders for an arbitrary customer order $T$ is already NP-hard. We formally state this in the subsequent proposition.
Proposition 1
Given a multi-warehouse system with predefined assortments, it is NP-hard to determine a fulfillment policy that minimizes the number of suborders for an arbitrary customer order.

Proposition 1 is proved through a reduction from the Set Cover problem. It underscores the challenge of identifying an optimal fulfillment policy but it does not directly translate to the complexity of solving the MSO problem. In fact, solving MSO is also NP-hard, as formally stated in the following proposition:
Proposition 2 (Proposition 2 of Catalán and Fisher, 2012)

The problem of Minimizing Split Orders is NP-hard even when limited to just two warehouses.

Note that Catalán and Fisher (2012) propose an MILP to solve the MSO problem. For completeness, we present the MILP reformulation for (MSO) as follows:

$\begin{aligned} \min_{x, y, z} \quad & \sum_{m \in [M]} π_{m} \sum_{d \in [D]} z_{d}^{m} \\ \text{s.t.} \quad & \sum_{n \in [N]} x_{n}^{d} \leq K_{d}, && \forall d \in [D], \\ & \sum_{d \in [D]} x_{n}^{d} \geq 1, && \forall n \in [N], \\ & y_{d, n}^{m} \leq x_{n}^{d}, && \forall d \in [D], n \in [N], m \in [M], \\ & y_{d, n}^{m} \leq z_{d}^{m}, && \forall d \in [D], n \in [N], m \in [M], \\ & \sum_{d \in [D]} y_{d, n}^{m} = I (n \in T_{m}), && \forall n \in [N], m \in [M], \\ & x_{n}^{d} \in \{0, 1\}, && \forall d \in [D], n \in [N], \\ & y_{d, n}^{m} \in \{0, 1\}, && \forall d \in [D], n \in [N], m \in [M], \\ & z_{d}^{m} \in \{0, 1\}, && \forall d \in [D], m \in [M] . \end{aligned}$

(2)

where $y_{d, n}^{m} \in \{0, 1\}$ indicates whether product $n$ in order $T_{m}$ is fulfilled by warehouse $d$, and $z_{d}^{m} \in \{0, 1\}$ denotes whether any part of order $T_{m}$ is fulfilled by warehouse $d$. This MILP provides a general and exact method to minimize split orders for any demand distribution and can be solved using commercial solvers, such as Gurobi. Detailed explanations of each constraint are provided in Section EC.1. A key distinction of our approach is that we aggregate identical orders and adopt a probabilistic perspective, whereas Catalán and Fisher (2012) treat each order individually. This aggregation enables a more detailed analysis of how customer choice behaviors influence optimal assortment selection. Meanwhile, the similarity in the formulation structure enables us to establish the NP-hardness of solving the MSO problem in a manner analogous to their approach, specifically through a reduction from the graph bisection problem.

While using commercial solvers for MILP (2) is a valid approach for addressing the MSO problem, this approach is not scalable and provides limited insights into assortment decisions in response to different customer demands. Subsequently, we will introduce a novel concept termed second-order dominant indexing rule. This rule pertains to a specific category of demand distributions. Within this category, we are able to characterize numerous structural properties of the optimal assortment selection, offering deeper understanding and guidance for strategic decision-making in planning warehouse storage.
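For very small instances, the (MSO) objective can also be checked without a solver by enumerating every feasible assortment selection. The sketch below is such an exhaustive check, not the MILP approach described above; the function names and the three-product demand are hypothetical and chosen only for illustration. Its running time grows exponentially in $N$ and $D$, which illustrates why exact approaches do not scale.

```python
from itertools import combinations, product

def min_suborders(order, assortments):
    """C(T | S): fewest warehouses that jointly cover a non-empty order."""
    order = frozenset(order)
    for size in range(1, len(assortments) + 1):
        for group in combinations(assortments, size):
            if order <= frozenset().union(*group):
                return size
    raise ValueError("order cannot be fulfilled by the given assortments")

def solve_mso_brute_force(products, capacities, demand):
    """Enumerate all selections with |S_d| <= K_d whose union is the product set."""
    products = frozenset(products)
    candidates = [
        [frozenset(c) for k in range(1, cap + 1)
         for c in combinations(sorted(products), k)]
        for cap in capacities
    ]
    best_cost, best_selection = float("inf"), None
    for selection in product(*candidates):
        if frozenset().union(*selection) != products:
            continue  # coverage constraint violated
        cost = sum(p * min_suborders(t, selection) for t, p in demand.items())
        if cost < best_cost - 1e-12:
            best_cost, best_selection = cost, selection
    return best_cost, best_selection

# Hypothetical demand over three products, two warehouses with K_1 = K_2 = 2.
toy_demand = {
    frozenset({1, 2}): 0.5,
    frozenset({2, 3}): 0.3,
    frozenset({1, 3}): 0.2,
}
toy_cost, toy_selection = solve_mso_brute_force({1, 2, 3}, (2, 2), toy_demand)
print(toy_cost, [sorted(s) for s in toy_selection])
```

In this toy instance the best selection stores product 2 in both warehouses, so only the least likely order is split.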

3.2. Second-Order Dominant Indexing Rule

Solving the problem of minimizing split orders is generally NP-hard. When customers purchase at most one product, the demand simplifies to traditional single-purchase discrete choice models (DCMs), where order splitting does not occur, and any feasible assortment is optimal. Therefore, we focus on scenarios involving multiple purchases. In this subsection, we introduce a special class of demand distributions, which exhibits beneficial properties for addressing multi-warehouse assortment selection problems.

In what follows, we first introduce several preliminary concepts before formally defining this family of demand distributions. An indexing rule $I$ is a ranking over the elements of $N$ , with $I (n)$ denoting the index of element $n$ under $I$ and $I^{- 1} (k)$ denoting the element ranked $k$ . Such a rule systematically assigns indices to each element in $N$ . Given a set of products $N$ with demand function $π$ , we define the marginal choice probability of product $n \in N$ as $ω_{n} = \sum_{T \subseteq N} π (T) \cdot I (n \in T)$ . A Marginal Choice Indexing (MCI) rule arranges products in descending order of their marginal choice probabilities. Specifically, if $I$ represents an MCI rule, we have $ω_{I^{- 1} (1)} \geq ω_{I^{- 1} (2)} \geq \dots \geq ω_{I^{- 1} (N)}$ ¹.
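The marginal choice probabilities and the resulting MCI ranking can be computed directly from a demand distribution. The following is a minimal sketch; the function names, the dictionary representation of $π$, and the toy demand are assumptions for illustration, and ties are broken by product label, which is an arbitrary choice.

```python
def marginal_choice_probabilities(demand, products):
    """omega_n = sum of pi(T) over all orders T that contain product n."""
    return {
        n: sum(p for order, p in demand.items() if n in order)
        for n in products
    }

def mci_ranking(demand, products):
    """Products sorted by descending marginal choice probability."""
    omega = marginal_choice_probabilities(demand, products)
    # Ties are broken by product label (an arbitrary convention used here).
    return sorted(products, key=lambda n: (-omega[n], n))

# Hypothetical demand: each key is an order T, each value is pi(T).
toy_demand = {
    frozenset({1, 2}): 0.5,
    frozenset({2, 3}): 0.3,
    frozenset({1, 3}): 0.2,
}
omega = marginal_choice_probabilities(toy_demand, [1, 2, 3])
ranking = mci_ranking(toy_demand, [1, 2, 3])
print(omega, ranking)
```

Here `ranking[k - 1]` plays the role of $I^{- 1} (k)$: the product ranked $k$ under the MCI rule.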

Introduced by Li et al. (2024), the (First-Order) Dominant Indexing rule is defined as follows: an indexing rule $I$ is dominant with respect to demand function $π$ if, for any two subsets $T = \{t_{i}\}_{i = 1}^{k} \subseteq N$ and $T^{'} = \{t_{i}^{'}\}_{i = 1}^{k} \subseteq N$ of equal size $k \in [N]$, where $I (t_{1}) < \dots < I (t_{k})$, $I (t_{1}^{'}) < \dots < I (t_{k}^{'})$, and $I (t_{i}) \leq I (t_{i}^{'})$ for all $i \in [k]$, the condition $π (T) \geq π (T^{'})$ holds. Note that a dominant indexing rule may not exist for a given demand function $π$; however, when it does, it must be an MCI (Theorem 3 in Li et al., 2024). Both the MCI rule and the dominant indexing rule rank products, but they differ in their underlying focus and definitions. The MCI rule orders products solely by their marginal choice probabilities, without considering customer preferences for different subsets. In contrast, a dominant indexing rule, if it exists, captures preference relationships among equally sized subsets, offering valuable insights for optimizing single-warehouse assortments to maximize order fulfillment rate.

However, the dominant indexing rule alone is insufficient to ensure the structure of the optimal assortments in multi-warehouse settings, even for just two warehouses, as we must determine which subset combinations minimize order splitting. As such, we introduce a new concept after exploring the bundle multivariate logit (BundleMVL) Model (Russell and Petersen, 2000; Tulabandhula et al., 2023), a multi-purchase choice model that captures substitution and complementarity effects among products. In a BundleMVL- $L$ model, customers can select up to $L$ items. The probability of choosing a subset $T$ with $| T | \leq L$ is given by $π (T) = V_{T} / (1 + \sum_{T^{'} \subseteq N, | T^{'} | \leq L} V_{T^{'}})$ , where $V_{T} = \exp (\sum_{n \in T} V_{n} + \sum_{n < n^{'} \in T} β_{n n^{'}})$ . Here, $V_{n}$ is the intrinsic utility of product $n$ , and $β_{n n^{'}} = β_{n^{'} n}$ denotes the interaction between products $n$ and $n^{'}$ , with $β_{n n} = 0$ for any $n$ . Note that $β_{n^{'} n}$ can be either negative or positive, indicating whether the pair of products are substitutes (negative) or complements (positive), respectively.

Example 1 (An Example of BundleMVL Model with Four Products)

Consider four products, indexed as $1$, $2$, $3$, $4$, with demand following a BundleMVL-2 model. The intrinsic utilities are $V_{1} = 4$, $V_{2} = 3$, $V_{3} = 2$, and $V_{4} = 1$. The interaction parameters are $β_{12} = 0.9$, $β_{13} = 0.8$, $β_{14} = 0.7$, $β_{23} = 0.8$, $β_{24} = 0.7$, and $β_{34} = 0.7$. The resulting probabilities are: $π (\{1\}) \approx 0.0122$, $π (\{2\}) \approx 0.0045$, $π (\{3\}) \approx 0.0017$, $π (\{4\}) \approx 0.0006$, $π (\{1, 2\}) \approx 0.6047$, $π (\{1, 3\}) \approx 0.2013$, $π (\{1, 4\}) \approx 0.0670$, $π (\{2, 3\}) \approx 0.0740$, $π (\{2, 4\}) \approx 0.0246$, $π (\{3, 4\}) \approx 0.0091$. Now, consider two warehouses, each with a capacity to store two SKUs ($K_{1} = 2$ and $K_{2} = 2$). Among all feasible assortment selections, $(\{1, 3\}, \{2, 4\})$, $(\{1, 4\}, \{2, 3\})$, and $(\{1, 2\}, \{3, 4\})$, it is not hard to find that the optimal assortment selection is $S_{1} = \{1, 2\}$ and $S_{2} = \{3, 4\}$ (or vice versa, with $S_{1}$ and $S_{2}$ swapped), as it minimizes the likelihood of order splitting. Specifically, the highest-demand order $\{1, 2\}$ can be fulfilled entirely by a single warehouse in this selection. In contrast, the other two selections would split this order between the warehouses, leading to less efficient fulfillment.
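The numbers in Example 1 can be reproduced with a short script. The sketch below evaluates the BundleMVL-2 probabilities from the stated utilities and interaction parameters and compares the three feasible selections by their split probability. The function names are hypothetical, and the normalizing sum is taken over non-empty bundles only, an assumption that matches the probabilities reported in the example. With two warehouses that jointly stock every product, each order needs either one or two suborders, so minimizing the split probability is equivalent to minimizing expected suborders.

```python
from itertools import combinations
from math import exp

V = {1: 4.0, 2: 3.0, 3: 2.0, 4: 1.0}
beta = {(1, 2): 0.9, (1, 3): 0.8, (1, 4): 0.7,
        (2, 3): 0.8, (2, 4): 0.7, (3, 4): 0.7}

def bundle_mvl(V, beta, L):
    """pi(T) for every non-empty bundle T with at most L products."""
    weights = {}
    for size in range(1, L + 1):
        for bundle in combinations(sorted(V), size):
            utility = sum(V[n] for n in bundle)
            # Pairwise interactions; pairs come out as (n, n') with n < n'.
            utility += sum(beta[pair] for pair in combinations(bundle, 2))
            weights[frozenset(bundle)] = exp(utility)
    denom = 1.0 + sum(weights.values())  # the 1 is the no-purchase option
    return {bundle: w / denom for bundle, w in weights.items()}

def split_probability(pi, s1, s2):
    """Probability that an order fits in neither warehouse (assumes S1 | S2 covers all products)."""
    return sum(p for t, p in pi.items() if not (t <= s1 or t <= s2))

pi = bundle_mvl(V, beta, L=2)
selections = [({1, 2}, {3, 4}), ({1, 3}, {2, 4}), ({1, 4}, {2, 3})]
split_probs = [split_probability(pi, s1, s2) for s1, s2 in selections]
best = selections[split_probs.index(min(split_probs))]
for (s1, s2), q in zip(selections, split_probs):
    print(sorted(s1), sorted(s2), round(q, 4))
```

The selection stored in `best` is the one that keeps the most popular order $\{1, 2\}$ within a single warehouse.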

Following Example 1, we observe that the marginal choice probabilities satisfy $ω_{1} \approx 0.8852 \geq ω_{2} \approx 0.7078 \geq ω_{3} \approx 0.2861 \geq ω_{4} \approx 0.1013$, indicating that the current indexing rule $I$ aligns with the MCI rule. We can verify that this MCI is also a first-order dominant indexing rule. This implies that if there is only one warehouse, the assortment $\{1, 2\}$ is optimal to maximize the order fill rate when $K = 2$, and $\{1, 2, 3\}$ is optimal when $K = 3$.

However, in multi-warehouse settings, the task of identifying optimal assortments is more complex than simply prioritizing a single subset of products based on their individual popularity or demand. Specifically, it requires determining combinations of product subsets that minimize the likelihood of order splitting while ensuring that all products remain available across multiple warehouses to meet customer demand. From the above example, we observe that prioritizing popular customer orders and products can reduce order splitting. This motivates us to investigate demand distributions where such prioritization strategies can be leveraged effectively. While the dominant indexing rule provides insights into preferences among subsets of products, it does not account for the interactions between combinations of subsets, which are critical in multi-warehouse assortment selection. To address this limitation, we introduce the concept of the Second-Order Dominant Indexing Rule, which identifies a class of demand distributions where prioritizing the allocation of popular products leads to solutions that effectively minimize order splitting.

Definition 1 (Second-Order Dominant Indexing Rule w.r.t. Demand Function $π$ )

Consider a universal set $N = \{1, 2, \dots, N\}$ and a demand function $π$. We say that an indexing rule $I$ is second-order dominant w.r.t. demand function $π$ if, for any pair of subsets $T = \{t_{i}\}_{i = 1}^{k} \subseteq N$ and $T^{'} = \{t_{i}^{'}\}_{i = 1}^{k} \subseteq N$ of identical size $k \in [N]$, satisfying $I (t_{1}) < \dots < I (t_{k})$, $I (t_{1}^{'}) < \dots < I (t_{k}^{'})$, and $I (t_{i}) \leq I (t_{i}^{'})$ for all $i \in [k]$, the following two conditions are met: (i) $π (T) \geq π (T^{'})$; (ii) for any two products $a \notin (T \cup T^{'})$ and $b \notin (T \cup T^{'})$ where $I (a) < I (b)$, it holds that $π (T \cup \{a\}) + π (T^{'} \cup \{b\}) \geq π (T \cup \{b\}) + π (T^{'} \cup \{a\})$.

Condition (i) means that for two subsets of the same size, the subset containing smaller indexed products is more popular, which is exactly the definition of first-order dominant indexing. Condition (ii) further posits that adding one more product to each subset would result in a larger total market share if the more popular product is added to the more popular subset, rather than to the less popular subset. Referring to Example 1, this condition can be verified for subsets $\{1\}$ and $\{2\}$ with products $3$ and $4$ by showing that $π (\{1, 3\}) + π (\{2, 4\}) \geq π (\{1, 4\}) + π (\{2, 3\})$. Similarly, it holds that $π (\{1, 2\}) + π (\{3, 4\}) \geq π (\{1, 3\}) + π (\{2, 4\})$. Therefore, all combinations of $(T, T^{'}, a, b)$ satisfy condition (ii), confirming that the selection $(\{1, 2\}, \{3, 4\})$ is indeed optimal.

The second-order dominant indexing rule reflects a form of “supermodularity” in subset demand distributions. Consider any demand distribution within this class. Given that $π (T) \geq π (T^{'})$ and $π (\{a\}) \geq π (\{b\})$, adding the more popular product $a$ to the more popular subset $T$ yields a greater marginal gain in choice probability (market share) than adding the less popular product $b$ to $T$. This effect resembles principles of supermodularity and increasing differences (convexity), where higher popularity among subsets and elements amplifies their combined impact on demand. This property is instrumental in guiding our strategy of prioritizing popular products in assortment selection, especially in scenarios where additional storage capacity is available. The implications of this strategy are explored in detail in later sections. Since a second-order dominant indexing rule is necessarily a first-order dominant indexing rule, it must also be an MCI if it exists.
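For small product sets, Definition 1 can be checked mechanically by enumerating all dominated pairs $(T, T^{'})$ and all admissible $(a, b)$. The sketch below does this for the rounded probabilities reported in Example 1. The function name, the tolerance, and the convention that unlisted subsets have probability zero are assumptions made for this illustration; the enumeration is exponential in $N$ and is meant only as a sanity check.

```python
from itertools import combinations

def is_second_order_dominant(ranking, pi, tol=1e-12):
    """Check conditions (i) and (ii) of Definition 1 by full enumeration.

    ranking: products listed from index 1 to index N under the rule I.
    pi: dict mapping frozenset(order) -> probability (missing sets count as 0).
    """
    index = {n: i for i, n in enumerate(ranking)}

    def prob(items):
        return pi.get(frozenset(items), 0.0)

    for k in range(1, len(ranking) + 1):
        # combinations() keeps the order of `ranking`, so tuples are index-sorted.
        subsets = list(combinations(ranking, k))
        for T in subsets:
            for Tp in subsets:
                if any(index[t] > index[tp] for t, tp in zip(T, Tp)):
                    continue  # T does not dominate T' elementwise
                if prob(T) < prob(Tp) - tol:
                    return False  # condition (i) fails
                rest = [n for n in ranking if n not in T and n not in Tp]
                for a, b in combinations(rest, 2):  # here I(a) < I(b)
                    lhs = prob(T + (a,)) + prob(Tp + (b,))
                    rhs = prob(T + (b,)) + prob(Tp + (a,))
                    if lhs < rhs - tol:
                        return False  # condition (ii) fails
    return True

# Rounded probabilities from Example 1.
example_pi = {
    frozenset({1}): 0.0122, frozenset({2}): 0.0045,
    frozenset({3}): 0.0017, frozenset({4}): 0.0006,
    frozenset({1, 2}): 0.6047, frozenset({1, 3}): 0.2013,
    frozenset({1, 4}): 0.0670, frozenset({2, 3}): 0.0740,
    frozenset({2, 4}): 0.0246, frozenset({3, 4}): 0.0091,
}
print(is_second_order_dominant([1, 2, 3, 4], example_pi))
print(is_second_order_dominant([4, 3, 2, 1], example_pi))
```

The first call uses the indexing of Example 1; the second reverses it, so condition (i) already fails for the singletons.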

Proposition 3
For a given demand function $π$ , if an indexing rule $I$ is second-order dominant with respect to $π$ , then it must also be dominant with respect to $π$ and it must be an MCI.

In what follows, we present useful structural properties of demand functions characterized by first-order or second-order dominant indexing rules.
Proposition 4
Consider a given demand function $π$ and any pair of subsets $T = {t_{i}}_{i = 1}^{k} \subseteq N$ and $T^{'} = {t_{i}^{'}}_{i = 1}^{k} \subseteq N$ of identical size $k \in [N]$ , where $I (t_{1}) < \dots < I (t_{k})$ , $I (t_{1}^{'}) < \dots < I (t_{k}^{'})$ , and $I (t_{i}) \leq I (t_{i}^{'})$ for all $i \in [k]$ , it follows that
(i) if $π$ has a dominant MCI, for any subset $R \subseteq N$ , where $R \cap (T \cup T^{'}) = \emptyset$ , then $π (T \cup R) \geq π (T^{'} \cup R)$ ;

(ii) if $π$ has a second-order dominant MCI, for any pair of subsets $R = {r_{i}}_{i = 1}^{l} \subseteq N$ and $R^{'} = {r_{i}^{'}}_{i = 1}^{l} \subseteq N$ , where $r_{j} \neq t_{i}, t_{i}^{'} \forall i \in [k], j \in [l]$ , $r_{j}^{'} \neq t_{i}, t_{i}^{'} \forall i \in [k], j \in [l]$ , $I (r_{1}) < \dots < I (r_{l})$ , $I (r_{1}^{'}) < \dots < I (r_{l}^{'})$ , and $I (r_{i}) \leq I (r_{i}^{'}) \forall i \in [l]$ , then $π (T \cup R) + π (T^{'} \cup R^{'}) \geq π (T \cup R^{'}) + π (T^{'} \cup R)$ .

This proposition essentially extends the conditions in the definitions of first-order and second-order dominant indexing rules to situations with several subsets, which in turn play a crucial role in helping us identify optimal assortment selections, as will be illustrated in the next section. While we use the BundleMVL model as an illustrative example, second-order dominant MCI rules are not identifiable for all cases within this model. Here, we present two classes of multi-purchase choice models for which second-order dominant MCI rules can be identified.
Proposition 5
The marginal choice indexing rules of the following two multi-purchase choice models are second-order dominant:
(1) Independent choice model (ICM) (Lin et al., 2025): Each product is selected independently with probability $p_{n}$ for product $n$ , and the probability of choosing $T \subseteq N$ is $\prod_{i \in T} p_{i} \prod_{j \in N ∖ T} (1 - p_{j})$ ;
(2) Bundle multivariate logit (BundleMVL) model satisfying $β_{n i} \geq β_{n j} \forall n \in N ∖ {i, j}$ for any $i, j \in N$ with $V_{i} \geq V_{j}$ , and $β_{m a} + β_{m^{'} b} \geq β_{m^{'} a} + β_{m b} \forall m, m^{'} \in N ∖ {a, b}$ with $V_{m} \geq V_{m^{'}}$ for any $a, b \in N$ with $V_{a} \geq V_{b}$ .

Note that while an MCI is always second-order dominant for any ICM, this does not necessarily hold for the BundleMVL model. Only a specific subclass meets these conditions, with Example 1 being one such instance. Specifically, the first half of the conditions in Proposition 5 (2) ensures condition (i) in Definition 1 by preserving the ranking where lower-indexed products have higher purchase probabilities. The second half guarantees condition (ii) by ensuring that adding a more popular product to a more popular subset yields a greater demand increase than alternative pairings.
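For small instances, the two conditions of Definition 1 can be checked by brute force. The sketch below is illustrative only: the function names `icm_demand` and `is_second_order_dominant`, the dictionary representation of $π$ , and the purchase probabilities are assumptions made for this example, and products are assumed to be labelled $1, \dots, N$ in the order given by the indexing rule.

```python
from itertools import combinations
from math import prod

def icm_demand(p):
    """Return pi(T) for every subset T under the independent choice model.
    Products are labelled 1..N and p[n-1] is the purchase probability of n."""
    N = len(p)
    products = range(1, N + 1)
    pi = {}
    for k in range(N + 1):
        for T in combinations(products, k):
            pi[frozenset(T)] = prod(
                p[n - 1] if n in T else 1 - p[n - 1] for n in products
            )
    return pi

def is_second_order_dominant(pi, N, tol=1e-12):
    """Brute-force check of conditions (i) and (ii) of Definition 1,
    taking the labelling 1..N itself as the indexing rule I."""
    products = range(1, N + 1)
    prob = lambda s: pi.get(frozenset(s), 0.0)
    for k in range(1, N + 1):
        subsets = list(combinations(products, k))  # each tuple is sorted
        for T in subsets:
            for Tp in subsets:
                if any(t > tp for t, tp in zip(T, Tp)):
                    continue  # T does not dominate T' componentwise
                if prob(T) < prob(Tp) - tol:  # condition (i) violated
                    return False
                outside = [n for n in products if n not in T and n not in Tp]
                for a, b in combinations(outside, 2):  # guarantees a < b
                    lhs = prob(T + (a,)) + prob(Tp + (b,))
                    rhs = prob(T + (b,)) + prob(Tp + (a,))
                    if lhs < rhs - tol:  # condition (ii) violated
                        return False
    return True

# Hypothetical ICM with products indexed by decreasing purchase probability.
pi_icm = icm_demand([0.6, 0.4, 0.3, 0.1])
print(is_second_order_dominant(pi_icm, 4))
```

The enumeration is exponential in $N$ , so this is only a sanity check for toy instances, not a practical test for large catalogs.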

To simplify analysis, we denote a second-order dominant marginal choice indexing rule as 2-MCI and assume that if a demand function has a 2-MCI $I$ , products are indexed by default according to $I$ .
4. Optimal Multi-Warehouse Assortment Selection

In this section, we demonstrate how to derive the optimal warehouse assortment when demands are characterized by a 2-MCI. Starting with the general MSO problem without additional restrictions, we show that under specific conditions, popular products should be allocated to larger warehouses. To illustrate, we first introduce a key definition.

Definition 2
Given a demand function $π$ and a corresponding MCI $I$ , we say that an assortment $S_{1}$ is superior to another assortment $S_{2}$ with respect to $I$ , denoted by $S_{1} ⪰_{I} S_{2}$ , if and only if for $S_{2} = {i_{1}^{2}, \dots, i_{s_{2}}^{2}}$ , we have $S_{1} \supseteq {i_{1}^{1}, \dots, i_{s_{2}}^{1}}$ such that $I (i_{k}^{1}) \leq I (i_{k}^{2})$ for $k = 1, \dots, s_{2}$ .
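The superiority relation can be tested mechanically. The following sketch assumes products are identified by their index under $I$ (the `index` argument is a hypothetical hook for other labellings). It relies on the observation that the $s_{2}$ best-ranked products of $S_{1}$ are componentwise no worse than any other $s_{2}$ products of $S_{1}$ , so it suffices to compare them with the sorted elements of $S_{2}$ .

```python
def is_superior(S1, S2, index=lambda n: n):
    """Check whether S1 is superior to S2 in the sense of Definition 2."""
    if len(S1) < len(S2):
        return False  # S1 cannot supply |S2| matching products
    best_of_s1 = sorted(index(n) for n in S1)[:len(S2)]
    ranks_s2 = sorted(index(n) for n in S2)
    # Compare the k-th best product of S1 with the k-th best product of S2.
    return all(x <= y for x, y in zip(best_of_s1, ranks_s2))

print(is_superior({1, 2, 5}, {3, 4}))
print(is_superior({1, 4}, {2, 3}))
```

The first call compares $(1, 2)$ with $(3, 4)$ and the second compares $(1, 4)$ with $(2, 3)$ , which fails in the second position.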

Building on this definition, we can now present the following theorem that illustrates how we should allocate products that are not repeatedly stored in multiple warehouses.
Theorem 1
Given a demand distribution $π$ with an associated 2-MCI $I$ , consider any assortments $S_{d_{1}} ⪰_{I} S_{d_{2}}$ and products $a$ and $b$ with $I (a) < I (b)$ , each stored in only one warehouse. The assortments $S_{d_{1}} \cup {a}$ and $S_{d_{2}} \cup {b}$ lead to lower costs compared to $S_{d_{1}} \cup {b}$ and $S_{d_{2}} \cup {a}$ , assuming all other assortments remain unchanged.

Theorem 1 provides useful insights into the structure of the optimal assortment for the general MSO problem. Roughly speaking, when demand features a 2-MCI, products of similar popularity should be stored together, and the more popular products should be stored in larger warehouses. Specifically, Theorem 1 implies that if a larger warehouse stocks more popular products, it should also exclusively house some relatively popular products. Conversely, less popular products, intended to be stored in only one warehouse, should be allocated to smaller warehouses alongside similarly less popular items. Indeed, this theorem is particularly useful when the logistics network is almost established, with only a few products remaining, each awaiting allocation to a single warehouse.

However, Theorem 1 does not provide a straightforward method for determining the optimal selection. In the remainder of this section, we will explore three specific scenarios: settings with a non-overlapping assortment restriction, demands with nested structures, and networks with only two warehouses. These scenarios will allow us to directly derive the optimal decision.
4.1. Non-Overlapping Assortments

Recently, Zhu et al. (2021) consider the assortment allocation problem for multi-warehouse systems with the constraint that assortments for different warehouses are non-overlapping. While they propose a K-links heuristic clustering algorithm, the algorithm lacks theoretical guarantees. In contrast, we identify the optimal assortment selection under the same restriction when the demand has a 2-MCI.

By default, we let $S_{0} = \emptyset$ . In addition, we denote $m s o (S)$ as the minimum number of split orders given assortment $S$ . Now, we can present the following theorem.

Theorem 2
When the assortments are not allowed to overlap, if the demand distribution $π$ has a 2-MCI and $\sum_{d = 1}^{D} K_{d} = N$ , then the assortment allocation $S^{*} = (S_{1}^{*}, \dots, S_{D}^{*})$ with $S_{d}^{*} = {n \in N : 1 + \sum_{j = 1}^{d - 1} K_{j} \leq n \leq \sum_{j = 1}^{d} K_{j}} \forall d \in [D]$ is optimal for MSO, provided that the products are indexed based on the MCI and that warehouse capacities satisfy $K_{1} \geq K_{2} \geq \dots \geq K_{D}$ .

Theorem 2 implies that when assortments are not allowed to overlap, we can fully characterize the optimal warehouse assortment if the demand distribution has a 2-MCI. Under the optimal warehouse assortment, larger warehouses store more popular products, and products with similar popularity levels are stored in the same warehouse.
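As a numerical illustration of Theorem 2, the sketch below enumerates every non-overlapping allocation of a toy instance and compares it with the consecutive allocation. It assumes ICM demand with hypothetical purchase probabilities; under a non-overlapping allocation each order ships once from every warehouse it touches, so by linearity of expectation the expected number of suborders is the sum over warehouses of the probability that the order contains at least one of that warehouse's products. All function names are illustrative.

```python
from itertools import permutations
from math import prod

def expected_suborders_icm(allocation, p):
    """Expected number of suborders for a non-overlapping allocation under ICM.
    Warehouse S is touched with probability 1 - prod_{n in S} (1 - p_n)."""
    return sum(1 - prod(1 - p[n - 1] for n in S) for S in allocation)

def consecutive_allocation(K):
    """Allocation of Theorem 2: warehouse d receives the next K_d indices."""
    blocks, start = [], 0
    for k in K:
        blocks.append(frozenset(range(start + 1, start + k + 1)))
        start += k
    return tuple(blocks)

def all_allocations(N, K):
    """Enumerate all ways to split products 1..N into blocks of sizes K."""
    seen = set()
    for perm in permutations(range(1, N + 1)):
        blocks, start = [], 0
        for k in K:
            blocks.append(frozenset(perm[start:start + k]))
            start += k
        key = tuple(blocks)
        if key not in seen:
            seen.add(key)
            yield key

# Hypothetical instance: 4 products indexed by decreasing popularity, K = (2, 1, 1).
p = [0.6, 0.4, 0.3, 0.1]
K = (2, 1, 1)
best = min(expected_suborders_icm(a, p) for a in all_allocations(4, K))
print(expected_suborders_icm(consecutive_allocation(K), p), best)
```

On this instance the consecutive allocation $({1, 2}, {3}, {4})$ attains the minimum over all allocations, consistent with the theorem.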

The insights derived from Theorems 1 and 2 lead us to a distinct characterization of the optimal product allocation for items stored exclusively in one location, as elaborated in the subsequent corollary.
Corollary 1
Given a 2-MCI $I$ with respect to $π$ , consider any assortments $S_{d_{1}}$ and $S_{d_{2}}$ where $S_{d_{1}} ⪰_{I} S_{d_{2}}$ , with each only containing products that are exclusively stored in one warehouse. Among the products in $S_{d_{1}} \cup S_{d_{2}}$ , optimally, warehouse $d_{1}$ should stock the most popular products, while warehouse $d_{2}$ should stock the least popular products.

This corollary delineates the optimal assortment for two warehouses with uniquely stored products, which can be easily extended to multiple warehouses under the same storage condition. Beyond Corollary 1, Theorem 2 also inspires an investigation into whether certain demand distributions inherently lead to non-overlapping optimal assortments without imposing it as a constraint. The answer is affirmative, and the next subsection explores one such class of distributions.
4.2. Nested Structure in Product Demand

Now, we explore the MSO problem when the demand distribution features some nested structures. We start with the most fundamental nested structures in product demand distribution.

Definition 3 Total Nested Structure

A demand distribution $π_{T N}$ exhibits a total nested structure if there exists an indexing rule $I$ such that only the choice probability of ${I^{- 1} (1), \dots, I^{- 1} (n)}$ for $n \in [N]$ can be positive.

Definition 3 allows for a natural hierarchy of products based on their demand, with the most popular product indexed by $1$ and the least by $N$ . This indexing rule is a special case of the MCI and corresponds to a nested structure of sets, with the indices reflecting the inclusion relationship among the sets. Assuming the products are indexed according to such an MCI, we have $π_{T N} ([n]) \geq 0 \forall n \in [N]$ , where $π_{T N} ([n])$ can be $0$ for some $n$ . Moreover, the marginal choice probability of product $n$ can be explicitly expressed as $ω_{n} = \sum_{i = n}^{N} π_{T N} ([i])$ , which is decreasing in $n$ . Specifically, $\sum_{i = 1}^{N} π_{T N} ([i]) = ω_{1}$ , which refers to the marginal choice probability of the most popular product, is less than or equal to $1$ . Therefore, a demand distribution with the total nested structure reflects a market with a popular main product and less popular add-ons. This implies that products that are indexed consecutively have similar popularity levels. Building on the above discussion, it is not hard to verify that this MCI is indeed second-order dominant.

Proposition 6
If a demand distribution has the total nested structure, then the corresponding MCI is second-order dominant.

Given that the MCI of a total nested demand distribution is second-order dominant, Theorem 2 can be applied to find the optimal selection under a non-overlapping restriction. Moreover, in this special scenario, we can delve deeper. The following theorem characterizes the optimal warehouse assortment selection for MSO when the demand distribution exhibits a total nested structure.
Theorem 3
If the demand distribution $π_{T N}$ has a total nested structure, there exists an optimal assortment selection $S^{*} = (S_{1}^{*}, \dots, S_{D}^{*})$ for MSO that is non-overlapping and contains consecutive products. Furthermore, the optimal assortment allocates the most popular products to larger warehouses. Specifically, assuming that the products are indexed based on $π_{T N}$ and that warehouse capacities satisfy $K_{1} \geq K_{2} \geq \dots \geq K_{D}$ , the optimal assortment for warehouse $d \in [D]$ can be selected recursively as follows: $S_{d}^{*} = {n \in N : min (\sum_{j = 1}^{d - 1} K_{j}, N) < n \leq min (\sum_{j = 1}^{d} K_{j}, N)} \forall d \in [D] .$ The expected number of split orders is $m s o_{T N} (S^{*}) = \sum_{d = 1}^{D} d \cdot (\sum_{n = min (1 + \sum_{j = 1}^{d - 1} K_{j}, N)}^{min (\sum_{j = 1}^{d} K_{j}, N)} π_{T N} ([n])) \cdot I (\sum_{j = 1}^{d - 1} K_{j} < N) .$

Theorem 3 indicates that larger warehouses should store more popular products, while smaller warehouses should house less popular ones. Within each warehouse, products should have similar popularity levels, with roughly equivalent selection probabilities. Furthermore, Theorem 3 provides a stronger result than Theorem 2 for MSO under totally nested demand distributions. Specifically, the given demand pattern ensures that allocating products repeatedly to the remaining capacity does not reduce order splitting, naturally leading to a non-overlapping optimal assortment.
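The allocation and the closed-form expression in Theorem 3 translate directly into code. The sketch below is a minimal illustration under stated assumptions: `pi_prefix[n]` holds the probability of the order $[n] = {1, \dots, n}$ , capacities are passed in non-increasing order, and the demand values are hypothetical. The function names are not from the paper.

```python
def nested_allocation(N, K):
    """S_d = {n : min(sum_{j<d} K_j, N) < n <= min(sum_{j<=d} K_j, N)}."""
    alloc, used = [], 0
    for k in K:
        lo, hi = min(used, N), min(used + k, N)
        alloc.append(set(range(lo + 1, hi + 1)))
        used += k
    return alloc

def mso_total_nested(N, K, pi_prefix):
    """Closed-form expected number of suborders from Theorem 3."""
    total, used = 0.0, 0
    for d, k in enumerate(K, start=1):
        if used < N:  # indicator I(sum_{j<d} K_j < N)
            lo, hi = min(used + 1, N), min(used + k, N)
            total += d * sum(pi_prefix.get(n, 0.0) for n in range(lo, hi + 1))
        used += k
    return total

def mso_by_counting(alloc, pi_prefix):
    """Cross-check: order [n] needs one shipment per warehouse it touches."""
    total = 0.0
    for n, prob in pi_prefix.items():
        order = set(range(1, n + 1))
        total += prob * sum(1 for S in alloc if S & order)
    return total

# Hypothetical totally nested demand on N = 4 products.
pi_prefix = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}
K = [2, 1, 1]
alloc = nested_allocation(4, K)
print(alloc, mso_total_nested(4, K, pi_prefix), mso_by_counting(alloc, pi_prefix))
```

With these numbers, orders $[1]$ and $[2]$ ship from warehouse 1 alone, $[3]$ needs two warehouses, and $[4]$ needs three, so both computations give $0.7 + 2 \cdot 0.2 + 3 \cdot 0.1 = 1.4$ .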

Yet, these insights are specific to totally nested demand distributions. For a broader range of nested structures, although the corresponding MCI may not be second-order dominant, our observations from the totally nested pattern can still help us identify structures of the optimal assortments. Next, we will examine two other demand functions showcasing more general nested structures.
Definition 4 (Nestable Structure)

A demand distribution $π_{N}$ has a nestable structure if the product set $N$ can be partitioned into disjoint subsets $P_{1}, \dots, P_{H}$ , such that $P_{i} \cap P_{j} = \emptyset$ for all $i, j \in [H]$ with $i \neq j$ and $\cup_{h \in [H]} P_{h} = N$ . Each customer order includes products only from a single subset $P_{h}$ . Within each subset $P_{h}$ , the restricted demand distribution $π_{P_{h}}$ exhibits a total nested structure.

It is worth noting that the total nested structure is a special case of a nestable structure with $H = 1$ . If $H > 1$ , however, a nestable structure in the demand function is no longer defined based on a collection of nested sets. For a demand distribution with a nestable structure, we divide the universal set into $H$ subsets $P_{1}, \dots, P_{H}$ . In each subset $P_{h}$ , the demand follows a total nested structure, and we rank the products according to the index of the total nested structure. Thus, a product is ranked higher than another product if the customer would always purchase this product when the other product is purchased. The nestable structure reflects a market with $H$ main products and some add-ons. The subsequent Example 2 provides an illustration of the nestable structure in customer demand.

Example 2 (Example of Nestable Structure in Demand)

Consider a universal set $N = {A, B, C, D}$ comprising four products. There are two main products $A$ and $B$ , with one add-on $C$ for $A$ and one add-on $D$ for $B$ . We partition the universal set into two subsets $N = P_{1} \cup P_{2}$ , with $P_{1} = {A, C}$ and $P_{2} = {B, D}$ . Within $P_{1}$ , $A$ is indexed 1 and $C$ is indexed 2, and within $P_{2}$ , $B$ is indexed 1 and $D$ is indexed 2. The possible sets of products chosen by a customer are ${A}$ , ${A, C}$ , ${B}$ , and ${B, D}$ .

According to Theorem 3 and leveraging the relationship between the total nested structure and the nestable structure, we can further gain intuition on the optimal assortment selection for MSO if the demand distribution is nestable.

Corollary 2
If the demand distribution $π_{N}$ has a nestable structure, there exists an optimal assortment selection $S^{*} = (S_{1}^{*}, \dots, S_{D}^{*})$ for MSO that is non-overlapping. For any two warehouses $d_{1}$ and $d_{2}$ , if $| S_{d_{1}}^{*} \cap P_{h} | > | S_{d_{2}}^{*} \cap P_{h} |$ , then all the products within $S_{d_{1}}^{*} \cap P_{h}$ are ranked prior (i.e., have higher marginal choice probabilities) to those within $S_{d_{2}}^{*} \cap P_{h}$ . Furthermore, for each $d \in [D]$ , $S_{d}^{*} \cap P_{h}$ consists of products with consecutive indices for all $h = 1, \dots, H$ .

Corollary 2 provides an important insight into the structure of the optimal solution for MSO when the demand distribution is nestable. Once the number of products allocated from each subset $P_{h}$ to each warehouse has been determined, the strategy outlined in Theorem 3 can be directly applied. Building on this, we formulate a tailored MILP for solving MSO under a nestable demand structure, detailed in Section EC.1.1.

Beyond the total nested and nestable structures, certain settings exhibit more complex relationships. In particular, a product may serve as both an add-on for one product and a main product for others, resulting in a continuously nested demand structure with add-ons, as defined below.
Definition 5 (Nested-nested Structure)

A demand distribution $π_{N N}$ has a nested-nested structure if there exists an indexing rule such that all possible customer orders are in consecutive indices.

Example 3 provides an illustration of the nested-nested structure in customer demand.

Example 3 (Example of Nested-Nested Structure in Demand)

Consider a universal set $N = {A, B, C, D}$ comprising four products. We index them as $A = 1$ , $B = 2$ , $C = 3$ , and $D = 4$ . There are two main products $A$ and $B$ . Product $A$ has add-ons $B$ , $C$ , and $D$ , and product $B$ has add-ons $C$ and $D$ . Then, the collection of sets that a customer would choose under the nested-nested demand is ${A}$ , ${A, B}$ , ${A, B, C}$ , ${A, B, C, D}$ , ${B}$ , ${B, C}$ , ${B, C, D}$ .
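Whether a given collection of orders fits Definition 3 or Definition 5 under a fixed indexing can be checked directly. The sketch below assumes products are already labelled by their index, orders are non-empty sets, and only orders with positive probability are listed; the function names are illustrative. It is applied to the orders of Example 3 with $A = 1$ , $B = 2$ , $C = 3$ , $D = 4$ .

```python
def is_total_nested(orders):
    """True if every order is a prefix {1, ..., n} of the indexing."""
    return all(set(T) == set(range(1, len(T) + 1)) for T in orders)

def is_nested_nested(orders):
    """True if every order consists of consecutive indices."""
    return all(max(T) - min(T) + 1 == len(set(T)) for T in orders)

# Orders of Example 3.
example_3 = [{1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}, {2}, {2, 3}, {2, 3, 4}]
print(is_nested_nested(example_3), is_total_nested(example_3))
```

The orders starting at product $B$ are consecutive runs but not prefixes, which is why Example 3 is nested-nested without being totally nested. Note that the check is for one given indexing; finding an indexing that makes the orders consecutive is a separate question not addressed here.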

It’s noteworthy that the total nested demand structure can also be seen as a special case of the nested-nested demand structure, which is not defined on a collection of nested sets. In the nested-nested demand structure, the indexing defines a sequence of product subsets $Q_{1}, \dots, Q_{H}$ , where each $Q_{h} = {q_{h}, \dots, N}$ follows a total nested structure in the demand distribution, with indices satisfying $1 = q_{1} < q_{2} < \dots < q_{H} \leq N$ . This structure reflects a market with ${q_{h}}_{h = 1}^{H}$ main products, and the products indexed after each main product are the corresponding add-ons. The following corollary shows that the optimal assortment at each warehouse must also contain consecutive products.

Corollary 3
If the demand distribution $π_{N N}$ has a nested-nested structure, there exists an optimal assortment selection $S^{*} = (S_{1}^{*}, \dots, S_{D}^{*})$ for MSO such that for any $S_{d}^{*}$ , $S_{d}^{*} \cap {q_{h}, \dots, q_{h + 1} - 1}$ , $h = 1, \dots, H$ contains a sequence of consecutive products, where we define $q_{H + 1} - 1 := N$ .

Corollary 3 implies that an optimal assortment selection at warehouse $d$ should consist of no more than $H$ sequences of consecutive products. This insight broadens Theorem 3 by stating that the optimal assortment allocates similar products within the same warehouse.

Although perfectly nested structures are rare in practice, more flexible variations, such as nestable and nested-nested structures, are common. Even without strict nesting, high co-purchase likelihood with primary products is frequently observed. For example, in a university textbook series like Calculus I, Calculus II, and Calculus III, purchasing the first textbook often leads to the purchase of subsequent ones to continue learning. Similarly, in the luggage industry, buying a large suitcase typically creates demand for related items like carry-ons or toiletry bags. In toy sales, the initial purchase of a Barbie doll often drives demand for compatible accessories. In home appliances, products like coffee machines or blenders frequently lead to the purchase of related items such as filters or extra blades. Lastly, in the furniture sector, IKEA illustrates how demand for core products like beds or sofas generates demand for complementary items like mattresses and cushions. These examples highlight the prevalence of nested demand structures in real-world e-commerce, where customer purchases are strongly influenced by primary products. Understanding these patterns is crucial for developing effective assortment selection strategies.
4.3. Two-Warehouse Case

According to previous discussions, when assortments are restricted to be non-overlapping, we are able to discern useful structures of the optimal selection. However, finding the optimal allocation for products that can be stored in multiple warehouses is much more challenging, as calculating the minimum number of split orders is generally NP-hard. Nonetheless, this problem becomes more manageable in scenarios involving only two warehouses. We will focus on this special case, referred to as MSO₂, in this subsection. Initially, we propose an extended marginal choice indexing policy, which achieves optimality when the demand has a 2-MCI. Following this, we explore the relationship between solving the MSO problem and the single-warehouse assortment selection (SWAS) problem, and propose an innovative iterative heuristic designed to improve any given two-warehouse assortment selection. Furthermore, we validate the effectiveness of these two proposed heuristics through extensive numerical experiments.

When there are only two warehouses, a product is either stored in only one warehouse or in both. Although addressing the MSO problem remains NP-hard, as indicated by Proposition 2, it is not necessary to solve MILP (1) to determine the optimal fulfillment strategy for each order. To clarify, for any given assortment across the two warehouses, if an order cannot be satisfied by either warehouse alone, it will be split into two suborders, with each warehouse fulfilling one. Based on these observations, and noting that every order requires at least one shipment, we can reformulate Problem (MSO) for two warehouses, denoted as (MSO₂), as follows:

\begin{aligned} (MSO_{2}) : min_{S_{1}, S_{2} \subseteq N} & \sum_{m \in [M]} π_{m} [I (T_{m} ⊈ S_{1}) \cdot I (T_{m} ⊈ S_{2})] \\ s.t. & | S_{d} | \leq K_{d}, & \forall d \in {1, 2}, \\ S_{1} \cup S_{2} = N . \end{aligned}

The objective of Problem (MSO₂) differs from that of Problem (MSO) due to the exclusion of the constant $\sum_{m \in [M]} π_{m}$ . Essentially, the objective of (MSO) quantifies the expected number of suborders needed to fulfill a customer’s order, while the objective of (MSO₂) reflects the likelihood that an order is split for fulfillment. If $\sum_{m \in [M]} π_{m}$ were to be added back, the objectives of (MSO₂) and (MSO) would be the same.
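The relationship between the two objectives can be sketched in a few lines. The order list below is hypothetical (it is not the demand of Example 1), each order is a pair $(T_{m}, π_{m})$ , and the function names are illustrative.

```python
def mso2(S1, S2, orders):
    """Objective of (MSO_2): total probability of orders that fit in
    neither warehouse and therefore must be split."""
    return sum(pi for T, pi in orders if not T <= S1 and not T <= S2)

def mso_two_warehouses(S1, S2, orders):
    """Objective of (MSO) for two warehouses: every order needs one
    shipment, and a split order needs one more."""
    return sum(pi for _, pi in orders) + mso2(S1, S2, orders)

# Hypothetical demand over N = 4 products.
orders = [
    (frozenset({1}), 0.4),
    (frozenset({1, 2}), 0.3),
    (frozenset({2, 4}), 0.2),
    (frozenset({3, 4}), 0.1),
]
S1, S2 = {1, 2, 3}, {1, 4}
print(mso2(S1, S2, orders), mso_two_warehouses(S1, S2, orders))
```

Here the orders ${2, 4}$ and ${3, 4}$ fit in neither warehouse, so the split probability is about $0.3$ and the expected number of suborders is about $1.3$ .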

4.3.1. Extended Marginal Choice Indexing Policy

Revisit Example 1 and consider a scenario with two warehouses, where $K_{1} = 3$ and $K_{2} = 2$ . The optimal allocations in this case are $S_{1} = {1, 2, 3}$ and $S_{2} = {1, 4}$ . Under this allocation, only the less popular orders, ${2, 4}$ and ${3, 4}$ , require fulfillment from multiple warehouses. This allocation follows a simple yet structured policy: the top-ranked products (based on the marginal choice indexing) are first assigned to the larger warehouse, while the remaining products are allocated to the smaller warehouse. Any additional vacancies are filled with the highest-ranked products. We refer to this approach as the Extended MCI Policy for the MSO₂ problem. Specifically, this policy designates $S_{1} = {1, \dots, K_{1} + K_{2} - N} \cup {K_{1} + K_{2} - N + 1, \dots, K_{1}} = {1, \dots, K_{1}}$ and $S_{2} = {K_{1} + 1, \dots, N} \cup {1, \dots, K_{1} + K_{2} - N}$ , where products are indexed according to an MCI and $K_{1} \geq K_{2}$ . Under this policy, products are partitioned into three categories: the most popular products are stored in both warehouses, moderately popular ones are stored exclusively in the larger warehouse, and the least popular ones are stored exclusively in the smaller warehouse. Figure 1 illustrates this policy using a Venn diagram.

Figure 1. Venn diagram representation of the extended MCI policy.
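The extended MCI policy is simple enough to state as a short function. The sketch below assumes products are labelled $1, \dots, N$ according to an MCI and that $K_{2} \leq K_{1} \leq N$ with $K_{1} + K_{2} \geq N$ ; the function name and the input validation are additions for illustration.

```python
def extended_mci_policy(N, K1, K2):
    """Return (S1, S2) under the extended MCI policy.
    Products 1..N are assumed to be indexed by an MCI (1 = most popular)."""
    if not (K2 <= K1 <= N and K1 + K2 >= N):
        raise ValueError("need K2 <= K1 <= N and K1 + K2 >= N")
    shared = K1 + K2 - N                      # products stored in both warehouses
    S1 = set(range(1, K1 + 1))                # top K1 products
    S2 = set(range(K1 + 1, N + 1)) | set(range(1, shared + 1))
    return S1, S2

print(extended_mci_policy(4, 3, 2))
```

For $N = 4$ , $K_{1} = 3$ , and $K_{2} = 2$ this reproduces the allocation $S_{1} = {1, 2, 3}$ and $S_{2} = {1, 4}$ discussed above, with product 1 as the only shared item.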

It is worth noting that the extended MCI Policy builds on the MCI policy introduced in Li et al. (2024), which was originally designed for single-warehouse assortment selection. The MCI policy selects products with the highest marginal choice probabilities for storage and achieves optimality under mild assumptions, demonstrating efficiency in practical applications. In extending the MCI policy to two-warehouse systems, the most popular products are first allocated to the larger warehouse, with the remaining products assigned to the smaller one to ensure all orders can be fulfilled. The spare capacity of the smaller warehouse is then supplemented with the most popular products, ensuring an efficient allocation. In both strategies, the marginal choice probabilities of products play a crucial role in allocation. This is supported by evidence that, for certain demand distributions, a corresponding MCI demonstrates compelling properties and helps to determine the optimal assortment selection for a single warehouse. We will show that under certain mild conditions, the extended MCI policy guarantees an optimal solution for the MSO₂ problem.

Theorem 4

Under Assumption 1, for a given demand function $π$ , if $π$ has a 2-MCI, then the corresponding extended MCI policy is optimal for the MSO₂ problem.

Theorem 4 highlights the critical role of a second-order dominant indexing rule for establishing the optimality of the extended MCI policy in solving MSO within two-warehouse systems. Additionally, this confirms Theorem 1 in the context of two-warehouse systems. As illustrated in Figure 1, to obtain a lower $m s o$ , products exclusively selected by the extended MCI policy to store in the larger warehouse indeed exhibit higher popularity compared to those exclusively allocated to the smaller warehouse.

Clearly, if the underlying demand distribution follows those presented in Proposition 5, the extended MCI policy achieves optimality. Although Theorem 4 provides us with critical insights into the optimality of the extended MCI policy for certain demand distributions, it is important to recognize that not all demand has a second-order dominant MCI. Consequently, the application of the extended MCI policy does not universally guarantee an optimal solution.

Many demand distributions may not have a second-order dominant MCI, or may have one only within restrictive subclasses of the choice model, such as the multi-choice random utility model described in Example EC.1 (due to its complexity, this example is excluded from the main text; see Lin et al., 2025 for reference). Nevertheless, we will demonstrate in later numerical experiments that the extended MCI policy yields near-optimal solutions. Moreover, in the next subsection, we will introduce the Iterative Improvement Heuristic, which can be applied to the extended MCI policy to help achieve even better performance.

While extending the MCI policy to a two-warehouse setting is intuitive, applying it to multi-warehouse settings presents significant challenges, especially when aiming to achieve an optimality guarantee similar to that in Theorem 4. The complexity arises from the need to account for products that may be stored in multiple warehouses with varying frequencies. Despite this, we offer one possible version in Section EC.5 for consideration, acknowledging that this variation may not always retain this optimality property.

4.3.2. Connection to Single-Warehouse Assortment Selection Problem and the Iterative Improvement Heuristic

Tackling the multi-warehouse assortment selection problem is generally challenging because assortments across various warehouses need to be jointly optimized. Now, consider the instance in which the assortment for one of the two warehouses, denoted by $\tilde{S} \subseteq N$ , is predetermined. Under this circumstance, the problem becomes identifying the optimal assortment, $S \subseteq N$ , subject to a cardinality constraint $K < N$ , for storage in the other warehouse to minimize the number of split orders. In other words, the objective is to minimize the probability that this warehouse fails to completely fulfill an order, ensuring that all products not stored in the given warehouse are selected. Hence, this problem is formulated as follows:

\begin{aligned} min_{S \subseteq N} & \sum_{m \in [M]} π_{m} I (T_{m} ⊈ \tilde{S}) \cdot I (T_{m} ⊈ S) \\ s.t. & | S | \leq K, \\ \tilde{S} \cup S = N . \end{aligned}

(3)

Upon observing the objective function of Problem (3), it is evident that it weakly decreases as $| S |$ increases (see Proposition EC.1 for details). Moreover, since

\sum_{m \in [M]} π_{m} I (T_{m} ⊈ \tilde{S}) \cdot I (T_{m} ⊈ S) = \sum_{m \in [M]} [π_{m} I (T_{m} ⊈ \tilde{S}) - π_{m} I (T_{m} ⊈ \tilde{S}) I (T_{m} \subseteq S)]

and $\sum_{m \in [M]} π_{m} I (T_{m} ⊈ \tilde{S})$ is a constant, solving Problem (3) is equivalent to solving the following maximization problem.

\begin{aligned} max_{S \subseteq N, | S | = K} & \sum_{m \in [M]} π_{m} I (T_{m} ⊈ \tilde{S}) \cdot I (T_{m} \subseteq S) \\ s.t. & \tilde{S} \cup S = N . \end{aligned}

(4)

It is important to note that Problem (4) closely resembles the order fill rate maximization (OFRM) problem discussed in Li et al. (2024), which aims to select an assortment $S$ for a single warehouse to maximize the order fill rate, subject to a cardinality constraint of $K$ . The key difference is the constraint $\tilde{S} \cup S = N$ , requiring $S$ to include all products not in $\tilde{S}$ , ensuring fulfillment for all products in the two-warehouse system. This reduces the problem to determining which products from $\tilde{S}$ should fill the remaining capacity of $S$ to maximize its order fill rate and minimize order splitting across the system.

Leveraging this insight, we present the following proposition, which establishes the equivalence between Problem (4) and an OFRM problem.

Proposition 7

In the two-warehouse assortment selection problem, if one of the assortments is fixed, then the problem reduces to an order fill rate maximization problem for a single warehouse.

To see this, we define ${\tilde{S}}^{c} = N ∖ \tilde{S}$ as the complement of $\tilde{S}$ in $N$ , and we let $S = {\tilde{S}}^{c} \cup S^{'}$ , where $S^{'}$ remains to be determined. Here, $S^{'}$ is constrained to $S^{'} \subseteq \tilde{S}$ with the condition that $| S^{'} | = K - | {\tilde{S}}^{c} |$ . Additionally, we define $π_{m}^{'} = π_{m} I (T_{m} ⊈ \tilde{S}) \forall m \in [M]$ and $T_{m}^{'} = T_{m} ∖ {\tilde{S}}^{c} \forall m \in [M]$ . Now, we can reformulate Problem (4) as

\begin{aligned} max_{S^{'} \subseteq \tilde{S}, | S^{'} | = K - | {\tilde{S}}^{c} |} & \sum_{m \in [M]} π_{m}^{'} \cdot I (T_{m}^{'} \subseteq S^{'}) . \end{aligned}

(5)

Note that Problem (5) shares exactly the same formulation as the OFRM problem, with $\tilde{S}$ serving as the universal set and the cardinality constraint set to $K - | {\tilde{S}}^{c} |$ . Hence, MSO₂ is closely connected with the OFRM problem in the single-warehouse assortment selection context. By Proposition 7, any methods developed for OFRM can be effectively applied to help solve the MSO₂ problem.
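The reduction from Problem (4) to Problem (5) can be sketched as a small transformation followed by any OFRM solver. In the sketch below, the solver is a plain enumeration that is only suitable for toy instances and stands in for the MILP-based methods cited in the text; the function names and the demand data are hypothetical.

```python
from itertools import combinations

def reduce_to_ofrm(S_fixed, K, N, orders):
    """Build the instance of Problem (5) from the fixed assortment S_fixed.
    Returns the forced products, the remaining budget |S'|, and the
    reduced orders (T'_m, pi'_m) with pi'_m > 0."""
    S_fixed = set(S_fixed)
    comp = set(range(1, N + 1)) - S_fixed          # complement of S_fixed
    budget = K - len(comp)                         # |S'| = K - |complement|
    if budget < 0:
        raise ValueError("capacity K cannot cover the complement of S_fixed")
    # Orders already contained in S_fixed have pi'_m = 0 and are dropped.
    reduced = [(frozenset(T) - comp, pi) for T, pi in orders
               if not set(T) <= S_fixed]
    return comp, budget, reduced

def solve_ofrm_bruteforce(universe, budget, reduced):
    """Enumerate all S' of the given size and maximize the order fill rate."""
    best, best_val = None, -1.0
    for cand in combinations(sorted(universe), budget):
        cand = set(cand)
        val = sum(pi for T, pi in reduced if T <= cand)
        if val > best_val:
            best, best_val = cand, val
    return best, best_val

# Hypothetical instance: N = 4, fixed warehouse {1, 2, 3}, other capacity K = 2.
orders = [
    (frozenset({1}), 0.4),
    (frozenset({1, 2}), 0.3),
    (frozenset({2, 4}), 0.2),
    (frozenset({3, 4}), 0.1),
]
comp, budget, reduced = reduce_to_ofrm({1, 2, 3}, 2, 4, orders)
S_prime, value = solve_ofrm_bruteforce({1, 2, 3}, budget, reduced)
print(comp | S_prime, value)
```

In this instance product 4 is forced into $S$ , one slot remains, and filling it with product 2 recovers the order ${2, 4}$ , which carries more probability than ${3, 4}$ .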

Subsequently, we will present a heuristic designed to improve any two-warehouse assortment selection strategy, which we refer to as the Iterative Improvement Heuristic (IIH). One of the most important elements in the IIH is to solve Problem (4) or (5). One can always update $\tilde{S}$ , $π_{m}$ , and $T_{m}$ in each step and apply techniques for OFRM to solve Problem (5). To our knowledge, the only existing methods that solve OFRM to optimality are the MILP formulations presented in Wu et al. (2019) and Li et al. (2024). Alternatively, Problem (4) can be reformulated directly as the following MILP.

\begin{aligned} max_{x, y} & \sum_{m = 1}^{M} π_{m} I (T_{m} ⊈ \tilde{S}) y_{m}, \\ s.t. & \sum_{n = 1}^{N} x_{n} = K, \\ y_{m} \leq x_{n}, & \forall m \in [M], n \in T_{m}, \\ x_{n} = 1, & \forall n \in N ∖ \tilde{S}, \\ x_{n} \in {0, 1}, & \forall n \in \tilde{S}, \\ y_{m} \geq 0, & \forall m \in [M], \end{aligned}

(6)

where $x \in {0, 1}^{N}$ is the binary decision variable with $x_{n} = 1$ if product $n$ is included in the assortment $S$ and $0$ otherwise, and $y \in {0, 1}^{M}$ is the binary decision variable with $y_{m} = 1$ if order $m$ can be fulfilled by $S$ , that is, $T_{m} \subseteq S$ , and $0$ otherwise. Indeed, Problem (6) is similar to the MILP for solving OFRM, with two notable modifications. First, a third constraint is included to guarantee that all products can be fulfilled. Second, we adjust the term $π_{m}$ to $π_{m} I (T_{m} ⊈ \tilde{S})$ within the objective function, since orders that can already be fulfilled by the predetermined warehouse contribute no additional benefit to the overall efficiency of the two-warehouse system. In fact, in our numerical experiments, there is no notable difference in computational efficiency between applying MILP formulations to optimally solve Problem (4) and Problem (5). To simplify notation, we let $m s o_{2} (S_{1}, S_{2}) = \sum_{m \in [M]} π_{m} [I (T_{m} ⊈ S_{1}) \cdot I (T_{m} ⊈ S_{2})]$ , and hereby introduce the Iterative Improvement Heuristic, presented in Algorithm 1. We have several remarks regarding the IIH. First, it can be readily verified that the IIH will converge after an adequate number of iterations.

Proposition 8

The Iterative Improvement Heuristic incrementally enhances the current two-warehouse assortment selection at each step and converges as the number of iterations becomes sufficiently large.

Second, it may not be necessary to calculate the objective value $m s o_{2}$ at each step; instead, one could check if a new selection is obtained until no further improved assortment is found, or until reaching the maximum number of iterations. Third, while solving Problem (6) is theoretically challenging, our numerical experiments indicate that commercial solvers, such as Gurobi, can resolve it within 3 seconds for almost all instances (even for real-world industrial size). Furthermore, we posit that should new techniques be developed for efficiently solving the OFRM problem, they could be implemented with minor modifications to enhance the efficiency of the IIH. Last but not least, while the IIH does not guarantee an optimal solution for MSO₂, it offers a straightforward approach to improve upon any current warehouse assortment selection.
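The alternating structure described above can be sketched in a few lines of Python. This is only an illustrative reading, not a reproduction of Algorithm 1 (which is not shown in this section): the function names, the order in which the warehouses are updated, and the stopping rule are assumptions, and Problem (6) is solved here by brute-force enumeration rather than by an MILP solver, so the sketch is suitable for tiny instances only. Orders are assumed to be given as `(set_of_products, probability)` pairs, and the two initial assortments are assumed to cover all products.

```python
from itertools import combinations

def mso2(orders, s1, s2):
    """mso_2(S1, S2): probability mass of orders that neither warehouse covers."""
    return sum(p for items, p in orders if not items <= s1 and not items <= s2)

def best_response(orders, products, fixed, capacity):
    """Brute-force stand-in for Problem (6) (assumption: tiny instances only).

    The returned assortment has exactly `capacity` products, contains every
    product missing from `fixed`, and maximizes the mass of orders that
    `fixed` cannot fulfill but the new assortment can.
    """
    forced = products - fixed
    free_slots = capacity - len(forced)
    if free_slots < 0:
        raise ValueError("capacity too small to cover products outside `fixed`")
    best_set, best_val = None, -1.0
    for extra in combinations(sorted(fixed), free_slots):
        cand = forced | set(extra)
        val = sum(p for items, p in orders
                  if not items <= fixed and items <= cand)
        if val > best_val:
            best_set, best_val = cand, val
    return best_set

def iterative_improvement(orders, products, s1, s2, max_iter=100):
    """Alternately re-optimize one warehouse while the other stays fixed."""
    current = mso2(orders, s1, s2)
    for _ in range(max_iter):
        new_s2 = best_response(orders, products, s1, len(s2))
        new_s1 = best_response(orders, products, new_s2, len(s1))
        value = mso2(orders, new_s1, new_s2)
        if value >= current:  # no strict improvement: stop
            break
        s1, s2, current = new_s1, new_s2, value
    return s1, s2, current

# Hypothetical toy instance: four products, capacities 3 and 2.
products = {1, 2, 3, 4}
orders = [({1, 2}, 0.4), ({3, 4}, 0.3), ({1, 3}, 0.2), ({2}, 0.1)]
print(iterative_improvement(orders, products, {1, 3, 4}, {2, 3}))
```

Because the current assortment of each warehouse is always feasible for its own best-response problem, the objective in this sketch cannot increase from one pass to the next, which mirrors the monotone improvement stated in Proposition 8.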

Although two-warehouse assortment selection closely relates to the order fill rate maximization problem, it is worth noting that such an observation cannot be easily found in general multi-warehouse settings. This is because, in a two-warehouse system, an order will be split into two suborders if neither warehouse can fulfill it independently. In this context, if an assortment is fixed, improving the order fill rate of another warehouse can enhance the efficiency of the whole system. However, this is not necessarily the case when it comes to more than two warehouses, since the improvement of the order fill rate for one warehouse does not necessarily lower the chance that an order be split into two or more suborders. What’s worse, even checking whether a new selection reduces the number of split orders is NP-hard.

4.3.3. Numerical Experiment

In this subsection, we numerically evaluate the performance of the extended MCI policy and the Iterative Improvement Heuristic by applying them to a two-warehouse assortment selection scenario aimed at minimizing split orders.

For our comparative benchmark, we choose the Bestsellers heuristic, originally proposed in Catalán and Fisher (2012). Among the four greedy-based heuristics explored in their study, this heuristic has been shown to perform the best. The Bestsellers heuristic initially assigns the top $B = \frac{\sum_{d = 1}^{D} K_{d} - N}{D - 1}$ best-selling products to every warehouse. Subsequently, the remaining products are evaluated in descending order of sales, with each product allocated to the warehouse where the average co-purchase probability with the already allocated products is highest. This probability is calculated using the co-purchase matrix $C = \{c_{i j}\}_{N \times N}$, where $c_{i j}$ represents the probability that products $i$ and $j$ are included in the same order, and $c_{i i}$ represents the probability of product $i$ appearing in an order. Given that non-overlapping assortments are not the primary focus of our study, we refrain from making comparisons with the K-links heuristic clustering algorithm proposed in Zhu et al. (2021).
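As a quick illustration of the overlap size $B$ (a worked instance added here, using the two-warehouse configuration with $N = 6$ products and capacities $[5, 2]$ that appears in Table 1), the formula gives

$B = \frac{(5 + 2) - 6}{2 - 1} = 1,$

so only the single best-selling product is stored in both warehouses. The value of $B$ makes the remaining slots match the remaining products exactly: after the first step, $\sum_{d = 1}^{D} K_{d} - D B = N - B$ slots are left for the $N - B$ unassigned products, so each of them is stored in exactly one warehouse. In the example, $7 - 2 = 5$ slots remain for the $5$ remaining products.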

In the following, we conduct two sets of synthetic experiments. The first is based on the independent choice model (ICM), where the choice of each product is independent of others (Lin et al., 2025). The second simulates the multi-purchase multinomial logit (MP-MNL) model, which captures interactions among multiple products within a single purchase (Bai et al., 2023). Across each experimental set, we conduct trials involving varying numbers of products and diverse warehouse capacities. Specifically, we examine cases with 6, 8, and 10 products. For each specified number of products, we generate 1,000 sub-cases, each corresponding to a unique demand distribution, and calculate the MSO under different policies and warehouse capacity constraints. Note that customers may choose not to purchase in certain scenarios, which can result in $mso$ values of less than 1. To evaluate the scalability of the proposed methods, we also conduct experiments with larger product sets. In these experiments, the total warehouse capacity is set to 140% of the number of products, with the capacities of the two warehouses allocated in a 6:4 ratio.

4.3.4. Experiments Based on the Independent Choice Model

In this section, we simulate customer choices based on the Independent Choice Model to conduct experiments and analyze outcomes. The ICM assumes that the selection of each product is independent of others, leading to the choice probability of any subset $T \subseteq N$ being represented as $π_{\mathrm{ICM}} (T) = \prod_{n \in T} p_{n} \prod_{n^{'} \in N ∖ T} (1 - p_{n^{'}})$. Here, $p_{n}$ refers to the inherent choice probability of product $n$, which also aligns with the marginal choice probability of the same product.
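The ICM choice probability can be computed directly from the product-level probabilities. The snippet below is a minimal sketch of that formula; the function names and the dictionary representation of $p_{n}$ are illustrative assumptions, and full enumeration of all $2^{N}$ subsets is only practical for small $N$.

```python
from itertools import combinations
from math import prod

def icm_probability(subset, p):
    """pi_ICM(T): product of p_n over n in T times (1 - p_n) over n not in T.

    `p` maps each product to its individual choice probability p_n.
    """
    return prod(p[n] if n in subset else 1.0 - p[n] for n in p)

def icm_distribution(p):
    """Enumerate all 2^N subsets with their ICM choice probabilities."""
    products = sorted(p)
    return {
        frozenset(c): icm_probability(set(c), p)
        for r in range(len(products) + 1)
        for c in combinations(products, r)
    }

# Hypothetical two-product example.
p = {1: 0.5, 2: 0.25}
dist = icm_distribution(p)
marginal_1 = sum(v for t, v in dist.items() if 1 in t)
print(dist[frozenset({1})], marginal_1)
```

Summing the probabilities of all subsets that contain a given product recovers that product's $p_{n}$, which is the sense in which $p_{n}$ coincides with the marginal choice probability.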

As per the ICM’s definition, once individual product choice probabilities are identified, we can accurately calculate the choice probability for all subsets within the universal set. Accordingly, we randomly sample 1,000 sub-cases, compute the demand (choice probability) for each, and compare the results of various policies. As established in Theorem 4, the extended MCI policy achieves optimality under the ICM. In this scenario, when initialized with the selection provided by the extended MCI policy, the IIH converges to the optimal solution in a single iteration, as expected. To provide a more robust evaluation, we implement the IIH with an arbitrary initial assortment for the larger warehouse and report the average performance across 100 randomly selected samples in our experiments.

The results of the comparisons are summarized in Table 1, with visualizations for each sub-case shown in Figures 2 to 7 (additional details are provided in Section EC.4.1). In the table, the “opt ratio” represents the optimality ratio, calculated as one minus the relative difference in $m s o$ compared to the optimal $m s o$ .
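To make the “opt ratio” concrete, the following sketch encodes one reading of the definition above, namely $1 - (mso - mso^{\mathrm{OPT}}) / mso^{\mathrm{OPT}}$. The function name is hypothetical. The table reports the mean of per-instance ratios, so plugging in the reported mean values is only an approximation, although for the first row of Table 1 it agrees with the reported figures up to rounding.

```python
def opt_ratio(mso, opt_mso):
    """Optimality ratio: one minus the relative gap to the optimal mso."""
    return 1.0 - (mso - opt_mso) / opt_mso

# Mean values from the first row of Table 1 (6 products, capacity [5,2]).
print(round(100 * opt_ratio(1.4105, 1.1237), 2))  # IIH
print(round(100 * opt_ratio(1.5065, 1.1237), 2))  # Bestsellers
```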

Figure 2. ICM with 6 items (1).

Figure 3. ICM with 8 items (1).

Figure 4. ICM with 10 items (1).

Figure 5. ICM with 6 items (2).

Figure 6. ICM with 8 items (2).

Figure 7. ICM with 10 items (2).

Table 1.

Comparison of different policies under the ICM simulations.

Sub-case #	Num of products	Capacity	Mean OPT mso	Mean eMCI mso	Mean avg IIH mso	Mean BS mso	Mean eMCI opt ratio (%)	Mean IIH opt ratio (%)	Mean BS opt ratio (%)
1	6	[5,2]	1.1237	1.1237	1.4105	1.5065	100.00	74.48	65.93
		[5,3]	1.1099	1.1099	1.2992	1.3268	100.00	82.94	80.45
		[5,4]	1.0830	1.0830	1.1612	1.1622	100.00	92.78	92.69
2	6	[5,2]	1.1187	1.1187	1.4028	1.5033	100.00	74.60	65.62
		[4,3]	1.3256	1.3256	1.5421	1.5680	100.00	83.67	81.71
3	8	[7,3]	1.0979	1.0979	1.4002	1.4970	100.00	72.47	63.66
		[7,4]	1.0903	1.0903	1.3114	1.3496	100.00	79.72	76.22
		[7,5]	1.0764	1.0764	1.2023	1.2069	100.00	88.30	87.88
		[7,6]	1.0551	1.0551	1.0994	1.0972	100.00	95.80	96.01
		[7,7]	1.0278	1.0278	1.0278	1.0278	100.00	100.00	100.00
4	8	[7,2]	1.1063	1.1063	1.4637	1.6374	100.00	67.70	52.00
		[6,3]	1.2882	1.2882	1.6770	1.7451	100.00	69.82	64.54
		[5,4]	1.4946	1.4946	1.7449	1.7657	100.00	83.25	81.86
5	10	[7,4]	1.4589	1.4589	1.8418	1.8798	100.00	73.76	71.16
		[7,5]	1.4521	1.4521	1.7644	1.7872	100.00	78.49	76.92
		[7,6]	1.4342	1.4342	1.6370	1.6574	100.00	85.86	84.44
		[7,7]	1.3954	1.3954	1.4587	1.5055	100.00	95.46	92.11
6	10	[9,2]	1.0867	1.0867	1.4828	1.7147	100.00	63.55	42.21
		[8,3]	1.2441	1.2441	1.7225	1.8406	100.00	61.55	52.05
		[7,4]	1.4349	1.4349	1.8316	1.8731	100.00	72.36	69.47
		[6,5]	1.6180	1.6180	1.8639	1.8808	100.00	84.80	83.76

These results validate Theorem 4, demonstrating that the extended MCI policy achieves optimality when the demand distribution follows the ICM. As illustrated in Figures 2 to 7, the OPT and eMCI lines overlap, consistent with the theoretical guarantee provided by the theorem. In general, the extended MCI policy outperforms the IIH when the latter starts from a randomly assigned initial assortment, and both proposed heuristics surpass the Bestsellers heuristic. Surprisingly, although the IIH is intended to enhance any existing assortment selection without guaranteed performance, it exceeds the benchmark in nearly all instances, regardless of the initial assortment selected (in Table 1, the only configuration in which the averaged IIH falls slightly behind is capacity [7,6] with 8 products). Furthermore, when the assortment of one warehouse is fixed, an increase in the capacity of the other warehouse reduces the total $mso$. Moreover, we observe that, when the total capacity of the two-warehouse system is kept constant, splitting the capacity evenly between the two warehouses does not help eliminate order splitting. In contrast, a system consisting of a relatively larger warehouse complemented by a smaller one can achieve a lower $mso$. This observation is intuitive: a larger warehouse, by storing a broader variety of products, can significantly reduce the number of split orders, while the smaller warehouse accommodates the less popular products.

Furthermore, we extend our experiments to scenarios involving a larger number of products. As the number of products increases, the exponential growth in possible customer orders ($2^{N}$) renders exact choice probability calculations infeasible. To address this, we simulate 10,000 customer choices based on the ICM and conduct 10 repetitions for each product count, averaging the performance of various policies to ensure robustness. The results are summarized in Table 2. It is important to note that the IIH in these experiments is initialized with the solution provided by the extended MCI policy. The “MILP” results refer to the solutions obtained by solving MILP (2). However, we observe that even for cases with 20 products, solving the MILP to optimality using Gurobi v12.0 requires more than one hour. Therefore, we report the best feasible solution obtained within the one-hour threshold, marking computation times with an asterisk (*) if the solver exceeds this limit. As shown in Table 2, the proposed extended MCI policy and IIH consistently outperform the Bestsellers heuristic. Remarkably, both methods also outperform the one-hour best feasible solution obtained from solving the exact MILP in the majority of cases. However, it is important to note that in practice, product demands are not independent in general. Hence, we proceed to conduct experiments based on the MP-MNL model, which captures product interrelationships within the framework of random utility theory.
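The sampling step for larger product sets can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' experimental code: the function name, the fixed seed, and the choice $p_{n} = 0.3$ in the usage lines are hypothetical. Each simulated customer includes every product independently with probability $p_{n}$, and the empirical frequencies of the observed orders replace the exact $2^{N}$-dimensional distribution.

```python
import random
from collections import Counter

def sample_icm_orders(p, num_customers, seed=0):
    """Empirical order distribution from simulated ICM customers.

    `p` maps each product to its choice probability; the result maps each
    observed order (a frozenset of products) to its empirical frequency.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_customers):
        order = frozenset(n for n in p if rng.random() < p[n])
        counts[order] += 1
    return {order: c / num_customers for order, c in counts.items()}

# Hypothetical setting: 30 products, 10,000 simulated customers.
p = {n: 0.3 for n in range(1, 31)}
dist = sample_icm_orders(p, 10_000)
print(len(dist), "distinct orders observed out of", 2 ** len(p), "possible")
```

The number of distinct orders is bounded by the number of simulated customers, which is what keeps the downstream optimization tractable.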

4.3.5. Experiments Based on the Multi-Purchase Multinomial Logit Model

In the following experiments, we simulate customer choices using the Multi-Purchase Multinomial Logit model. In this model, customers have a random intended purchase quantity (IPQ), denoted as $Q$ , which ranges from $0$ to $N$ . Each product $n \in N^{+}$ is assigned a utility $U_{n}$ , which is composed of a deterministic component $V_{n}$ and an idiosyncratic noise $ϵ_{n}$ , such that $U_{n} = V_{n} + ϵ_{n}$ for all $n \in N^{+}$ . In the MP-MNL model, $ϵ = [ϵ_{0}, ϵ_{1}, \dots, ϵ_{N}]^{⊤}$ is assumed to follow the i.i.d. Gumbel distribution.

In our experimental design, we independently generate each product’s deterministic utility from a standard normal distribution for every repeated sub-case. The utility of the outside option is uniformly set to zero. We model customers’ IPQ as a discrete uniform distribution over $\mathcal{Q} = \{0, 1, \dots, ⌊ N / 2 ⌋\}$, which implies that customers intend to purchase $q \in \mathcal{Q}$ products with equal probability. This setup, based on the observation that customers rarely choose the entire product set, simplifies the model. Each sub-case incorporates the sampling of 10,000 i.i.d. standard Gumbel distributed noise elements. Customers then either select the $Q$ products with the highest utility, provided that the utility of each surpasses that of the outside option, or, when fewer than $Q$ products do so, opt only for those products whose utility exceeds that of the outside option. This decision-making process is repeated for each sample.
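The simulation procedure just described can be sketched as below. This is an illustrative reading, not the authors' code: the function names and seeds are hypothetical, each of the simulated customers is assumed to receive a fresh vector of Gumbel noise (one draw for the outside option and one per product), and the standard library is used in place of a numerical package.

```python
import math
import random
from collections import Counter

def gumbel(rng):
    """Standard Gumbel draw via inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def simulate_mp_mnl(v, num_customers, seed=0):
    """Empirical order distribution under the MP-MNL setup described above.

    `v` maps each product to its deterministic utility V_n; the outside
    option has deterministic utility 0.
    """
    rng = random.Random(seed)
    max_q = len(v) // 2
    counts = Counter()
    for _ in range(num_customers):
        q = rng.randint(0, max_q)        # IPQ, uniform on {0, ..., floor(N/2)}
        u0 = gumbel(rng)                 # utility of the outside option
        utilities = {n: v[n] + gumbel(rng) for n in v}
        better = sorted((n for n in v if utilities[n] > u0),
                        key=utilities.get, reverse=True)
        counts[frozenset(better[:q])] += 1  # at most q products, best first
    return {order: c / num_customers for order, c in counts.items()}

# Hypothetical sub-case: 10 products with standard normal utilities.
gen = random.Random(42)
v = {n: gen.gauss(0.0, 1.0) for n in range(1, 11)}
dist = simulate_mp_mnl(v, 10_000)
print("no-purchase frequency:", dist.get(frozenset(), 0.0))
```

The empty order has positive frequency whenever $q = 0$ is drawn or no product beats the outside option, which is consistent with the earlier remark that customers may choose not to purchase.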

Table 2.
Comparison of different policies under the ICM model for larger product sets.

Num of products	Single-purchase prob. (%)	Multi-purchase prob. (%)	Capacity	Mean eMCI mso	Mean IIH mso	Mean IIH time (s)	Mean num of IIH iterations	Mean BS mso	Mean BS time (s)	Mean MILP mso	Mean MILP time (s)	Relative eMCI impro. over BS (%)
10	0.92	99.08	[8, 6]	1.1818	1.1817	1.38	1.0	1.4693	0.01	1.1817	9.25	19.57
20	0.0	100.0	[16, 12]	1.2901	1.2900	54.58	1.0	1.7307	0.35	1.4454	*	25.46
30	0.0	100.0	[25, 17]	1.5109	1.5104	319.67	1.2	1.9247	1.20	1.7022	*	21.50
50	0.0	100.0	[42, 28]	1.5136	1.5133	571.56	1.0	1.9841	3.02	1.7912	*	23.71
100	0.0	100.0	[84, 56]	1.7738	1.7735	1805.61	1.2	1.9998	10.86	1.9492	*	11.30

It can be verified that the MP-MNL model does not meet the sufficient conditions for the extended MCI policy to achieve optimality. In light of this, we apply the IIH to the selection obtained by the extended MCI policy, specifically by setting the initial assortment for warehouse $1$ to include the most popular $K_{1}$ products, and we report the corresponding results of this approach in this subsection (these differ from those in the previous experiments based on the ICM).

The results of the comparisons are summarized in Table 3, with accompanying visualizations for each sub-case depicted in Figures 8 to 13 (more details can be found in Section EC.4.2). Notably, although the optimality condition of Theorem 4 is not met in this scenario, the performance of the extended MCI policy still approaches optimality. The strong performance of the extended MCI policy may be attributed to the presence of a first-order dominant MCI for the MP-MNL model, as established in Corollary 2 of Li et al. (2024). Although the MCI of MP-MNL models does not fully satisfy condition (ii) in Definition 1, our numerical tests show that fewer than 5% of all valid tuples $(T, T^{'}, a, b)$ violate this condition (details can be found in Section EC.4.2.1). Consequently, the extended MCI policy achieves an optimality ratio exceeding 99.95% across all sub-cases. It is worth noting that the MP-MNL model is a specific instance of the broader class of Multi-Choice Random Utility Models (Lin et al., 2025). This broader class of demand distributions includes a more restrictive subclass in which an MCI is proven to be second-order dominant (details are provided in Example EC.1). However, the MP-MNL model does not generally fall within this subclass. Furthermore, the IIH can enhance the assortment generated by the extended MCI policy, thereby achieving an optimality ratio that is nearly 100%. Both proposed heuristics substantially surpass the performance of the Bestsellers heuristic. Also, as observed in the previous subsection, when the assortment of one warehouse is fixed, increasing the capacity of the other reduces the total number of split orders. Additionally, when the total capacity of the two-warehouse system is constant, a configuration with a larger warehouse complemented by a smaller one performs better than a system with two similarly sized warehouses.

Figure 8. MP-MNL with 6 items (1).

Figure 9. MP-MNL with 8 items (1).

Figure 10. MP-MNL with 10 items (1).

Figure 11. MP-MNL with 6 items (2).

Figure 12. MP-MNL with 8 items (2).

Figure 13. MP-MNL with 10 items (2).

Table 3.

Comparison of different policies under the MP-MNL model simulations.

Sub-case #	Num of products	Capacity	Mean OPT mso	Mean eMCI mso	Mean IIH mso	Mean BS mso	Mean eMCI opt ratio (%)	Mean IIH opt ratio (%)	Mean BS opt ratio (%)
1	6	[5,2]	0.7027	0.7027	0.7027	0.7842	99.99	100.00	88.39
		[5,3]	0.6898	0.6899	0.6899	0.7175	99.99	100.00	95.99
		[5,4]	0.6798	0.6800	0.6799	0.6866	99.98	100.00	99.00
2	6	[5,2]	0.7051	0.7052	0.7051	0.7891	99.99	100.00	88.09
		[4,3]	0.7562	0.7563	0.7563	0.8129	99.98	99.99	92.50
3	8	[7,3]	0.7692	0.7693	0.7692	0.8575	99.98	100.00	88.51
		[7,4]	0.7608	0.7610	0.7609	0.8039	99.98	99.99	94.34
		[7,5]	0.7528	0.7530	0.7528	0.7701	99.98	99.99	97.70
		[7,6]	0.7456	0.7458	0.7457	0.7500	99.98	99.99	99.41
		[7,7]	0.7395	0.7396	0.7395	0.7396	99.99	100.00	99.99
4	8	[7,2]	0.7739	0.7740	0.7739	0.9225	99.99	100.00	80.79
		[6,3]	0.8329	0.8330	0.8330	0.9774	99.99	100.00	82.66
		[5,4]	0.9011	0.9012	0.9012	0.9926	99.98	99.99	89.84
5	10	[7,4]	0.9462	0.9463	0.9462	1.1278	99.99	100.00	80.81
		[7,5]	0.9294	0.9296	0.9295	1.0412	99.98	99.99	87.96
		[7,6]	0.9061	0.9063	0.9062	0.9608	99.97	99.99	93.97
		[7,7]	0.8756	0.8760	0.8759	0.8928	99.96	99.97	98.04
6	10	[9,2]	0.8181	0.8182	0.8181	1.0207	99.99	100.00	75.23
		[8,3]	0.8768	0.8769	0.8769	1.0984	99.99	100.00	74.73
		[7,4]	0.9462	0.9463	0.9463	1.1264	99.99	99.99	80.96
		[6,5]	1.0190	1.0191	1.0191	1.1355	99.98	99.99	88.57

For all examined scenarios, the two proposed heuristics greatly outperform the Bestsellers heuristic. Importantly, even when the demand distribution fails to meet optimal conditions, the extended MCI policy still provides near-optimal solutions. Moreover, further application of the IIH can enhance its performance and reduce split orders.

Similar to the experiments conducted under the ICM, we extend our analysis to scenarios involving a larger number of products using the MP-MNL model. In these experiments, we assume that a customer’s IPQ can take values up to 5, each with equal probability. This assumption imposes an upper bound on the number of potential customer orders, allowing us to evaluate cases with a substantially larger number of products, up to 1,000. The results are presented in Table 4. We observe that the proposed extended MCI policy and the IIH consistently outperform the Bestsellers heuristic across all tested cases. Specifically, the relative improvement of the extended MCI policy over the Bestsellers heuristic ranges from 3.87% to 7.23% (last column of Table 4). When comparing these results to those in Table 2, a notable difference emerges in Table 4. Unlike the ICM, where customers can purchase up to $N$ products, the experiments based on the MP-MNL model impose a realistic restriction on customer order sizes, reflecting more practical scenarios. Under these conditions, the improvements achieved by applying the IIH become more pronounced as the number of products grows, shifting from marginal gains to substantial enhancements. Furthermore, while the extended MCI policy may not always match the performance of the one-hour best feasible solution obtained by solving MILP (2), applying the IIH significantly improves outcomes. With fewer than 10 iterations and a total computation time of less than one minute, the IIH produces solutions that surpass the one-hour best feasible solution obtained from the MILP solver in cases with a larger number of products (200 or more in Table 4). These results underscore the efficiency and robustness of the proposed methods, especially in scenarios involving larger product sets and more realistic purchase behaviors.

Besides synthetic experiments, we also conduct numerical experiments using real-world data from RiRiShun Logistics to further validate the effectiveness and scalability of the proposed methods. Due to space limitations, the details are provided in Section EC.4.3.

In conclusion, the proposed Extended MCI policy and Iterative Improvement Heuristic (IIH) exhibit both efficiency and scalability across various demand structures, including independent product choices and cases with product interdependencies. Despite its computational cost, the MILP formulation (2) remains a valuable tool, particularly when warehouse assortment decisions are not time-sensitive and demand patterns are uncertain, as it provides an exact optimal solution. In contrast, the two proposed algorithms offer efficient and practical alternatives, especially when specific demand patterns can be identified, providing high-quality solutions in a timely manner.

Table 4.

Comparison of different policies under the MP-MNL model for larger product sets.

Num of products	Single-purchase prob. (%)	Multi-purchase prob. (%)	Capacity	Mean eMCI mso	Mean IIH mso	Mean IIH time (s)	Mean num of IIH iterations	Mean BS mso	Mean BS time (s)	Mean MILP mso	Mean MILP time (s)	Relative eMCI impro. over BS (%)
10	2.43	97.57	[8, 6]	1.0821	1.0819	1.24	1.0	1.1493	0.01	1.0819	53.32	5.85
20	0.75	99.25	[16, 12]	1.0631	1.0627	3.45	1.0	1.1244	0.04	1.0622	*	5.45
30	0.66	99.34	[25, 17]	1.0569	1.0563	6.57	1.0	1.1393	0.06	1.0560	*	7.23
50	0.87	99.13	[42, 28]	1.0456	1.0445	6.57	1.0	1.1171	0.08	1.0443	*	6.40
100	1.49	98.51	[84, 56]	1.0434	1.0414	8.15	1.0	1.1013	0.09	1.0407	*	5.26
200	2.62	97.38	[168, 112]	1.0394	1.0358	8.88	1.4	1.0909	0.09	1.0369	*	4.72
300	3.53	96.47	[252, 168]	1.0385	1.0317	13.76	2.0	1.0905	0.12	1.0377	*	4.77
500	5.13	94.87	[420, 280]	1.0391	1.0288	26.71	4.2	1.0890	0.16	1.0359	*	4.58
1000	7.53	92.47	[840, 560]	1.0316	1.0133	43.98	6.8	1.0731	0.29	1.0156	*	3.87

5. Multi-Warehouse Assortment Selection With Back-End Support

Until now, we have discussed the multi-warehouse assortment selection problem to minimize order splitting under Assumption 1, which requires all products to be stored within a single-tier logistics network. However, in practical scenarios, many e-commerce logistics companies, such as JD.com and Ririshun Logistics, actually operate multi-tier logistics networks. In multi-tier logistics networks, back-end warehouses typically feature larger capacities and offer essential support to front-end warehouses. Clearly, warehouse assortment selection becomes more complicated in this context. To distinguish this situation, we refer to it as multi-tier multi-warehouse assortment selection (MMWAS), in contrast to the previously discussed single-tier multi-warehouse assortment selection (MWAS).

It is noteworthy that if every front-end warehouse can receive support from any back-end warehouse, determining assortment selections for back-end warehouses simplifies to the MWAS problem. Also, if each front-end warehouse has its own service region without any overlap, determining the assortment selection for a single front-end warehouse reduces to the single-warehouse assortment selection discussed in Li et al. (2024). Conversely, without these conditions, new challenges such as overlapping service regions, evaluating spillover fulfillment strategies (i.e., fulfilling a customer order directly from a back-end warehouse rather than from front-end warehouses), and the joint allocation of products within the multi-tier system must be considered to address the MMWAS problem.

Building on our findings from MWAS, we extend our analysis to the MMWAS problem. In this section, we focus on a two-tier system with a single back-end warehouse storing all product varieties, aiming to determine the assortment selections for multiple front-end warehouses. For simplicity, we retain the previously used notation.

To avoid triviality, we posit that the fulfillment cost incurred by the back-end warehouse exceeds that of any front-end warehouse. This assumption guarantees that relying solely on the back-end warehouse for order fulfillment is not always the optimal strategy. However, if the spillover fulfillment cost from the back-end warehouse is extremely high, all products will be forced to be stored within front-end warehouses, reducing the problem to MWAS. Thus, we make the following assumption:

Assumption 2
While fulfilling an order, $T \subseteq N$ , from the back-end warehouse incurs higher costs than from a front-end warehouse, the cost of splitting an order and fulfilling multiple suborders from different front-end warehouses exceeds the cost of fulfilling the entire order via the back-end warehouse.

Assumption 2 is based on the observation that the strategic placement of front-end warehouses closer to customers leads to reduced transportation expenses and quicker delivery times, thereby making order fulfillment from these locations typically more cost-efficient than from the back-end warehouse. However, splitting orders into multiple suborders incurs additional handling, transport, and operational costs, outweighing the single higher cost of spillover fulfillment. This assumption highlights the trade-offs between warehousing expenses, fulfillment efficiency, and customer satisfaction. Notably, this setting can equivalently be interpreted as one in which an order is considered a lost sale whenever no single front-end warehouse can fulfill it in full. Additionally, the requirement that $\cup_{d \in [D]} S_{d} = N$ can be relaxed in this context due to the support offered by the back-end warehouse.

Under Assumption 2, we have the following two observations: (i) any non-empty order $T \subseteq N$ requires one shipment for fulfillment; (ii) minimizing the fulfillment cost of the entire logistics network is equivalent to minimizing the total probability that an order must be fulfilled by the back-end warehouse. Now, we can formulate the problem and refer to it as (MSO-S), with “S” indicating the existence of back-end support:
$\begin{aligned} (\text{MSO-S}) : \quad \min_{S} \quad & \sum_{m \in [M]} π_{m} \Big[\prod_{d \in [D]} I (T_{m} ⊈ S_{d})\Big] \\ \text{s.t.} \quad & | S_{d} | \leq K_{d}, && \forall d \in [D], \\ & S_{d} \subseteq N, && \forall d \in [D] . \end{aligned}$
It is important to emphasize that the objective of (MSO-S) reflects the proportion of orders that must be fulfilled by the back-end warehouse. Notice that the MSO-S problem significantly differs from the general MSO problem, primarily because complex fulfillment strategies do not need to be considered, similar to the situation with the MSO₂ problem. Specifically, if an order cannot be fulfilled by any of the front-end warehouses, it will be fulfilled by the back-end warehouse. Similar to our discussion in Section 4.3.2, it is easy to find that the objective of (MSO-S) weakly decreases as the capacity of any warehouse increases (see Proposition EC.2 for details).
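The (MSO-S) objective is straightforward to evaluate for a given set of front-end assortments. The helper below is a minimal sketch (the function name and the representation of orders as `(set_of_products, probability)` pairs are assumptions): an order contributes to the objective exactly when no single front-end warehouse contains all of its products.

```python
def mso_s_objective(orders, assortments):
    """Probability mass of orders that must go to the back-end warehouse."""
    return sum(p for items, p in orders
               if not any(items <= s for s in assortments))

# Hypothetical toy instance with two front-end warehouses.
orders = [({1, 2}, 0.4), ({3, 4}, 0.3), ({1, 3}, 0.2), ({2}, 0.1)]
print(mso_s_objective(orders, [{1, 2}, {3}]))
print(mso_s_objective(orders, [{1, 2}, {3, 4}]))  # larger second warehouse
```

Enlarging any assortment can only turn uncovered orders into covered ones, so the value never increases, in line with the weak monotonicity in capacity noted above.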
5.1. Non-Overlapping Assortment Selection in Front-End Warehouses

To solve the MSO-S problem, if assortment overlapping is not allowed among front-end warehouses, the following theorem characterizes an optimal warehouse assortment allocation when a second-order dominant indexing rule exists for the demand function $π$ .

Theorem 5
Under Assumption 2, if the demand distribution $π$ has a 2-MCI and the assortments of different front-end warehouses are required to be non-overlapping, then the assortment allocation given in Theorem 3 is optimal (the least popular $N - \sum_{d = 1}^{D} K_{d}$ products are discarded if $N > \sum_{d = 1}^{D} K_{d}$ ) for MSO-S, provided that the products are indexed based on the MCI and $K_{1} \geq K_{2} \geq \dots \geq K_{D}$ .

Similar to Theorem 2, the optimal assortment selection at the front-end warehouses also involves storing more popular products in larger warehouses, with products of similar popularity levels being stored together in the same warehouse. If the front-end warehouses cannot hold all the products, then the least popular $N - \sum_{d = 1}^{D} K_{d}$ products are stored only in the back-end warehouse.
5.2. Overlapping Assortment Selection in Front-End Warehouses

To address MSO-S without the restriction on non-overlapping assortments among front-end warehouses, we begin with the simplest scenario involving just two front-end warehouses, denoted as MSO₂-S. By comparing the MSO₂-S problem with the MSO₂ problem, we establish the following proposition about the optimal solution structure for MSO₂-S.

Proposition 9
Under Assumption 2, if the demand distribution $π$ has a 2-MCI, then the optimal policy for (MSO₂-S) has the following structure: $S_{1}^{*} = \{1, 2, \dots, K_{1}\}$ and $S_{2}^{*} = \{1, 2, \dots, n^{'}, K_{1} + 1, \dots, K_{1} + K_{2} - n^{'}\}$ for some $n^{'} \leq K_{2} - 1$, provided that the products are indexed based on the MCI and $K_{1} \geq K_{2}$.

Although Proposition 9 does not directly provide an optimal policy for solving the MSO₂-S problem when the demand distribution has a second-order dominant MCI, it narrows the search considerably. Given that the value of $n^{'}$ is limited to $K_{2} - 1$, this leads to a polynomial-time algorithm that explores $K_{2} - 1$ potential $S_{2}$ selections to find the optimal solution, as displayed in Algorithm 2.
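The enumeration suggested by Proposition 9 can be sketched as follows. Algorithm 2 itself is not reproduced in this section, so this is only one plausible reading: the function names are hypothetical, products are assumed to be already indexed $1, \dots, N$ by the MCI with $K_{1} \geq K_{2} \geq 1$, the overlap size $n^{'}$ is assumed to range over $0, 1, \dots, K_{2} - 1$, and the upper index of $S_{2}$ is clipped at $N$.

```python
def mso2_s(orders, s1, s2):
    """Probability mass of orders that neither front-end warehouse covers."""
    return sum(p for items, p in orders if not items <= s1 and not items <= s2)

def enumerate_overlap(orders, n_products, k1, k2):
    """Try every overlap size n' and keep the best S2 of the stated structure."""
    s1 = set(range(1, k1 + 1))                      # S1 = {1, ..., K1}
    best = None
    for n_prime in range(k2):                       # n' = 0, 1, ..., K2 - 1
        upper = min(n_products, k1 + k2 - n_prime)  # clip at N (assumption)
        s2 = set(range(1, n_prime + 1)) | set(range(k1 + 1, upper + 1))
        value = mso2_s(orders, s1, s2)
        if best is None or value < best[0]:
            best = (value, s2)
    return s1, best[1], best[0]

# Hypothetical toy instance: N = 4, K1 = K2 = 2.
orders = [({1, 3}, 0.5), ({3, 4}, 0.3), ({1, 2}, 0.2)]
print(enumerate_overlap(orders, 4, 2, 2))
```

Each candidate requires a single pass over the order types, so the total effort in this sketch grows linearly in both $K_{2}$ and the number of order types.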

For general cases involving multiple front-end warehouses, it is intuitive to come up with a greedy algorithm that reduces the chance of fulfilling an arbitrary order from the back-end warehouse step by step. Consider any given set of assortments $(S_{1}, \dots, S_{d})$ for some $d \in N_{+ +}$. We define $M^{C} (S_{1}, \dots, S_{d}) = \{m \in [M] : \prod_{i = 1}^{d} I (T_{m} ⊈ S_{i}) = 1\}$ as the collection of order types that cannot be fulfilled by the system of front-end warehouses $(S_{1}, \dots, S_{d})$. Then, to select the assortment for an additional front-end warehouse with capacity $K$ that minimizes the probability of fulfilling any order in $M^{C} (S_{1}, \dots, S_{d})$ by the back-end warehouse, we can solve the following MILP:
$\begin{aligned} \max_{x, y} \quad & \sum_{m \in M^{C}} π_{m} y_{m}, \\ \text{s.t.} \quad & \sum_{n = 1}^{N} x_{n} \leq K, \\ & y_{m} \leq x_{n}, && \forall m \in M^{C}, n \in T_{m}, \\ & x_{n} \in \{0, 1\}, && \forall n \in N, \\ & y_{m} \geq 0, && \forall m \in M^{C}, \end{aligned}$
(7)
where $M^{C} = M^{C} (S_{1}, \dots, S_{d})$, $x \in \{0, 1\}^{N}$ is the binary decision variable with $x_{n} = 1$ if product $n$ is included in the assortment $S$ and $0$ otherwise, and $y \in \{0, 1\}^{| M^{C} |}$ is the binary decision variable with $y_{m} = 1$ if order $m$ can be fulfilled by $S$, that is, $T_{m} \subseteq S$, and $0$ otherwise. Now, we present the greedy algorithm, Algorithm 3, to solve MSO-S. This approach greedily selects assortments starting from the largest warehouse and progressing to the smallest, as a larger capacity allows a warehouse to fully fulfill more orders. While the assortments are not jointly optimized, this greedy algorithm offers an intuitive approach to solving the MSO-S problem sequentially. Indeed, if $M^{C} = \emptyset$ at any point, we can terminate the algorithm early, as an optimal selection has been achieved, allowing all orders to be fulfilled without resorting to back-end support.
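The sequential procedure described above can be sketched as follows. Algorithm 3 is not reproduced in this section, so the code is an illustrative reading under stated assumptions: the function names are hypothetical, MILP (7) is replaced by brute-force enumeration over assortments of size $\min(K, N)$ (valid because the objective cannot decrease when products are added, but practical only for tiny instances), and warehouses left over after early termination simply receive no assortment.

```python
from itertools import combinations

def uncovered(orders, assortments):
    """Order types in M^C: those no chosen front-end warehouse can fulfill."""
    return [(items, p) for items, p in orders
            if not any(items <= s for s in assortments)]

def best_single_assortment(remaining, products, capacity):
    """Brute-force stand-in for MILP (7) (assumption: tiny instances only)."""
    size = min(capacity, len(products))
    best_set, best_val = set(), -1.0
    for cand in combinations(sorted(products), size):
        cand = set(cand)
        val = sum(p for items, p in remaining if items <= cand)
        if val > best_val:
            best_set, best_val = cand, val
    return best_set

def greedy_front_end(orders, products, capacities):
    """Choose assortments one warehouse at a time, largest capacity first."""
    assortments = []
    for k in sorted(capacities, reverse=True):
        remaining = uncovered(orders, assortments)
        if not remaining:  # M^C is empty: stop early
            break
        assortments.append(best_single_assortment(remaining, products, k))
    return assortments

# Hypothetical toy instance with front-end capacities 2 and 3.
products = {1, 2, 3, 4}
orders = [({1, 2}, 0.4), ({3, 4}, 0.3), ({1, 3}, 0.2), ({2}, 0.1)]
print(greedy_front_end(orders, products, [2, 3]))
```

In this toy instance the second warehouse is chosen only with respect to the orders the first one leaves uncovered, which illustrates why the result is sequential rather than jointly optimized.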

Building on the similarity between MSO-S and MSO₂, we develop an iterative algorithm (Algorithm 4) similar to Algorithm 1 to refine any existing assortment selection, including solutions from the greedy algorithm. The convergence of Algorithm 4 is established in Proposition 10.

Proposition 10
Algorithm 4 converges when the number of iterations is sufficiently large.

It is essential to note that Algorithms 3 and 4 cannot be directly applied to solve the general MSO problem without significant adjustments. This limitation stems from the requirement for a single-tier multi-warehouse system to store all products. Indeed, there may be scenarios where including some of the least popular products has a minimal impact on reducing the objective value, especially when compared to the inclusion of more popular items. Such less popular products might be ignored by these two algorithms, thus failing to meet the requirement to fulfill all orders as required by the MSO problem.

To sum up, we have expanded upon our findings from single-tier multi-warehouse assortment selection to address the multi-tier multi-warehouse assortment selection. Note that this represents a preliminary exploration into a specific subclass of the MMWAS problem. We believe this direction is important, rich in potential, and holds great promise for future research.
6. Conclusion

This article investigates the multi-warehouse assortment selection problem, with a primary focus on minimizing split orders, a task that proves to be NP-hard. To address this challenging problem, we provide an MILP formulation that can be solved using commercial solvers. However, such an approach, while computationally viable, does not yield structural insights into how products should be optimally allocated across multiple warehouses. To bridge this gap, we introduce the concept of the second-order dominant indexing rule, which facilitates the characterization of optimal assortment structures in various scenarios. Specifically, when a 2-MCI exists for the demand distribution, we can explicitly determine the optimal non-overlapping assortment. For total nested demand structures, the optimal assortment allocates more popular products to larger warehouses. In two-warehouse contexts, we develop the extended MCI policy, which is proven optimal under demand distributions with 2-MCI. Additionally, we establish a link between the MSO₂ problem and order fill rate maximization in single-warehouse settings, introducing the Iterative Improvement Heuristic to refine any initial assortment. Extensive numerical experiments validate the extended MCI policy’s optimality for independent product demand and its near-optimal performance under more general demand distributions. Expanding to multi-tier scenarios, such as a back-end warehouse supporting multiple front-end warehouses, we establish sufficient conditions for optimal solutions with non-overlapping constraints. Without such restrictions, we introduce a polynomial-time algorithm for the two-warehouse setting, proven optimal under 2-MCI. For general multi-warehouse settings, we propose two novel heuristics to effectively address the problem.

As for future research directions, several aspects of multi-warehouse assortment selection could be further explored. First, expanding the focus beyond order-splitting minimization to incorporate fulfillment cost trade-offs and more complex multi-warehouse configurations would provide valuable insights. Investigating multi-tier systems where limited splits are preferable to back-end fulfillment also presents a promising direction. Additionally, examining the interplay between fulfillment policies and assortment decisions could offer practical implications for optimizing multi-warehouse logistics. Beyond its application in warehouse assortment selection, the second-order dominant indexing rule has potential applications in various business environments. For instance, in inventory management, firms can leverage the structured demand pattern to optimize stock levels by prioritizing high-demand products, improving inventory efficiency and service levels. Specifically, integrating assortment selection with inventory planning while considering volume and quantity constraints could yield deeper insights.

Supplemental Material

sj-pdf-1-pao-10.1177_10591478251365581 - Supplemental material for Multi-Warehouse Assortment Selection: Minimizing Order Splitting in E-Commerce Logistics

Supplemental material, sj-pdf-1-pao-10.1177_10591478251365581 for Multi-Warehouse Assortment Selection: Minimizing Order Splitting in E-Commerce Logistics by Hongyuan Lin, Xiaobo Li and Fang Liu in Production and Operations Management

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received the following financial support for the research, authorship and/or publication of this article: The work of Xiaobo Li was supported in part by the National Natural Science Foundation of China (Grant 72171156), and by the Singapore Ministry of Education Academic Research Fund [Tier 1 Grants 23-0619-P0001 and 24-0500-A0001; Tier 3 Grant MOE-2019-T3-1-010]. The work of Fang Liu was supported by the Major Program of the National Natural Science Foundation of China (Grants 72192843 and 72192840).

ORCID iDs

Hongyuan Lin

Xiaobo Li

Fang Liu

Supplemental Material

Supplemental materials for this article are available online (doi: ).

Notes

How to cite this article

Lin H, Li X and Liu F (2025) Multi-Warehouse Assortment Selection: Minimizing Order Splitting in E-Commerce Logistics. Production and Operations Management xx(x): 1–21.

