Abstract
Product returns are prevalent in practice. Many retailers offer lenient free return policies, but with a specific return window within which customers are allowed to return products. Motivated by this phenomenon, we consider a single-product online learning and pricing problem with stochastic product returns. A salient feature is that the demand function, which depends on the price and return window decisions, is initially unknown and must be learned on the fly. The retailer thus faces the classic exploration–exploitation trade-off. Moreover, we consider an inventory constraint, which introduces an additional trade-off between earning revenue and managing inventory. We propose a modeling framework that integrates pricing and return window decisions, and develop a deterministic fluid model that serves as the full-information benchmark. To tackle the learning problem, we design a novel nonparametric learning algorithm that integrates inverse stochastic gradient descent (SGD) and Upper Confidence Bound (UCB) methods. Under mild assumptions on the demand and revenue functions, we establish a regret upper bound for our learning algorithm as
