Abstract

We consider a dynamic pricing and learning problem in which a seller prices multiple products and learns about unknown demand from sales data. We study a parametric demand model in a Bayesian setting. To avoid the classical problem of incomplete learning, we propose dithering policies under which prices are probabilistically selected in a neighborhood surrounding the myopic optimal price. By analyzing how dithering facilitates learning, we establish regret upper bounds for three typical settings of the demand model. We show that the dithering policy achieves an upper bound of order $\log T$ when the parameter set is finite. It can be modified to achieve a constant regret bound under an additional assumption. We also prove an upper bound of order $\sqrt{T \log T}$ when the parameter set is compact and convex. Each bound matches (up to a logarithmic factor) the known lower bound for any pricing policy. In this way, we show that dithering policies achieve asymptotically optimal performance in three different parameter settings, which establishes dithering as a unified approach to balancing exploration and exploitation.
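
The dithering idea for a finite parameter set can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: a single product, linear demand `a - b * price` with Gaussian noise, a price drawn uniformly within a fixed `radius` of the myopic price, and the hypothetical function names `myopic_price`, `dithered_price`, and `simulate`. The two candidate models in the usage note predict the same demand at price 4, so a policy that always charges the myopic price could stop learning there; the random perturbation keeps observations informative.

```python
import math
import random


def normalize(log_w):
    """Turn unnormalized log-weights into posterior probabilities."""
    m = max(log_w)
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def myopic_price(candidates, posterior):
    """Revenue-maximizing price under the posterior-mean linear demand."""
    a_bar = sum(w * a for w, (a, _) in zip(posterior, candidates))
    b_bar = sum(w * b for w, (_, b) in zip(posterior, candidates))
    # p * (a_bar - b_bar * p) is maximized at a_bar / (2 * b_bar).
    return a_bar / (2.0 * b_bar)


def dithered_price(candidates, posterior, radius, rng):
    """Assumed dithering rule: uniform draw around the myopic price."""
    center = myopic_price(candidates, posterior)
    return center + rng.uniform(-radius, radius)


def simulate(true_theta, candidates, horizon, radius=0.5, sigma=0.5, seed=0):
    """Run the sketch and return the final posterior over candidates."""
    rng = random.Random(seed)
    log_w = [0.0] * len(candidates)  # uniform prior (assumption)
    a, b = true_theta
    for _ in range(horizon):
        posterior = normalize(log_w)
        price = dithered_price(candidates, posterior, radius, rng)
        demand = a - b * price + rng.gauss(0.0, sigma)
        # Bayesian update with a Gaussian likelihood, kept in log space.
        log_w = [
            lw - 0.5 * ((demand - (ca - cb * price)) / sigma) ** 2
            for lw, (ca, cb) in zip(log_w, candidates)
        ]
    return normalize(log_w)
```

For example, `simulate((10.0, 1.0), [(10.0, 1.0), (12.0, 1.5)], horizon=2000)` returns the posterior over the two candidates after 2000 periods. A fixed radius is the simplest choice for a sketch; the policies analyzed in the article may choose the neighborhood and the sampling distribution differently.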

Keywords

Bayesian learning, dynamic pricing, exploration–exploitation, regret analysis

Bayesian dithering for learning: Asymptotically optimal policies in dynamic pricing
