Adaptive Learning Recommendation Strategy Based on Deep Q-learning

Abstract

Personalized recommendation systems that adapt to each learner’s own pace have been widely adopted in the e-learning field. Drawing on learning behavior data, psychometric assessment models track the learner’s proficiency on knowledge points, and a well-designed recommendation strategy then selects a sequence of actions that maximizes the learner’s learning efficiency. This article proposes a novel adaptive recommendation strategy under the framework of reinforcement learning. The proposed strategy is realized by deep Q-learning algorithms, the techniques that contributed to the success of AlphaGo Zero in reaching a super-human level at the game of Go. The proposed algorithm incorporates early stopping to account for the possibility that learners may choose to stop learning. It can properly deal with missing data and can handle more individual-specific features for better recommendations. The recommendation strategy guides individual learners along efficient learning paths that vary from person to person. The authors present concrete examples with numerical analysis of substantive learning scenarios to further demonstrate the power of the proposed method.
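As a rough illustration of the idea of Q-learning with early stopping, the sketch below shows a single tabular Q-learning update in which a step where the learner quits is treated as terminal. It is not the authors' deep Q-learning implementation: the lookup table stands in for the neural network, and the function name `q_update`, the `stopped` flag, the state and action indices, the rewards, and the values of `alpha` and `gamma` are all assumptions made for this example.

```python
def q_update(q, s, a, r, s_next, stopped, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the table.

    q       : list of lists, q[state][action] holds the current value estimate
    stopped : True if the learner quit after this step (hypothetical flag),
              in which case no future value is bootstrapped
    """
    # If the learner stops, the episode ends and the target is the reward alone.
    target = r if stopped else r + gamma * max(q[s_next])
    # Move the estimate a fraction alpha toward the target.
    q[s][a] += alpha * (target - q[s][a])
    return q


# Toy table: 3 proficiency states, 2 candidate learning materials (assumed sizes).
q = [[0.0, 0.0] for _ in range(3)]
q = q_update(q, s=0, a=1, r=1.0, s_next=1, stopped=False)  # learner continues
q = q_update(q, s=1, a=0, r=0.5, s_next=2, stopped=True)   # learner quits
```

In a deep Q-learning setting the table lookup would be replaced by a network that maps the learner's state to action values, but the terminal-step handling of the target would follow the same pattern.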

Keywords

adaptive learning, Markov decision process, recommendation system, reinforcement learning


