Ren Min: PhD candidate at the Institute of Automation, Chinese Academy of Sciences; received his bachelor's degree from the National University of Defense Technology. His research interest is biometric recognition.
Talk title: Chapter 2: Multi-armed Bandits
Abstract: In this chapter we study the evaluative aspect of reinforcement learning in a simplified setting, one that does not involve learning to act in more than one situation. This non-associative setting is the one in which most prior work involving evaluative feedback has been done, and it avoids much of the complexity of the full reinforcement learning problem. Studying this case enables us to see most clearly how evaluative feedback differs from, and yet can be combined with, instructive feedback.
The particular non-associative, evaluative feedback problem that we explore is a simple version of the k-armed bandit problem. We use this problem to introduce a number of basic learning methods which we extend in later chapters to apply to the full reinforcement learning problem. At the end of this chapter, we take a step closer to the full reinforcement learning problem by discussing what happens when the bandit problem becomes associative, that is, when actions are taken in more than one situation.
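To make the setting concrete, here is a minimal sketch of the kind of basic learning method the chapter introduces: ε-greedy action selection with incremental sample-average value estimates on a stationary k-armed testbed. The Gaussian reward model and the values of k and epsilon follow the book's standard 10-armed testbed setup and are illustrative assumptions, not details of the talk itself.

```python
import numpy as np

def run_bandit(k=10, steps=1000, epsilon=0.1, seed=0):
    """Run one epsilon-greedy agent on a stationary k-armed bandit."""
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, k)   # true action values, unknown to the agent
    Q = np.zeros(k)                    # sample-average estimates of q_true
    N = np.zeros(k)                    # how often each arm has been pulled
    rewards = np.zeros(steps)
    for t in range(steps):
        if rng.random() < epsilon:     # explore: pick a random arm
            a = int(rng.integers(k))
        else:                          # exploit: pick the arm with the best estimate
            a = int(np.argmax(Q))
        r = rng.normal(q_true[a], 1.0) # reward drawn from N(q*(a), 1)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]      # incremental sample-average update
        rewards[t] = r
    return rewards.mean()

print(run_bandit())
```

The incremental update `Q[a] += (r - Q[a]) / N[a]` is equivalent to averaging all rewards received from arm `a`, but uses constant memory; varying `epsilon` trades off exploration against exploitation, which is the central tension the chapter examines.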
Spotlight:
1. Examine the Multi-armed Bandits problem from a reinforcement learning perspective;
2. Use this particular task setting to introduce, in an accessible way, the fundamental ideas and methods of reinforcement learning.