
Paper title: Prioritized Experience Replay

1. Paper Summary

In the Nature DQN and Double DQN papers, transitions are sampled from the experience replay buffer uniformly at random. A more reasonable approach is to consider which experiences are more valuable for training, i.e., to assign each experience a priority weight so that the more important experiences are drawn with higher probability. The DQN paper mentions earlier work on "prioritized sweeping", which already performed non-uniform replay. Building on that prior work, this paper proposes a new framework, prioritized experience replay, in which experiences with higher priority are more likely to be selected. On the Atari benchmark, "DQN + prioritized experience replay" outperforms "DQN + uniform experience replay", achieving better performance on 41 of the 49 games.
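As a concrete illustration of the idea, here is a minimal Python sketch of a proportionally prioritized replay buffer. The class name, the use of a plain list instead of the sum-tree data structure described in the paper, and the default values of alpha, beta, and epsilon are assumptions made for this example, not the paper's exact implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional-prioritization sketch (not the paper's sum-tree)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity      # maximum number of stored transitions
        self.alpha = alpha            # how strongly priorities shape sampling (0 = uniform)
        self.beta = beta              # strength of the importance-sampling correction
        self.eps = eps                # keeps every priority strictly positive
        self.data, self.priorities = [], []
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # P(i) = p_i^alpha / sum_k p_k^alpha  (proportional variant)
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Priority is the absolute TD error plus a small epsilon.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In training, the sampled importance weights would multiply each transition's TD-error term in the loss, and `update_priorities` would be called with the new TD errors after every gradient step.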

Read more »

This note mainly covers the last two of the following four well-known DQN papers:

[1] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. 1–9. Retrieved from http://arxiv.org/abs/1312.5602

[2] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236

[3] Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-Learning. 30th AAAI Conference on Artificial Intelligence, AAAI 2016, 2094–2100.

[4] Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., & De Freitas, N. (2016). Dueling Network Architectures for Deep Reinforcement Learning. 33rd International Conference on Machine Learning, ICML 2016, 4(9), 2939–2947.

Read more »

This note mainly covers the first two of the following four well-known DQN papers:

[1] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. 1–9. Retrieved from http://arxiv.org/abs/1312.5602

[2] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236

[3] Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-Learning. 30th AAAI Conference on Artificial Intelligence, AAAI 2016, 2094–2100.

[4] Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., & De Freitas, N. (2016). Dueling Network Architectures for Deep Reinforcement Learning. 33rd International Conference on Machine Learning, ICML 2016, 4(9), 2939–2947.

Read more »

Blog posts often use LaTeX to write mathematical formulas, but browsers do not always render them, so readers may just see a pile of raw LaTeX code, which is annoying. One simple workaround is to install a MathJax extension in the browser; unfortunately, not every browser has such an extension, and you cannot assume that every reader has one installed. To fix the problem at the root, we give Hexo itself the ability to render MathJax, so formulas display correctly regardless of whether the reader's browser has a MathJax plugin enabled.

Read more »

1. Summary

Paper title: Continuous Control with Deep Reinforcement Learning. The paper proposes DDPG (deep deterministic policy gradient), an algorithm built on the deterministic policy gradient that operates in continuous action spaces and learns the policy end to end.
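To make the deterministic-policy-gradient update concrete, below is a minimal PyTorch sketch of one DDPG training step. The toy network sizes, variable names, and hyperparameters are assumptions for illustration only; the full algorithm additionally uses target networks, soft updates, and exploration noise, which are omitted here.

```python
import torch
import torch.nn as nn

# Assumed toy networks: actor mu(s) outputs a continuous action, critic Q(s, a) a scalar value.
state_dim, action_dim = 3, 1
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(states, actions, rewards, next_states, gamma=0.99):
    # Critic: regress Q(s, a) toward the one-step TD target.
    # (A faithful implementation would use slowly updated target networks here.)
    with torch.no_grad():
        next_q = critic(torch.cat([next_states, actor(next_states)], dim=1))
        target = rewards + gamma * next_q
    q = critic(torch.cat([states, actions], dim=1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient -- ascend Q(s, mu(s)) w.r.t. the actor parameters.
    actor_loss = -critic(torch.cat([states, actor(states)], dim=1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

The key point the sketch shows is that the actor is trained by backpropagating the critic's value through the deterministic action, which is what allows DDPG to handle continuous action spaces where an argmax over actions is infeasible.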

Read more »