Frame Skip Is a Powerful Parameter for Learning to Play Atari

We show that setting a reasonable frame skip can be critical to the performance of agents learning to play Atari 2600 games. In all six games in our experiments, frame skip is a strong determinant of success. For two of these games, setting a large frame skip leads to state-of-the-art performance.

The rate at which an agent interacts with its environment may be critical to its success. In the Arcade Learning Environment (ALE) (Bellemare et al. 2013), games run at sixty frames per second, and agents can submit an action at every frame. Frame skip is the number of frames an action is repeated before a new action is selected. Existing reinforcement learning (RL) approaches use a static frame skip: HNEAT (Hausknecht et al. 2013) uses a frame skip of 0; DQN (Mnih et al. 2013) uses a frame skip of 2-3; SARSA and planning approaches (Bellemare et al. 2013) use a frame skip of 4.

When action selection is computationally intensive, setting a higher frame skip can significantly decrease the time it takes to simulate an episode, at the cost of missing opportunities that exist only at a finer resolution. A large frame skip can also prevent degenerate strategies that rely on superhuman reflexes, such as those described by Hausknecht et al. for Bowling, Kung Fu Master, Video Pinball and Beam Rider. We show that, in addition to these advantages, agents that act with a high frame skip can actually learn faster, in terms of the number of training episodes, than agents that skip no frames. We present results for six of the seven games covered by Mnih et al.: three (Beam Rider, Breakout and Pong) for which DQN was able to achieve near- or superhuman performance, and three (Q*Bert, Space Invaders and Seaquest) for which all RL approaches are far from human performance.
These latter games were understood to be difficult because they require ‘strategy that extends over long time scales.’ In our experiments, setting a large frame skip was critical to achieving state-of-the-art performance in two of these games: Space Invaders and Q*Bert. More generally, the frame skip parameter was a strong determinant of performance in all six games. Our learning framework is a variant of Enforced Subpopulations (ESP; Gomez and Miikkulainen 1997), a neuroevolution approach that has been successfully imple…
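The frame-skip mechanism and its trade-off can be illustrated with a short sketch. This is a minimal, hypothetical example. It is not the authors' ESP framework and not the ALE interface: `ToyGame`, its one-frame `step()` method, its alternating reward rule, and the helpers `run_episode` and `decisions_per_second` are all invented for illustration. The sketch assumes that a frame skip of k means the selected action is applied on the decision frame plus k repeated frames. Under that assumption, a frame skip of 0 acts on every frame, and at sixty frames per second the agent makes 60/(k+1) decisions per second.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

FPS = 60  # ALE games run at sixty frames per second


@dataclass
class ToyGame:
    """Hypothetical stand-in for an emulator: each step() call advances one frame."""

    episode_frames: int = 600  # assumed episode length: 10 seconds at 60 fps
    frame: int = 0

    def reset(self) -> None:
        self.frame = 0

    def step(self, action: int) -> Tuple[float, bool]:
        # Toy reward rule (assumption): the rewarded action alternates every
        # frame, so some opportunities exist only at single-frame resolution.
        reward = 1.0 if action == self.frame % 2 else 0.0
        self.frame += 1
        return reward, self.frame >= self.episode_frames


def run_episode(game: ToyGame, policy: Callable[[int], int],
                frame_skip: int) -> Tuple[float, int]:
    """Play one episode with a static frame skip.

    Each selected action is applied on the decision frame and then repeated
    for `frame_skip` more frames. Returns (total_reward, action_selections).
    """
    game.reset()
    total_reward, selections, done = 0.0, 0, False
    while not done:
        action = policy(game.frame)  # the possibly expensive decision step
        selections += 1
        for _ in range(frame_skip + 1):
            reward, done = game.step(action)
            total_reward += reward
            if done:
                break
    return total_reward, selections


def decisions_per_second(frame_skip: int) -> float:
    """Action selections per second of game time at 60 fps."""
    return FPS / (frame_skip + 1)


if __name__ == "__main__":
    def react(frame: int) -> int:
        return frame % 2  # best action for the decision frame only

    for k in (0, 2, 4):
        reward, n = run_episode(ToyGame(), react, k)
        print(f"frame_skip={k}: {decisions_per_second(k):.0f} decisions/s, "
              f"{n} selections, total reward {reward}")
```

Because the toy game's rewarded action changes every frame, larger frame skips reduce the number of action selections. The counts are 600, 200 and 120 per episode for frame skips of 0, 2 and 4. The cost is that rewards available only at single-frame resolution are forfeited, so total reward falls from 600.0 to 400.0 and then 360.0. This mirrors the computational trade-off described above. The sketch does not model learning.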

[1] Marc G. Bellemare et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.

[2] Faustino J. Gomez and Jürgen Schmidhuber. Co-evolving recurrent neurons learn deep memory POMDPs, 2005, GECCO '05.

[3] Jürgen Schmidhuber et al. Training Recurrent Networks by Evolino, 2007, Neural Computation.

[4] Faustino Gomez and Risto Miikkulainen. Incremental Evolution of Complex General Behavior, 1997, Adaptive Behavior.

[5] Volodymyr Mnih et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.

[6] Matthew Hausknecht et al. A Neuroevolution Approach to General Atari Game Playing, 2014, IEEE Transactions on Computational Intelligence and AI in Games.

[7] Mostafa Vafadost. Temporal Abstraction in Monte Carlo Tree Search, 2013.