Abstract
In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro’s TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, and softmax action selection can be competitive with DQN, without the need for a separate target network. We validate our proposed approach by first achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10×10 board, using TD(λ) learning and shallow dSiLU network agents, and then by outperforming DQN in the Atari 2600 domain with a deep Sarsa(λ) agent that uses SiLU and dSiLU hidden units.
Introduction
Neural networks have enjoyed a renaissance as function approximators in reinforcement learning (Sutton and Barto, 1998) in recent years. The DQN algorithm (Mnih et al., 2015), which combines Q-learning with a deep neural network, experience replay, and a separate target network, achieved human-level performance in many Atari 2600 games. Since the development of the DQN algorithm, there have been several proposed improvements, both to DQN specifically and to deep reinforcement learning in general. Van Hasselt et al. (2015) proposed double DQN to reduce overestimation of the action values in DQN, and Schaul et al. (2016) developed a framework for more efficient replay by prioritizing experiences of more important state transitions. Wang et al. (2016) proposed the dueling network architecture for more efficient learning of the action value function by separately estimating the state value function and the advantages of each action. Mnih et al. (2016) proposed a framework for asynchronous learning by multiple agents in parallel, both for value-based and actor-critic methods. To date, the most impressive application of deep reinforcement learning is AlphaGo (Silver et al., 2016, 2017), which has achieved superhuman performance in the ancient board game Go.

The purpose of this study is twofold. First, motivated by the high performance of the expected energy restricted Boltzmann machine (EE-RBM) in our earlier studies (Elfwing et al., 2015, 2016), we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. After we first proposed the SiLU (Elfwing et al., 2017), Ramachandran et al. (2017) recently performed a comprehensive comparison between the SiLU, the rectified linear unit (ReLU; Hahnloser et al., 2000), and 6 other activation functions in the supervised learning domain. They found that the SiLU consistently outperformed the other activation functions when tested in 3 deep architectures on CIFAR-10/100 (Krizhevsky, 2009), in 5 deep architectures on ImageNet (Deng et al., 2009), and on 4 test sets for English-to-German machine translation.
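As a concrete illustration of the two activations introduced above, the following is a minimal NumPy sketch based only on the definitions stated here: the SiLU is the input multiplied by its sigmoid, and the dSiLU is the derivative of the SiLU with respect to its input. The function names silu and dsilu are ours, chosen for illustration; they are not part of the method's notation.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    """SiLU: the input multiplied by the sigmoid of the input."""
    return x * sigmoid(x)

def dsilu(x):
    """dSiLU: derivative of the SiLU with respect to its input,
    i.e., sigmoid(x) * (1 + x * (1 - sigmoid(x)))."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))
```

In a network, either function would be applied elementwise to the pre-activations (weighted input sums) of the hidden units, in the same place a ReLU or sigmoid would normally be used.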