Sigmoid-weighted linear units for neural network function approximation in reinforcement learning

Persian article title: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning
English article title: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning
Journal/Conference: Neural Networks
Related fields of study: Computer Engineering, Information Technology
Related academic specializations: Artificial Intelligence, Computer Networks
Persian keywords: reinforcement learning, sigmoid-weighted linear unit, function approximation, Tetris, Atari 2600, deep learning
English keywords: reinforcement learning, sigmoid-weighted linear unit, function approximation, Tetris, Atari 2600, deep learning
Article type: Research Article
Digital Object Identifier (DOI): https://doi.org/10.1016/j.neunet.2017.12.012
Affiliation: Dept. of Brain Robot Interface – ATR Computational Neuroscience Laboratories – Japan
Number of pages (English article): 26
Publisher: Elsevier
Publication type: Journal
Article classification: ISI
Publication year: 2018
Impact factor: 7.197 (2017)
H-index: 121 (2018)
SJR: 2.359 (2018)
ISSN: 0893-6080
English article format: PDF
Translation status: Not translated
Price of the English article: Free
Is this a base article: No
Product code: E10530
Table of contents (English)

Abstract

1- Introduction

2- Method

3- Experiments

4- Analysis

5- Conclusions

6- Acknowledgments

References

Excerpt from the article (English)

Abstract

In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro’s TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, and softmax action selection can be competitive with DQN, without the need for a separate target network. We validate our proposed approach by, first, achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10×10 board, using TD(λ) learning and shallow dSiLU network agents, and, then, by outperforming DQN in the Atari 2600 domain by using a deep Sarsa(λ) agent with SiLU and dSiLU hidden units.
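
For concreteness, here is a minimal sketch of the quantities described above: the SiLU (the input multiplied by its sigmoid), the dSiLU (the derivative of the SiLU with respect to its input), and a generic softmax (Boltzmann) action-selection rule of the kind the abstract refers to. The temperature parameter tau and the function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def silu(x):
    """SiLU activation: the input multiplied by the sigmoid of the input."""
    return x * sigmoid(x)

def dsilu(x):
    """dSiLU activation: the derivative of the SiLU with respect to its input,
    sigmoid(x) * (1 + x * (1 - sigmoid(x)))."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

def softmax_policy(q_values, tau=1.0):
    """Softmax (Boltzmann) action-selection probabilities over action values.
    tau is an assumed temperature parameter; the paper's exact form and
    schedule are not given in this excerpt."""
    z = (q_values - np.max(q_values)) / tau  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()
```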

Introduction

Neural networks have enjoyed a renaissance as function approximators in reinforcement learning (Sutton and Barto, 1998) in recent years. The DQN algorithm (Mnih et al., 2015), which combines Q-learning with a deep neural network, experience replay, and a separate target network, achieved human-level performance in many Atari 2600 games. Since the development of the DQN algorithm, there have been several proposed improvements, both to DQN specifically and deep reinforcement learning in general. Van Hasselt et al. (2015) proposed double DQN to reduce overestimation of the action values in DQN and Schaul et al. (2016) developed a framework for more efficient replay by prioritizing experiences of more important state transitions. Wang et al. (2016) proposed the dueling network architecture for more efficient learning of the action value function by separately estimating the state value function and the advantages of each action. Mnih et al. (2016) proposed a framework for asynchronous learning by multiple agents in parallel, both for value-based and actor-critic methods. To date, the most impressive application of using deep reinforcement learning is AlphaGo (Silver et al., 2016, 2017), which has achieved superhuman performance in the ancient board game Go.

The purpose of this study is twofold. First, motivated by the high performance of the expected energy restricted Boltzmann machine (EE-RBM) in our earlier studies (Elfwing et al., 2015, 2016), we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear unit (SiLU) and its derivative function (dSiLU). The activation of the SiLU is computed by the sigmoid function multiplied by its input. After we first proposed the SiLU (Elfwing et al., 2017), Ramachandran et al. (2017) recently performed a comprehensive comparison between the SiLU, the rectifier linear unit (ReLU; Hahnloser et al., 2000), and 6 other activation functions in the supervised learning domain. They found that the SiLU consistently outperformed the other activation functions when tested in 3 deep architectures on CIFAR-10/100 (Krizhevsky, 2009), in 5 deep architectures on ImageNet (Deng et al., 2009), and on 4 test sets for English-to-German machine translation