Abstract
1 Introduction
2 Theoretical Background
3 Related Works
4 Methodology
5 Experiments and Results
6 Conclusions
Declarations
References
Abstract
RoboCup 3D Soccer Simulation is a robot soccer competition based on a high-fidelity simulator with autonomous humanoid agents, making it an interesting testbed for robotics and artificial intelligence. Due to the recent success of Deep Reinforcement Learning (DRL) in continuous control tasks, many teams have been using this technique to develop motions in Soccer 3D. This article focuses on learning humanoid robot behaviors: completing a racing track as fast as possible and dribbling against a single opponent. Our approach uses a hierarchical controller in which a model-free policy learns to interact with a model-based walking algorithm. We then use DRL algorithms so that an agent learns how to perform these behaviors. Finally, the learned dribble policy was evaluated in the Soccer 3D environment. Simulated experiments show that the DRL agent wins against the hand-coded behavior used by the ITAndroids robotics team in 68.2% of dribble attempts.
Introduction
RoboCup is an international academic competition created to foster robotics and artificial intelligence research [27]. It has an ambitious long-term goal of having a team of humanoid robots beating the human soccer World Cup champions by 2050. There are many leagues with different game rules and constraints on robot designs to accelerate progress towards this objective.
RoboCup 3D Soccer Simulation (Soccer 3D) is a league of RoboCup based on a robot soccer simulator with a high-fidelity simulation model of the Nao humanoid robot. Its particular contributions to RoboCup lie in providing a research environment for high-level multi-agent cooperative decision-making and humanoid robot control [44]. A simulation environment is convenient for machine learning algorithms due to their need for large amounts of data [36]. Dealing with real robots is time-consuming due to the need to recharge batteries or reposition robots manually to set up experiments. Moreover, experience collection may be greatly accelerated by running many simulations in parallel and executing faster than real time. Unfortunately, transferring behaviors learned in simulation to real robots is challenging due to the so-called reality gap. Still, some works have succeeded in doing so, usually by executing a final fine-tuning process on the real robot [36].
Conclusions
In this work, our main objective was to learn high-level soccer behaviors using reinforcement learning. We addressed the problem with state-of-the-art model-free deep reinforcement learning algorithms, namely DDPG, TRPO, and PPO, learning behaviors while dealing with the complex dynamics of a humanoid robot.
To facilitate learning, we used a hierarchical approach in which the agent learns to command a model-based walking engine based on the Zero Moment Point (ZMP) concept. The walking engine receives the desired velocities in the forward, lateral, and rotational directions and outputs the joint angles.
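The interface between the two levels can be sketched as follows. This is an illustrative reconstruction, not the authors' code: all class and function names are hypothetical, and the walking engine is a stand-in that only shows the data flow (velocity commands in, joint angles out).

```python
# Hypothetical sketch of the hierarchical controller: a learned high-level
# policy outputs desired walking velocities, and a ZMP-based walking engine
# (treated here as a black box) converts them into joint angles.
# All names are illustrative, not taken from the paper's implementation.
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class WalkCommand:
    vx: float     # desired forward velocity (m/s)
    vy: float     # desired lateral velocity (m/s)
    omega: float  # desired rotational velocity (rad/s)


class ZMPWalkingEngine:
    """Stand-in for the model-based walking engine."""

    NUM_JOINTS = 22  # the simulated Nao model exposes 22 actuated joints

    def joint_angles(self, cmd: WalkCommand) -> List[float]:
        # A real engine would run ZMP-based gait generation here;
        # this placeholder just returns a zeroed joint-angle vector.
        return [0.0] * self.NUM_JOINTS


class HighLevelPolicy:
    """Stand-in for the model-free DRL policy (e.g., trained with PPO)."""

    def act(self, observation: Sequence[float]) -> WalkCommand:
        # A trained network would map the observation to velocities;
        # here we simply command a straight forward walk.
        return WalkCommand(vx=0.5, vy=0.0, omega=0.0)


def control_step(policy: HighLevelPolicy,
                 engine: ZMPWalkingEngine,
                 observation: Sequence[float]) -> List[float]:
    """One control tick: policy picks velocities, engine produces joints."""
    cmd = policy.act(observation)
    return engine.joint_angles(cmd)


angles = control_step(HighLevelPolicy(), ZMPWalkingEngine(), [0.0] * 10)
print(len(angles))  # one joint-angle target per actuated joint
```

The key design point is that the action space seen by the DRL algorithm is the three-dimensional velocity command rather than the full joint space, which greatly simplifies exploration.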
To accomplish our objective, we developed a framework integrating DRL algorithms with the RoboCup 3D Soccer Simulation environment. In our results, PPO achieved the best performance, as expected, and effectively learned humanoid robot behaviors.