Abstract
1. Introduction
2. Limit order book and optimal stopping formulation
3. Supervised learning approach
4. Reinforcement learning approach
5. Numerical experiment: setup
6. Numerical experiment: results
Disclosure statement
References
Appendices
Abstract
We consider the problem of execution timing in optimal execution. Specifically, we formulate the optimal execution problem of an infinitesimal order as an optimal stopping problem. Using a novel neural network architecture, we develop two data-driven approaches for this problem, one based on supervised learning and the other based on reinforcement learning. Temporal difference learning can be applied to extend these two methods to many variants. Through numerical experiments on historical market data, we demonstrate the significant cost reductions achieved by these methods. Insights from the numerical experiments reveal various tradeoffs in the use of temporal difference learning, including convergence rates, data efficiency, and the tradeoff between bias and variance.
Introduction
Optimal execution is a classic problem in finance that aims to optimize trading while balancing various tradeoffs. When trading a large order of stock, one of the most common tradeoffs is between market impact and price uncertainty. More specifically, if a large order is submitted as a single execution, the market would typically move in the adverse direction, worsening the average execution price. This phenomenon is commonly referred to as the ‘market impact’. In order to minimize the market impact, the trader has an incentive to divide the large order into smaller child orders and execute them gradually over time. However, this strategy inevitably prolongs the execution horizon, exposing the trader to a greater degree of price uncertainty. Optimal execution problems seek to obtain an optimal trading schedule while balancing a specific tradeoff such as this.
We will refer to the execution problem mentioned above as the parent-order problem, where an important issue is to divide a large parent order into smaller child orders to mitigate market impact. In this paper, we focus on the optimal execution of the child orders, that is, the problem of executing each of the child orders after the parent order has been divided. The child orders are quite different in nature from the parent order. They are typically much smaller in size, and their prescribed execution horizons are typically much shorter. In practice, a parent order is typically completed within hours or days, while child orders are typically completed within seconds or minutes. Because any further division of an order can be viewed as another parent-order problem, we only consider the child-order problem at the most atomic level. At this level, the child orders will not be divided further. In other words, each child order will be fulfilled in a single execution.
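As a rough illustration of the resulting formulation (the precise objective and state dynamics are developed in section 2 and may differ in detail), executing a single infinitesimal sell order over a horizon $T$ amounts to choosing a stopping time $\tau$ that maximizes the expected execution price,
\[
\sup_{\tau \in \mathcal{T}_{[0,T]}} \mathbb{E}\left[ S_{\tau} \right],
\]
where $S_t$ denotes the best available price at time $t$ and $\mathcal{T}_{[0,T]}$ is the set of stopping times taking values in $[0,T]$; for a buy order the supremum is replaced by an infimum.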
Numerical experiment: results
This section presents the results of the numerical experiments and discusses their interpretation.
Best performances
TD learning is applied to both the SL and RL methods, with various update steps m (see section 3.4.1). These algorithms, SL-TD(m-step) and RL-TD(m-step), are trained on the training data, tuned on the validation data, and their performances are reported on the testing data. The neural network architecture, learning rate, update step m, and other hyperparameters are tuned to maximize performance. The best performances of the SL and RL methods are reported in table 1. These figures are price gains per episode averaged over all 50 stocks, with the price gain reported as a percentage of the half-spread. The detailed performance for each stock can be found in appendix 5 (see table A6).
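To make the role of the update step m concrete, the following is a minimal sketch of how an m-step TD target could be computed for a learned continuation-value estimate in a stopping problem. The function names (payoff, cont_value) and the exact target definition are illustrative assumptions rather than the paper's implementation; the update rule actually used is defined in section 3.4.1.

def m_step_td_target(prices, t, m, T, payoff, cont_value):
    # Illustrative sketch (not the paper's code). Roll the current greedy
    # policy forward for up to m steps: execute as soon as the immediate
    # payoff is at least the estimated continuation value, or when the
    # horizon T forces execution.
    for k in range(t + 1, min(t + m, T) + 1):
        if k == T or payoff(prices, k) >= cont_value(prices, k):
            return payoff(prices, k)  # realized reward, no bootstrapping
    # No execution within m steps: bootstrap from the value estimate m steps ahead.
    k = t + m
    return max(payoff(prices, k), cont_value(prices, k))

With m = 1 this reduces to a one-step TD target, while letting m span the remaining horizon yields a Monte Carlo style target; varying m between these extremes underlies the convergence-rate, data-efficiency, and bias-variance tradeoffs noted in the abstract.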
Given sufficient data and time, the RL method outperforms the SL method under both the stock-specific regime and the universal regime. Moreover, the models trained under the universal regime generally outperform those trained under the stock-specific regime.