An early stopping strategy is followed on 25% of the training sets to avoid overfitting. The architecture of the target DQN is identical to that of the prediction DQN, the parameters of the former being copied from the latter every 8 hours. At the start of every 5-second time step, the latest state (as defined in Section 4.1.4) is fed as input to the prediction DQN. The sought-after Q-values (those corresponding to past experiences of taking actions from this state) are then computed for each of the 20 available actions, using both the prediction DQN and the target DQN (Eq ). Balancing exploration and exploitation advantageously is a central challenge in RL. γd is a discount factor (γd ∈ [0, 1]) by which future expected rewards are given less weight in the current Q-value than the latest observed reward.
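The way a prediction network and a target network are combined can be sketched with the standard double-DQN target: the prediction network picks the best next action and the target network evaluates it. This is a minimal illustration, not the authors' code; the function name `double_dqn_targets`, the `done` flag and the default value of `gamma_d` are assumptions made for the example.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_pred, next_q_target, gamma_d=0.99, done=None):
    """Standard double-DQN regression targets for a batch of transitions.

    rewards:       shape (batch,), latest observed rewards
    next_q_pred:   shape (batch, n_actions), prediction DQN output on next states
    next_q_target: shape (batch, n_actions), target DQN output on next states
    gamma_d:       discount factor in [0, 1] (0.99 is an illustrative value)
    done:          optional boolean array marking terminal transitions
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_pred = np.asarray(next_q_pred, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)
    if done is None:
        done = np.zeros(rewards.shape, dtype=bool)
    done = np.asarray(done, dtype=bool)

    # The prediction network selects the greedy action in the next state.
    best_actions = np.argmax(next_q_pred, axis=1)
    # The target network supplies the value of that selected action.
    evaluated = next_q_target[np.arange(len(rewards)), best_actions]
    # Future value is discounted and dropped entirely on terminal transitions.
    return rewards + gamma_d * evaluated * (~done)
```

Decoupling action selection from action evaluation in this way is what distinguishes a double DQN from a plain DQN, where the target network would do both.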

Table 8 provides further insight combining the results for Max DD and P&L-to-MAP. From the negative values in the Max DD columns, we see that Alpha-AS-1 had a larger Max DD (i.e., performed worse) than Gen-AS on 16 of the 30 test days. However, on 13 of those days Alpha-AS-1 achieved a better P&L-to-MAP score than Gen-AS, substantially so in many instances. Only on one day was the trend reversed, with Gen-AS performing slightly worse than Alpha-AS-1 on Max DD, but then performing better than Alpha-AS-1 on P&L-to-MAP. This is obtained from the algorithm’s P&L, discounting the losses from speculative positions. The Asymmetric dampened P&L penalizes speculative positions, as speculative profits are not added while losses are discounted.
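The asymmetric treatment of speculative gains and losses can be sketched as follows. The formula used here, the P&L change minus the positive part of the inventory P&L, is one form found in the RL market-making literature and is assumed for illustration; the function name and the damping factor `eta` are hypothetical and not taken from the text above.

```python
def asymmetric_dampened_pnl(pnl_change, inventory, mid_price_change, eta=1.0):
    """Reward sketch: total P&L change with speculative profits dampened.

    pnl_change:       change in total P&L over the step (spread capture + inventory P&L)
    inventory:        signed position held during the step
    mid_price_change: change in the mid-price over the step
    eta:              damping factor (assumed); 1.0 removes speculative profits fully
    """
    # P&L that comes purely from holding inventory while the price moves.
    speculative_pnl = inventory * mid_price_change
    # Only the positive part is subtracted, so speculative losses stay in the reward.
    return pnl_change - max(0.0, eta * speculative_pnl)


# Long 2 units, mid-price rises by 1.5: the 3.0 speculative gain is removed.
gain_case = asymmetric_dampened_pnl(pnl_change=3.5, inventory=2, mid_price_change=1.5)
# Long 2 units, mid-price falls by 1.5: the 3.0 speculative loss is kept.
loss_case = asymmetric_dampened_pnl(pnl_change=-2.5, inventory=2, mid_price_change=-1.5)
```

With `eta=1.0`, the agent is only ever rewarded for spread capture, while losses from price moves against its inventory still count in full.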

What market making model is the bot using? Something basic like Avellaneda Stoikov or more elaborate?

— de La Axe 🦇🔊💎✋ (@delaaxe) February 2, 2022

The usual approach in algorithmic trading research is to use machine learning algorithms to determine the buy and sell orders directly. These orders are the output actions of each execution cycle. In contrast, we propose maintaining the Avellaneda-Stoikov procedure as the basis upon which to determine the orders to be placed. We use a reinforcement learning algorithm, a double DQN, to adjust, at each trading step, the values of the parameters that are modelled as constants in the AS procedure. The actions performed by our RL agent are the setting of the AS parameter values for the next execution cycle.

## IEEE Transactions on Knowledge and Data Engineering

Continuous-time stochastic control and optimization with financial applications. Optimal high-frequency trading with limit and market orders. And for the stock price dynamics which are provided in each model definition.

In the literature, reinforcement learning approaches to market making typically employ models that act directly on the agent’s order prices, without taking advantage of knowledge we may have of market behaviour or indeed findings in market-making theory. These models, therefore, must learn everything about the problem at hand, and the learning curve is steeper and slower to surmount than if relevant available knowledge were to be leveraged to guide them. As a further extension of the optimal market-making problem with stock price impact, we consider the case in which the stochastic volatility of the asset is affected by the arrival of market orders, and examine its effect on the optimal trading prices.

Guilbaud and Pham also used a model inspired by the Avellaneda-Stoikov framework, but including market orders and limit orders at best bid and ask together with stochastic spreads. This paper introduces mbt-gym, a Python module that provides a suite of gym environments for training reinforcement learning agents to solve such model-based trading problems. The Avellaneda-Stoikov model seems to be way too simplistic to be practical in a lot of products. For example, in products with larger tick size, the queue priority will be significantly more important than distance from price in terms of fill probability.

## High frequency trading and the new market makers

Since this is a market-making strategy, some configurations will be similar to the pure market-making strategy, so we will cover what is different in this article. Reading the paper, you won’t find any direct indication of calculating these two parameters’ values. The Avellaneda & Stoikov model was created to be used on traditional financial markets, where trading sessions have a start and an end. The inventory position is flipped, and now the bid offers are being created closer to the market mid-price. It’s easy to see how the calculated reservation price is different from the market mid-price. The basic strategy for market making is to create symmetrical bid and ask orders around the market mid-price.
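To make the difference between the mid-price and the reservation price concrete, here is a minimal sketch of the closed-form quotes from the Avellaneda & Stoikov paper: the reservation price is r = s − qγσ²(T − t) and the total spread is γσ²(T − t) + (2/γ)·ln(1 + γ/κ). The function names and the parameter values below are illustrative assumptions, not settings from any particular bot.

```python
import math

def reservation_price(mid, inventory, gamma, sigma, time_left):
    """Mid-price shifted against the current inventory (r = s - q*gamma*sigma^2*(T-t))."""
    return mid - inventory * gamma * sigma**2 * time_left

def optimal_spread(gamma, sigma, time_left, kappa):
    """Total bid-ask spread; kappa is the order book liquidity parameter."""
    return gamma * sigma**2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / kappa)

def quotes(mid, inventory, gamma, sigma, time_left, kappa):
    """Bid and ask placed symmetrically around the reservation price."""
    r = reservation_price(mid, inventory, gamma, sigma, time_left)
    half = optimal_spread(gamma, sigma, time_left, kappa) / 2.0
    return r - half, r + half

# Illustrative values only.
flat_bid, flat_ask = quotes(mid=100.0, inventory=0, gamma=0.1, sigma=2.0, time_left=1.0, kappa=1.5)
long_bid, long_ask = quotes(mid=100.0, inventory=10, gamma=0.1, sigma=2.0, time_left=1.0, kappa=1.5)
```

With zero inventory the quotes are symmetric around the mid-price. With a large long position the whole quote pair shifts down, and in this example even the ask lands below the mid-price, which is how the model pushes the trader to sell inventory.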

For market making, the Avellaneda & Stoikov model for limit orders depends on γ (how much inventory you’re willing to hold)

I could either run simulations like they do in their paper or just tweak it continuously in production

Decisions🤔

— Lionel Lightcycle (@0xLightcycle) October 25, 2021

On the other hand, she does not face liquidation risk at negative inventory levels, but wants to receive a higher amount for selling the assets. It is observed that the thickness of the market prices is inversely correlated with the trading intensity. A larger trading intensity decreases the market impact of execution, which reduces price movements and thus leads to a lower price, as presented in Fig. For the case of a quadratic utility function, we derive the optimal spreads for limit orders and observe their behaviors.

However, these existing algorithms are often limited in solving high-dimensionality and rank minimization relaxation. In this paper, a robust kernel factorization embedding graph regularization method is developed to statically impute missing measurements. Specifically, the implicit high-dimensional feature space of ill-conditioned data is factorized by kernel sparse dictionary. Then, a robust sparse-norm and graph regularization constraints are performed in the objective function to ensure the consistency of the spatial information. For the optimization of the parameters involved in the model, a distributed adaptive proximal Newton gradient descent learning strategy is proposed to accelerate the convergence. Furthermore, considering the dynamic time-series and potentially non-stationary structure of industrial data, we propose extended incremental versions to alleviate the complexity of the overall model computation.

The half-second required by the system is put to good use in practice. For a single avellaneda-stoikov model, the computation time required for the main procedures is recorded in Table 8. In addition to the algorithmic calculations, we reserve time for some mechanical order-related activities, such as order submission and execution in exchanges. The Chinese A-share market can satisfy this tick-time condition with its update frequency of 3 s. Our empirical study shows that our deep LOB trading system is effective in the context of the Chinese market, which will encourage its use by other traders.

## Sharpe ratio

Section 3 provides an overview of reinforcement learning and its uses in algorithmic trading. Section 5 describes the experimental setup for backtests that were performed on our RL models, the Gen-AS model and two simple baselines. The results obtained from these tests are discussed in Section 6. The concluding Section 7 summarises the approach and findings, and outlines ideas for model improvement. What is common to all the above approaches is their reliance on learning agents to place buy and sell orders directly.

With the risk aversion parameter, you tell the bot how much inventory risk you want to take. A value close to 1 will indicate that you don’t want to take too much inventory risk, and hummingbot will “push” the reservation price more to reach the inventory target. In Figure 5, we see the use of the strategy in a bearish period. The first order is executed rapidly and, since the market price goes down, the trader’s last orders are only executed at the end of the period, when the order prices are lowered substantially as it becomes urgent to sell. Practically, this obviously raises the question of linking a trend detector to these optimal liquidation algorithms. We clearly see that the optimal quotes depend on inventory in a monotonic way.

AlphaGo learned by playing against itself many times, registering the moves that were more likely to lead to victory in any given situation, thus gradually improving its overall strategies. The same concept has been applied to train a machine to play Atari video games competently, feeding a convolutional neural network with the pixel values of successive screen stills from the games. One way to improve the performance of an AS model is by tweaking the values of its constants to fit more closely the trading environment in which it is operating. In Section 4.2, we describe our approach of using genetic algorithms to optimize the values of the AS model constants using trading data from the market we will operate in. Alternatively, we can resort to machine learning algorithms to adjust the AS model constants and/or its output ask and bid prices dynamically, as patterns found in market-related data evolve.

However, existing methods fail to achieve both goals simultaneously. To fill this gap, this paper presents an interpretable intuitionistic fuzzy inference model, dubbed IIFI. While retaining the prediction accuracy, the interpretable module in IIFI can automatically calculate the feature contribution based on the intuitionistic fuzzy set, which provides high interpretability of the model. Also, most of the existing training algorithms, such as LightGBM, XGBoost, DNN and Stacking, can be embedded in the inference module of our proposed model and achieve better prediction results. The back-test experiment on China’s A-share market shows that IIFI achieves superior performance: the stock profitability can be increased by more than 20% over the baseline methods.

- Two variants of the deep RL model (Alpha-AS-1 and Alpha-AS-2) were backtested on real data (L2 tick data from 30 days of bitcoin–dollar pair trading) alongside the Gen-AS model and two other baselines.
- Consequently, we support our findings by comparing the models proposed within this research with the stock price impact models existing in literature.
- Whether to enable adding transaction costs to order price calculation.
- This parameter is a value that must be defined by the market maker, considering how much inventory risk he is willing to be exposed to.
- Figures in bold are the best values among the five models for the corresponding test days.

Additionally, sensitivity to volatility changes will be included with a particular parameter, vol_to_spread_multiplier, to modify spreads in high-volatility scenarios. It is also interesting because a positive trend goes against the naturally convex shape of the trading curve. Finally, the asymptotic quote decreases as the risk aversion increases. An increase in risk aversion indeed forces the trader to reduce both price risk and non-execution risk, and this leads to posting orders with lower prices. Random forest is an efficient and accurate classification model, which makes decisions by aggregating a set of trees, either by voting or by averaging class posterior probability estimates. However, tree outputs may be unreliable in the presence of scarce data.

In humble homage to Google’s AlphaGo programme, we will refer to our double DQN algorithm as Alpha-Avellaneda-Stoikov (Alpha-AS). One of the most active areas of research in algorithmic trading is, broadly, the application of machine learning algorithms to derive trading decisions based on underlying trends in the volatile and hard to predict activity of securities markets. Machine learning approaches have been explored to obtain dynamic limit order placement strategies that attempt to adapt in real time to changing market conditions. As regards market making, the AS algorithm, or versions of it, have been used as benchmarks against which to measure the improved performance of the machine learning algorithms proposed, either working with simulated data or in backtests with real data. The literature on machine learning approaches to market making is extensive.

This helps to keep the models simple and shorten the training time of the neural network in order to test the idea of combining the Avellaneda-Stoikov procedure with reinforcement learning. The results obtained in this fashion encourage us to explore refinements such as models with continuous action spaces. The logic of the Alpha-AS model might also be adapted to exploit alpha signals. The latter is an important feature for market maker algorithms.

Regarding the latter, our results lead to new and easily interpretable closed-form approximations for the optimal quotes, both in the finite-horizon case and in the asymptotic regime. These successes with games have attracted attention from other areas, including finance and algorithmic trading. The large amount of data available in these fields makes it possible to run reliable environment simulations with which to train DRL algorithms. DRL is widely used in the algorithmic trading world, primarily to determine the best action to take in trading by candles, by predicting what the market is going to do. For instance, Lee and Jangmin used Q-learning with two pairs of agents cooperating to predict market trends (through two “signal” agents, one on the buy side and one on the sell side) and determine a trading strategy (through a buy “order” agent and a sell “order” agent).

That is introduced with a quadratic utility function and solved by providing a closed-form solution. Using the exponential utility function, results are provided for the following models. To make the models easier to recall, we call the model studied in Case 1 in Sect. 3 with stock price dynamics “Model 1” and the model with the dynamics “Model 2”. It is worth mentioning that the trader changes her qualitative behavior depending on the liquidation and penalizing constants and on her inventory position as time approaches maturity. When the trader expects the price to move up, she sends her orders at higher prices to profit from the price increase, which matches our expectation.

(γd is usually denoted simply as γ, but in this paper we reserve the latter to denote the risk aversion parameter of the AS procedure). Typically, in the beginning the agent does not know the transition and reward functions. It must explore actions in different states and record how the environment responds in each case. Through repeated exploration the agent gradually learns the relationships between states, actions and rewards. It can then start exploiting this knowledge to apply an action selection policy that takes it closer to achieving its reward maximization goal. However, I do not see any specification of bounds for this reservation price and therefore I think there is no guarantee that ask prices computed by the market-maker will be higher or bid prices will be lower than the current price of the process.
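The explore-then-exploit behaviour described above is often implemented with an ε-greedy policy whose exploration rate decays over time. The text does not say which action selection policy is used, so the sketch below is only a common choice shown for illustration; the function names, the linear decay schedule and its default values are assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        # Explore: try any action to see how the environment responds.
        return rng.randrange(len(q_values))
    # Exploit: use what has been learned so far.
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(step, start=1.0, end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from `start` to `end` over `decay_steps` steps."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

# Early in training the agent mostly explores; later it mostly exploits.
early_action = epsilon_greedy([0.1, 0.9, 0.3], decayed_epsilon(step=0))
late_action = epsilon_greedy([0.1, 0.9, 0.3], decayed_epsilon(step=50_000))
```

A high initial ε lets the agent gather experience about transitions and rewards it does not yet know, and the decay shifts it towards the learned policy as its Q-value estimates become more reliable.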