December 7, 2022

Market Depth HFT Trading Strategy with Machine Learning


In this post I describe a successful trading strategy that uses market depth (AKA Level 2) data for high-frequency trading (HFT). The data used is the current state of the order book: the best bid and ask prices and their sizes.

The strategy relies heavily on the current state of the bid and ask for a specific asset. Machine learning detects entry and exit triggers, and the strategy trades long and short in 1-second intervals.


Strategy logic

First, the data is aggregated into 1-second intervals. Because the bid/ask values change many times within a 1-second period, the bid and ask prices are averaged and the sizes are aggregated to give a single OFI (Order Flow Imbalance, defined below) value per 1-second interval. The prediction is then made for the next 1-second interval.
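The aggregation step can be sketched in plain Python. This is a minimal illustration, not the original implementation: the tick layout `(timestamp_seconds, bid, ask, bid_size, ask_size)`, the function name `aggregate_by_second`, and the choice to sum the sizes are all assumptions, and the sample ticks are toy values.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_second(ticks):
    """Group ticks into 1-second bars.

    Each tick is (timestamp_seconds, bid, ask, bid_size, ask_size).
    Prices are averaged; sizes are summed (an assumed reading of "aggregated").
    """
    buckets = defaultdict(list)
    for ts, bid, ask, bid_size, ask_size in ticks:
        # int() truncates, which is a floor for non-negative timestamps.
        buckets[int(ts)].append((bid, ask, bid_size, ask_size))

    bars = []
    for second in sorted(buckets):
        rows = buckets[second]
        bars.append({
            "second": second,
            "bid": mean(r[0] for r in rows),
            "ask": mean(r[1] for r in rows),
            "bid_size": sum(r[2] for r in rows),
            "ask_size": sum(r[3] for r in rows),
        })
    return bars


# Toy ticks: two fall in second 0, one in second 1.
ticks = [
    (0.10, 100.0, 100.5, 5, 7),
    (0.60, 101.0, 101.5, 3, 1),
    (1.20, 102.0, 102.5, 4, 2),
]
bars = aggregate_by_second(ticks)
```

Each bar then holds one averaged bid, one averaged ask, and the summed sizes for its second.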

The strategy starts by receiving the best bid, best ask, bid size, and ask size for the traded asset. It then calculates the Order Flow Imbalance (OFI) to detect changes that help predict the next move.

The Order Flow Imbalance is calculated as follows:

(BPn > BPn-1) * BS – (BPn < BPn-1) * BS – (APn < APn-1) * AS + (APn > APn-1) * AS


BPn – Bid Price at time stamp n

BPn-1 – Bid Price at time stamp n-1

APn – Ask Price at time stamp n

APn-1 – Ask Price at time stamp n-1

AS – Ask Size

BS – Bid Size

Each comparison evaluates to 1 when it is true and 0 when it is false. Under this formula for OFI (Order Flow Imbalance):

If the current bid and ask prices are both higher than at the previous tick, the result is a large positive number: the sum of the bid and ask sizes (BS + AS).

If the current bid and ask prices are both lower than at the previous tick, the result is a large negative number: the negative of that sum, -(BS + AS).

If the current bid price is higher and the current ask price is lower, the result is the difference between the bid size and the ask size (BS - AS).

If the current bid price is lower and the current ask price is higher, the result is the difference between the ask size and the bid size (AS - BS).

In short: an increase in both the bid and ask prices translates to a large positive OFI, and a decrease in both produces a large negative OFI. In between, the OFI reflects the net pressure: a bid price that moves higher contributes positive demand, offset by the ask size when the ask price moves lower.
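The four cases above can be checked with a small sketch of the formula. The function name `ofi` and the sample prices and sizes are illustrative assumptions; the arithmetic follows the formula as written in this post, with strict comparisons.

```python
def ofi(bid, prev_bid, ask, prev_ask, bid_size, ask_size):
    """Order Flow Imbalance for one step, following the formula in this post."""
    # Python booleans behave as 0/1 in arithmetic, mirroring the comparison terms.
    return (
        (bid > prev_bid) * bid_size
        - (bid < prev_bid) * bid_size
        - (ask < prev_ask) * ask_size
        + (ask > prev_ask) * ask_size
    )


BS, AS = 10, 4  # toy bid and ask sizes

both_up = ofi(101, 100, 102, 101, BS, AS)          # BS + AS
both_down = ofi(99, 100, 100, 101, BS, AS)         # -(BS + AS)
bid_up_ask_down = ofi(101, 100, 100, 101, BS, AS)  # BS - AS
bid_down_ask_up = ofi(99, 100, 102, 101, BS, AS)   # AS - BS
```

With these toy sizes the four cases evaluate to 14, -14, 6, and -6 respectively. If neither price moves, every term is zero and so is the OFI.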

We calculate the OFI value on every tick and collect the data for a machine-learning cycle.


Machine Learning Phase

The collected data is very noisy and should be normalized to make prediction easier for the model. We convert each OFI value from its raw value to its distance from a moving average (in this case the 5-minute MA), measured in standard deviations. This way the values are expressed in std. dev. units rather than raw order book values.
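One way to sketch this normalization is a trailing z-score. The details here are assumptions rather than the original implementation: a window of 300 samples (5 minutes of 1-second values), a window that includes the current value, the population standard deviation, and the hypothetical name `rolling_zscore`.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_zscore(values, window=300):
    """Express each value as its distance from a trailing mean, in std-dev units.

    window=300 assumes 1-second values and a 5-minute moving average.
    Returns None until the window is full, and 0.0 when the window has no spread.
    """
    recent = deque(maxlen=window)  # old values drop out automatically
    out = []
    for v in values:
        recent.append(v)
        if len(recent) < window:
            out.append(None)
            continue
        sd = pstdev(recent)
        out.append((v - mean(recent)) / sd if sd > 0 else 0.0)
    return out


# Toy series with a short window so the effect is easy to follow.
z = rolling_zscore([1, 2, 3, 4, 5], window=3)
```

The first `window - 1` outputs are `None` because there is not yet enough history to compute the moving average.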

The single feature for the prediction model is the normalized OFI value. It is fed into a linear regression model that is trained with the OFI value as the input and the next price as the label.
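With a single feature, the regression has a closed-form least-squares solution, which the sketch below uses instead of an ML library. The helper names `make_training_pairs`, `fit_line`, and `predict` are hypothetical, and the series are toy values chosen so the fit is exact.

```python
from statistics import mean


def make_training_pairs(feature, target):
    """Pair the feature at second t with the target at second t + 1."""
    return feature[:-1], target[1:]


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(slope, intercept, value):
    """Apply the fitted line to a new feature value."""
    return slope * value + intercept


# Toy data: normalized OFI per second and the price per second.
norm_ofi = [-1.0, 0.0, 1.0, 2.0, 0.5]
price = [100.0, 99.0, 100.0, 101.0, 102.0]

x, y = make_training_pairs(norm_ofi, price)
slope, intercept = fit_line(x, y)
next_price_estimate = predict(slope, intercept, norm_ofi[-1])
```

The last feature value has no label yet, so it is left out of training and used only to produce the estimate for the following second.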



After feeding the model the OFI values, we use the prediction vector as the trigger to go long or short on the tested asset: a positive predicted value is a trigger to go long, and a negative one is a trigger to go short, in the next second.
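Turning the prediction vector into triggers can be sketched as a sign rule. The function name `to_signal` and the choice to stay flat on an exact zero are assumptions, and the predictions are toy values.

```python
def to_signal(prediction):
    """Map a predicted value to a position for the next second.

    +1 means long, -1 means short; 0 (flat) on an exact zero is an assumption.
    """
    if prediction > 0:
        return 1
    if prediction < 0:
        return -1
    return 0


predictions = [0.8, -0.3, 0.0, 1.2]  # toy prediction vector
signals = [to_signal(p) for p in predictions]
```

Each signal applies to the second that follows the one in which the prediction was made.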

Here are the visual results:


The returns are positive and grow along with the price action.


These results support combining ML and HFT to build a profitable trading strategy: ML can be used to predict very near-term price movement from order flow imbalance values.

There is still some work to be done to turn this solution into a working strategy, but the framework is there.

Trade smart,