Flow Toxicity Metrics

Market microstructure measures for detecting informed trading and quantifying price impact.

Functions

compute_vpin

compute_vpin(
    trades: DataFrame,
    bucket_volume: float,
    n_buckets: int = 50,
) -> pd.DataFrame

Compute the Volume-Synchronized Probability of Informed Trading (VPIN).

Partitions cumulative trade volume into equal-sized buckets and measures the normalised buy/sell imbalance within each bucket. The trailing mean of the per-bucket vpin values over the last n_buckets buckets (the vpin_avg column) is the headline VPIN metric.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| trades | DataFrame | Trades with at least timestamp, price, volume, and direction columns. direction must contain "buy" or "sell" values. | required |
| bucket_volume | float | Total volume per bucket. This is highly instrument-specific; a reasonable starting point is average daily volume / 50. | required |
| n_buckets | int | Window length (in buckets) for the trailing VPIN average. Default is 50, following the original paper. | 50 |

Returns:

| Type | Description |
|---|---|
| DataFrame | One row per completed bucket, with the columns listed below. |

  • bucket — zero-based bucket index
  • timestamp_start — first trade timestamp in the bucket
  • timestamp_end — last trade timestamp in the bucket
  • buy_volume — total buy volume in the bucket
  • sell_volume — total sell volume in the bucket
  • vpin — |buy_volume − sell_volume| / bucket_volume
  • vpin_avg — trailing mean of vpin over n_buckets

Raises:

| Type | Description |
|---|---|
| ConfigError | If required columns are missing. |
| ObAnalyticsError | If trades is empty. |
| ValueError | If bucket_volume is not positive. |
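
The bucketing and imbalance calculation described for compute_vpin can be sketched in plain pandas. This is a minimal illustration, not the library's implementation: vpin_sketch is a hypothetical name, and two behaviours are assumptions made for the sketch, namely that a trade overflowing a bucket is split across buckets and that early rows of vpin_avg average over however many buckets exist so far.

```python
import pandas as pd


def vpin_sketch(trades: pd.DataFrame, bucket_volume: float, n_buckets: int = 50) -> pd.DataFrame:
    """Illustrative VPIN calculation (hypothetical helper, not the library function)."""
    if bucket_volume <= 0:
        raise ValueError("bucket_volume must be positive")

    rows = []
    buy = sell = 0.0
    start = None  # timestamp of the first trade contributing to the open bucket
    ordered = trades.sort_values("timestamp")
    for ts, vol, side in zip(ordered["timestamp"], ordered["volume"], ordered["direction"]):
        remaining = float(vol)
        while remaining > 0:
            if start is None:
                start = ts
            # Assumption: a trade that overflows the bucket is split across buckets.
            fill = min(remaining, bucket_volume - (buy + sell))
            if side == "buy":
                buy += fill
            else:
                sell += fill
            remaining -= fill
            if buy + sell >= bucket_volume - 1e-9:  # bucket is complete
                rows.append((len(rows), start, ts, buy, sell))
                buy, sell, start = 0.0, 0.0, None

    # Volume left in an unfinished bucket is discarded: only completed buckets are reported.
    out = pd.DataFrame(
        rows,
        columns=["bucket", "timestamp_start", "timestamp_end", "buy_volume", "sell_volume"],
    )
    out["vpin"] = (out["buy_volume"] - out["sell_volume"]).abs() / bucket_volume
    # Assumption: the trailing mean uses as many buckets as are available so far.
    out["vpin_avg"] = out["vpin"].rolling(n_buckets, min_periods=1).mean()
    return out


# Made-up trades purely for illustration.
demo = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-02 09:30", periods=5, freq="1min"),
    "price": [100.0, 100.1, 100.2, 100.3, 100.2],
    "volume": [60, 40, 150, 50, 30],
    "direction": ["buy", "sell", "buy", "sell", "sell"],
})
result = vpin_sketch(demo, bucket_volume=100, n_buckets=2)
```

With these inputs the 150-unit buy fills one whole bucket and half of the next, and the final 30-unit sell stays in an unfinished bucket, so result has three rows.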

compute_kyle_lambda

compute_kyle_lambda(
    trades: DataFrame, window: str = "5min"
) -> KyleLambdaResult

Estimate Kyle's Lambda via OLS regression.

For each time window, computes:

  • ΔPrice = last trade price − first trade price
  • signed_volume = Σ(buy volume) − Σ(sell volume)

Then regresses ΔPrice on signed_volume across all windows. The slope (λ) measures how much the price moves per unit of net order flow — a proxy for market illiquidity and adverse selection.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| trades | DataFrame | Trades with timestamp, price, volume, direction. | required |
| window | str | Pandas frequency string for grouping trades. Default "5min". | '5min' |

Returns:

| Type | Description |
|---|---|
| KyleLambdaResult | Frozen dataclass with lambda_, t_stat, r_squared, n_windows, and regression_df. |

Raises:

| Type | Description |
|---|---|
| ConfigError | If required columns are missing. |
| ObAnalyticsError | If trades is empty. |
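
The per-window regression described for compute_kyle_lambda can be sketched with pandas and NumPy. This is an illustration under stated assumptions rather than the library's code: kyle_lambda_sketch is a hypothetical name, the regression is assumed to include an intercept, windows without trades are assumed to be dropped, and the result is returned as a plain dict instead of a KyleLambdaResult. At least three non-empty windows are needed for the standard error to be defined.

```python
import numpy as np
import pandas as pd


def kyle_lambda_sketch(trades: pd.DataFrame, window: str = "5min") -> dict:
    """Illustrative Kyle's lambda estimate (hypothetical helper, not the library function)."""
    t = trades.sort_values("timestamp").set_index("timestamp")
    sign = t["direction"].map({"buy": 1.0, "sell": -1.0})
    t = t.assign(signed=t["volume"] * sign)

    grouped = t.resample(window)
    reg = pd.DataFrame({
        "delta_price": grouped["price"].last() - grouped["price"].first(),
        "signed_volume": grouped["signed"].sum(),
    }).dropna()  # assumption: windows without trades are excluded

    x = reg["signed_volume"].to_numpy(dtype=float)
    y = reg["delta_price"].to_numpy(dtype=float)
    n = len(reg)

    # Simple OLS with an intercept, written out via centred sums.
    x_c, y_c = x - x.mean(), y - y.mean()
    sxx = (x_c ** 2).sum()
    lam = (x_c * y_c).sum() / sxx
    ssr = ((y_c - lam * x_c) ** 2).sum()      # residual sum of squares
    se = np.sqrt(ssr / (n - 2) / sxx)         # standard error of the slope
    return {
        "lambda_": lam,
        "t_stat": lam / se,
        "r_squared": 1.0 - ssr / (y_c ** 2).sum(),
        "n_windows": n,
        "regression_df": reg.reset_index(),
    }


# Made-up trades: four 5-minute windows, each with one buy followed by one sell.
base = pd.Timestamp("2024-01-02 09:30")
rows = []
for i, (buy_vol, sell_vol, dp) in enumerate(
    [(150, 50, 0.21), (50, 100, -0.09), (250, 50, 0.40), (50, 200, -0.31)]
):
    start = base + pd.Timedelta(minutes=5 * i)
    rows.append((start, 100.0, buy_vol, "buy"))
    rows.append((start + pd.Timedelta(minutes=1), 100.0 + dp, sell_vol, "sell"))
demo = pd.DataFrame(rows, columns=["timestamp", "price", "volume", "direction"])

fit = kyle_lambda_sketch(demo)
```

In the demo data the price change per window was chosen to be roughly 0.002 per unit of net flow, so the fitted slope lands close to that value.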

order_flow_imbalance

order_flow_imbalance(
    trades: DataFrame, window: str = "1min"
) -> pd.DataFrame

Compute normalised order flow imbalance per time window.

For each window:

  • ofi = (buy_volume − sell_volume) / (buy_volume + sell_volume)

Values range from −1 (all sells) to +1 (all buys). Zero indicates balanced flow.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| trades | DataFrame | Trades with timestamp, volume, direction. | required |
| window | str | Pandas frequency string. Default "1min". | '1min' |

Returns:

| Type | Description |
|---|---|
| DataFrame | Columns: timestamp, buy_volume, sell_volume, net_volume, ofi. |

Raises:

| Type | Description |
|---|---|
| ConfigError | If required columns are missing. |
| ObAnalyticsError | If trades is empty. |
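
The ofi formula given for order_flow_imbalance can be reproduced with a short pandas sketch. The helper name ofi_sketch is hypothetical, and two details are assumptions rather than documented behaviour: net_volume is taken to be buy_volume minus sell_volume, and a window with no volume yields NaN instead of a division error.

```python
import numpy as np
import pandas as pd


def ofi_sketch(trades: pd.DataFrame, window: str = "1min") -> pd.DataFrame:
    """Illustrative order flow imbalance (hypothetical helper, not the library function)."""
    t = trades.set_index("timestamp").sort_index()
    # Zero out the other side so each series sums to one side's volume per window.
    buy = t["volume"].where(t["direction"] == "buy", 0.0).resample(window).sum()
    sell = t["volume"].where(t["direction"] == "sell", 0.0).resample(window).sum()

    out = pd.DataFrame({"buy_volume": buy, "sell_volume": sell})
    out["net_volume"] = out["buy_volume"] - out["sell_volume"]  # assumed definition
    total = out["buy_volume"] + out["sell_volume"]
    # Assumption: windows with zero traded volume get NaN.
    out["ofi"] = out["net_volume"] / total.replace(0.0, np.nan)
    return out.rename_axis("timestamp").reset_index()


# Made-up trades purely for illustration.
demo = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-02 09:30:10",
        "2024-01-02 09:30:40",
        "2024-01-02 09:31:05",
    ]),
    "volume": [30, 10, 20],
    "direction": ["buy", "sell", "sell"],
})
result = ofi_sketch(demo)
```

The first window holds 30 bought against 10 sold, giving an ofi of 0.5; the second holds only sells, giving −1.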

Models

KyleLambdaResult dataclass

KyleLambdaResult(
    lambda_: float,
    t_stat: float,
    r_squared: float,
    n_windows: int,
    regression_df: DataFrame = pd.DataFrame(),
)

Result of a Kyle's λ OLS regression.

Attributes:

| Name | Type | Description |
|---|---|---|
| lambda_ | float | Slope: price change per unit signed order flow (higher = less liquid). |
| t_stat | float | t-statistic for lambda_. |
| r_squared | float | Fraction of ΔPrice variance explained by signed order flow. |
| n_windows | int | Number of time windows in the regression. |
| regression_df | DataFrame | Per-window timestamp/delta_price/signed_volume data. |
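
To show how a result of this shape might be consumed, the sketch below defines a stand-in frozen dataclass with the documented fields. KyleLambdaResultSketch is a hypothetical class, not the library's, the field values are made up, and the 1.96 threshold is a common rule of thumb rather than anything the library applies.

```python
from dataclasses import FrozenInstanceError, dataclass, field

import pandas as pd


@dataclass(frozen=True)
class KyleLambdaResultSketch:
    """Stand-in mirroring the documented fields; not the library class."""

    lambda_: float
    t_stat: float
    r_squared: float
    n_windows: int
    # A factory gives each instance its own empty DataFrame by default.
    regression_df: pd.DataFrame = field(default_factory=pd.DataFrame)


# Made-up values purely for illustration.
res = KyleLambdaResultSketch(lambda_=0.002, t_stat=3.1, r_squared=0.42, n_windows=78)

# Rule of thumb (an assumption): |t| > 1.96 is roughly the 5% level with many windows.
is_significant = abs(res.t_stat) > 1.96

try:
    res.lambda_ = 0.0  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the dataclass is frozen, a result can be passed around without risk of its fields being reassigned; deriving a modified copy would go through dataclasses.replace.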