LOBSTER Format

Support for LOBSTER message files (event types 1–7), orderbook-backed depth, and round-trip writers.

Use via Pipeline(format=LobsterFormat(...)) or Pipeline.from_format("lobster", ...).

Key differences from Bitstamp:

  • Executions in the message file: LobsterTradeReader builds trades directly from type 4/5 rows in the events DataFrame (no separate trades file).
  • Orderbook-backed depth: LobsterFormat.compute_depth reads the official orderbook file for ground-truth depth instead of reconstructing it from events.
  • Integer prices: raw prices are in ten-thousandths of a dollar (price_divisor=10000).
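As a rough illustration of the integer-price convention, the sketch below parses two made-up message rows with pandas and scales the raw price by 10000. The column names are assumptions chosen for readability (LOBSTER message files have no header row), and this is not the loader's own code.

```python
import io

import pandas as pd

# Assumed column labels; LOBSTER message files ship without a header row.
MESSAGE_COLUMNS = ["time", "event_type", "order_id", "size", "price", "direction"]

# Two made-up rows: a new limit order (type 1) and a visible execution (type 4).
raw = io.StringIO(
    "34200.017459617,1,11885113,21,1186600,1\n"
    "34200.189607670,4,11885113,21,1186600,1\n"
)
messages = pd.read_csv(raw, header=None, names=MESSAGE_COLUMNS)

# Raw prices are integers in ten-thousandths of a dollar.
PRICE_DIVISOR = 10_000
messages["price_usd"] = messages["price"] / PRICE_DIVISOR
```

Here a raw price of 1186600 corresponds to 118.66 dollars.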

LobsterLoader

LobsterLoader(
    config: PipelineConfig | None = None,
    *,
    trading_date: str | Timestamp,
)

Load raw limit-order events from LOBSTER message files.

Satisfies the ob_analytics.protocols.EventLoader protocol.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | PipelineConfig | Pipeline configuration. | None |
| trading_date | str or Timestamp | The calendar date of the trading session (LOBSTER timestamps are seconds after midnight and need a date anchor). | required |
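The date anchor can be pictured as follows. This is a minimal sketch, assuming the seconds-after-midnight values are already in a pandas Series; `anchor_timestamps` is a hypothetical helper, not part of LobsterLoader.

```python
import pandas as pd


def anchor_timestamps(seconds_after_midnight, trading_date):
    """Turn seconds-after-midnight values into absolute timestamps."""
    # Midnight of the trading session is the anchor.
    midnight = pd.Timestamp(trading_date).normalize()
    # Fractional seconds are preserved by the timedelta conversion.
    return pd.to_timedelta(seconds_after_midnight, unit="s") + midnight


# 34200 s is 09:30:00 and 57600 s is 16:00:00.
seconds = pd.Series([34200.0, 34200.5, 57600.0])
stamps = anchor_timestamps(seconds, "2012-06-21")
```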

load

load(source: Any) -> pd.DataFrame

Load LOBSTER message data and return a cleaned events DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | str, Path, or directory | Path to a LOBSTER message CSV, or a directory containing message/orderbook file pairs. When a directory is given, the loader auto-discovers files by the LOBSTER naming convention. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | Cleaned events DataFrame. |
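Directory discovery might look roughly like the sketch below. It assumes the file naming used by LOBSTER's published sample data (TICKER_DATE_START_END_message_LEVELS.csv and the matching orderbook file); the loader's actual discovery logic is not shown here, and `discover_pairs` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern: TICKER_YYYY-MM-DD_START_END_{message|orderbook}_LEVELS.csv
LOBSTER_NAME = re.compile(
    r"^(?P<ticker>[^_]+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<start>\d+)_(?P<end>\d+)"
    r"_(?P<kind>message|orderbook)_(?P<levels>\d+)\.csv$"
)


def discover_pairs(directory):
    """Map (ticker, date, levels) to its message and orderbook paths."""
    pairs = {}
    for path in sorted(Path(directory).iterdir()):
        match = LOBSTER_NAME.match(path.name)
        if match is None:
            continue  # not a LOBSTER-style file name
        key = (match["ticker"], match["date"], int(match["levels"]))
        pairs.setdefault(key, {})[match["kind"]] = path
    # Keep only keys that have both halves of the pair.
    return {k: v for k, v in pairs.items() if {"message", "orderbook"} <= set(v)}


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "AMZN_2012-06-21_34200000_57600000_message_10.csv",
        "AMZN_2012-06-21_34200000_57600000_orderbook_10.csv",
        "MSFT_2012-06-21_34200000_57600000_message_10.csv",  # no orderbook partner
        "notes.txt",
    ):
        (Path(tmp) / name).touch()
    found = discover_pairs(tmp)
    keys = sorted(found)
    message_name = found[keys[0]]["message"].name
```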

LobsterTradeReader

LobsterTradeReader(config: PipelineConfig | None = None)

Build trades directly from LOBSTER execution events.

In LOBSTER, each execution event (type 4 or 5) represents the resting (maker) side of a trade. This reader builds trade records directly from those rows in the events frame; no matching is needed because the data already pairs maker rows with executions.
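The idea can be sketched as a simple filter on raw_event_type. The events frame below is made up, and the event_id column and the `executions_to_trades` helper are assumptions for illustration; the real reader also fills direction, taker_event_id, maker, and taker, which this sketch omits.

```python
import pandas as pd

# LOBSTER execution types: 4 = visible order executed, 5 = hidden order executed.
EXECUTION_TYPES = (4, 5)

# Made-up events; "event_id" is an assumed column name.
events = pd.DataFrame(
    {
        "event_id": [1, 2, 3, 4],
        "timestamp": pd.to_datetime(
            [
                "2012-06-21 09:30:00.017",
                "2012-06-21 09:30:00.189",
                "2012-06-21 09:30:01.000",
                "2012-06-21 09:30:02.500",
            ]
        ),
        "raw_event_type": [1, 4, 3, 5],
        "price": [118.66, 118.66, 118.70, 118.65],
        "volume": [21, 21, 10, 100],
    }
)


def executions_to_trades(events):
    """Keep execution rows only; each one is the maker side of a trade."""
    executed = events[events["raw_event_type"].isin(EXECUTION_TYPES)]
    return pd.DataFrame(
        {
            "timestamp": executed["timestamp"].to_numpy(),
            "price": executed["price"].to_numpy(),
            "volume": executed["volume"].to_numpy(),
            "maker_event_id": executed["event_id"].to_numpy(),
        }
    )


trades = executions_to_trades(events)
```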

Satisfies the ob_analytics.protocols.TradeSource protocol.

load

load(events: DataFrame, source: Any) -> pd.DataFrame

Build a trades DataFrame from LOBSTER execution events.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| events | DataFrame | Events with raw_event_type column populated. | required |
| source | Any | Unused; trade information is embedded in events. | required |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | Trades with timestamp, price, volume, direction, maker_event_id, taker_event_id, maker, taker. |

LobsterWriter

LobsterWriter(
    config: PipelineConfig | None = None,
    *,
    trading_date: str | Timestamp,
    price_divisor: int | None = None,
)

Write pipeline events back to LOBSTER dual-file format.

Satisfies the ob_analytics.protocols.DataWriter protocol.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| trading_date | str or Timestamp | Calendar date of the session. | required |
| price_divisor | int | Multiplier to convert decimal prices back to LOBSTER integers. | None |
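Converting decimal prices back to integers is where floating-point error can bite, so a writer would plausibly round rather than truncate. The helpers below are hypothetical and only illustrate the arithmetic, assuming a divisor of 10000.

```python
PRICE_DIVISOR = 10_000  # ten-thousandths of a dollar


def to_lobster_price(price, divisor=PRICE_DIVISOR):
    """Decimal price -> LOBSTER integer price.

    round() guards against products that land a hair below the intended
    integer, which plain int() truncation would push one tick low.
    """
    return int(round(price * divisor))


def from_lobster_price(raw, divisor=PRICE_DIVISOR):
    """LOBSTER integer price -> decimal price."""
    return raw / divisor


# Round trip over a few made-up prices.
samples = [118.66, 0.07, 1.1, 585.33]
round_tripped = [from_lobster_price(to_lobster_price(p)) for p in samples]
```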

write

write(
    data: dict[str, DataFrame],
    dest: str | Path,
    *,
    ticker: str = "DATA",
    num_levels: int = 10,
    **kwargs: Any,
) -> tuple[Path, Path]

Write events to LOBSTER message + orderbook files.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | dict | Must contain "events" key. | required |
| dest | str or Path | Output directory. | required |
| ticker | str | Ticker symbol for filename. | 'DATA' |
| num_levels | int | Number of orderbook levels to write. | 10 |

Returns:

| Type | Description |
| --- | --- |
| tuple of Path | (message_path, orderbook_path) |

LobsterFormat dataclass

LobsterFormat()

Format descriptor for LOBSTER limit-order-book data.

trading_date is taken from the per-run ob_analytics.protocols.RunContext, not the format constructor, so the same LobsterFormat() instance can be reused across runs with different sessions.

lobster_depth_from_orderbook

lobster_depth_from_orderbook(
    events: DataFrame,
    orderbook_path: Path,
    config: PipelineConfig,
) -> tuple[pd.DataFrame, pd.DataFrame]

Compute depth and depth summary from the LOBSTER orderbook file.

The LOBSTER orderbook file is ground truth: it records the complete visible book state after every message event. This function converts it into the (depth, depth_summary) pair the pipeline expects, avoiding the need to reconstruct depth from message events (which fails when events reference pre-market orders absent from the message file).
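To make the orderbook layout concrete, the sketch below reshapes a tiny made-up two-level orderbook file into one row per (message row, side, level). It assumes the usual LOBSTER column order of ask price, ask size, bid price, bid size repeated per level. The column names and the `to_long` helper are illustrative, and the result is not the exact (depth, depth_summary) schema the pipeline uses.

```python
import io

import pandas as pd


def orderbook_columns(num_levels):
    """Assumed labels for the header-less orderbook file."""
    cols = []
    for level in range(1, num_levels + 1):
        cols += [
            f"ask_price_{level}",
            f"ask_size_{level}",
            f"bid_price_{level}",
            f"bid_size_{level}",
        ]
    return cols


# Two made-up snapshots; row i is the book state after message row i.
raw = io.StringIO(
    "1186600,100,1186500,200,1186700,50,1186400,75\n"
    "1186600,79,1186500,200,1186700,50,1186400,75\n"
)
book = pd.read_csv(raw, header=None, names=orderbook_columns(2))


def to_long(book, num_levels, price_divisor=10_000):
    """Wide snapshot rows -> long (row, side, level, price, volume) rows."""
    frames = []
    for side in ("ask", "bid"):
        for level in range(1, num_levels + 1):
            frames.append(
                pd.DataFrame(
                    {
                        "row": book.index.to_numpy(),
                        "side": side,
                        "level": level,
                        "price": (book[f"{side}_price_{level}"] / price_divisor).to_numpy(),
                        "volume": book[f"{side}_size_{level}"].to_numpy(),
                    }
                )
            )
    long = pd.concat(frames, ignore_index=True)
    return long.sort_values(["row", "side", "level"]).reset_index(drop=True)


long_book = to_long(book, num_levels=2)
```

Because each snapshot row lines up with a message row by position, timestamps and event IDs can be attached from the events frame by row index.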

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| events | DataFrame | Events DataFrame (used only for timestamps and event IDs). | required |
| orderbook_path | Path | Path to the LOBSTER orderbook CSV. | required |
| config | PipelineConfig | Pipeline configuration. | required |

Returns:

| Type | Description |
| --- | --- |
| tuple of (depth DataFrame, depth_summary DataFrame) | The (depth, depth_summary) pair the pipeline expects. |