Skip to content

Protocol Interfaces

Contracts that pluggable pipeline components must satisfy. Uses structural (duck) typing — implement the right method signature and it works, no inheritance required.

Protocol Method Purpose
EventLoader load(source) → DataFrame Parse raw data into events
TradeSource load(events, source) → DataFrame Build the canonical trades DataFrame
DataWriter write(data, dest) Serialize pipeline outputs
Format factory methods Bundle loader, trade source, and writer for a venue

EventLoader

Bases: Protocol

Loads raw order-book events from a data source.

The returned DataFrame must contain at least the columns required by ob_analytics.schemas.validate_events_df.

load

load(source: Any) -> pd.DataFrame

Load events from source and return a DataFrame.

Parameters:

Name Type Description Default
source Any

Data source identifier. The canonical type is str | Path (a file path), but loaders may accept richer descriptors such as dicts, dataclasses, or connection strings.

required

Returns:

Type Description
DataFrame

Events with at least the columns required by ob_analytics.schemas.validate_events_df.

TradeSource

Bases: Protocol

Builds the trades DataFrame for a given run.

Implementations read explicit trade records (a separate trades.csv, LOBSTER execution rows embedded in the events frame, etc.) and project them into the canonical trades schema.

Returned DataFrame columns:

  • timestamp — pandas datetime64[ns]
  • price — float
  • volume — float
  • direction — categorical buy/sell (taker side)
  • maker_event_id — integer event id of the resting order
  • taker_event_id — integer event id of the aggressing order
  • maker — order id of the resting order
  • taker — order id of the aggressing order
  • maker_og — original_number of the maker event
  • taker_og — original_number of the taker event

load

load(events: DataFrame, source: Any) -> pd.DataFrame

Build the trades DataFrame.

Parameters:

Name Type Description Default
events DataFrame

The processed events frame (post-loader).

required
source Any

The same source value passed to :meth:EventLoader.load. Used by file-based readers to locate companion files.

required

Returns:

Type Description
DataFrame

DataWriter

Bases: Protocol

Writes pipeline results to a format-specific output.

write

write(
    data: dict[str, DataFrame],
    dest: str | Path,
    **kwargs: Any,
) -> Path | tuple[Path, ...]

Write pipeline DataFrames to dest.

Parameters:

Name Type Description Default
data dict of str to DataFrame

Pipeline output keyed by name (e.g. "events", "trades", "depth", "depth_summary").

required
dest str or Path

Output path (file or directory, format-dependent).

required

Format

Bases: Protocol

Structural contract for a data-format descriptor.

A Format bundles the per-format factories the pipeline needs: how to load events, how to acquire trades, and (optionally) how to write results or compute depth directly. Pass instances to Pipeline(format=...).

There is no base class to inherit — any object providing these members satisfies the contract (structural typing). name is a short lowercase identifier (e.g. "bitstamp").

create_loader

create_loader(config: Any, ctx: RunContext) -> EventLoader

Return a loader for this format.

create_trade_source

create_trade_source(
    config: Any, ctx: RunContext
) -> TradeSource

Return a trade source for this format.

create_writer

create_writer(
    config: Any, ctx: RunContext
) -> DataWriter | None

Return a writer for this format, or None if unsupported.

compute_depth

compute_depth(
    events: DataFrame,
    config: Any,
    source: Any,
    ctx: RunContext,
) -> tuple[pd.DataFrame, pd.DataFrame] | None

Return (depth, depth_summary) to override the standard depth pipeline, or None to use it.

config_defaults

config_defaults() -> dict[str, Any]

Return default :class:PipelineConfig overrides for this format.

required_context

required_context() -> list[str]

:class:RunContext field names this format requires.

E.g. LOBSTER returns ["trading_date"] because its filenames carry no date; Bitstamp returns []. Lets the CLI/pipeline validate required context generically instead of special-casing format names. Callers should treat a missing method as [] (structural default).