
Data I/O

Parquet serialization and writer registry.

load_data

load_data(path: str | Path) -> dict[str, pd.DataFrame]

Load pre-processed pipeline data from a Parquet directory or pickle file.

Parameters:

path : str or Path (required)
    If path is a directory, each .parquet file inside it is loaded as a DataFrame keyed by its stem (events.parquet → "events"). If path is a single file with a .pkl or .pickle extension, it is loaded with pandas.read_pickle for backward compatibility. Avoid the pickle path for untrusted data, since unpickling can execute arbitrary code.

Returns:

dict of str to pandas.DataFrame
    The loaded DataFrames, keyed by name (the file stem when loading from a directory).
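The loading rules above can be sketched with plain pandas. This is a hypothetical illustration rather than the library's implementation: the name load_data_sketch is made up, and raising ValueError for any other kind of path is an assumption, since the behaviour for unsupported inputs is not documented here. Reading Parquet files also needs a Parquet engine such as pyarrow.

```python
from __future__ import annotations

from pathlib import Path

import pandas as pd


def load_data_sketch(path: str | Path) -> dict[str, pd.DataFrame]:
    """Hypothetical re-implementation of the documented load_data rules."""
    path = Path(path)
    if path.is_dir():
        # Each *.parquet file becomes one entry keyed by its stem
        # (events.parquet -> "events"); sorting keeps the key order stable.
        return {f.stem: pd.read_parquet(f) for f in sorted(path.glob("*.parquet"))}
    if path.is_file() and path.suffix in {".pkl", ".pickle"}:
        # Legacy pickle path: only for trusted files, because unpickling
        # can execute arbitrary code.
        return pd.read_pickle(path)
    # Assumption: anything else is rejected instead of guessed at.
    raise ValueError(f"expected a directory or a .pkl/.pickle file, got {path}")
```

Sorting the glob results is a design choice in this sketch: filesystem listing order is not guaranteed, so sorting makes the returned key order reproducible across machines.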

save_data

save_data(
    lob_data: dict[str, DataFrame],
    path: str | Path,
    *,
    fmt: str = "parquet",
    writer: DataWriter | None = None,
    config: Any = None,
    ctx: Any = None,
    **write_kwargs: Any,
) -> None

Save pipeline data to disk.

Parameters:

lob_data : dict of str to pandas.DataFrame (required)
    The DataFrames to save; keys become file stems.

path : str or Path (required)
    Destination directory (Parquet) or file (pickle).

fmt : str (default "parquet")
    Serialization format. Built-in values are "parquet" (default) and "pickle". Additional formats (e.g. "bitstamp", "lobster") are available once the corresponding writer factory has been registered with register_writer.

writer : DataWriter (default None)
    A pre-constructed writer instance. When provided, fmt is ignored and the writer is used directly. This is the preferred path when saving from a Pipeline that already holds a configured writer.

config : Any (default None)
    Forwarded to the registered writer factory when fmt names one.

ctx : Any (default None)
    Forwarded to the registered writer factory when fmt names one. If omitted, an empty ob_analytics.protocols.RunContext is used.

**write_kwargs : Any (optional)
    Extra keyword arguments forwarded to writer.write().
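The precedence between an explicit writer, a registered format, and the built-in formats can be sketched as follows. Everything here is an assumption for illustration: save_data_sketch, the explicit registry argument, the RunContext stand-in, and the writer.write(lob_data, path, **write_kwargs) call shape are not taken from the library, and checking the built-in formats before the registry is just one plausible ordering.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable

import pandas as pd


@dataclass
class RunContext:
    """Hypothetical stand-in for ob_analytics.protocols.RunContext."""
    params: dict[str, Any] = field(default_factory=dict)


def save_data_sketch(
    lob_data: dict[str, pd.DataFrame],
    path: str | Path,
    *,
    fmt: str = "parquet",
    writer: Any = None,
    config: Any = None,
    ctx: Any = None,
    registry: dict[str, Callable[[Any, Any], Any]] | None = None,
    **write_kwargs: Any,
) -> None:
    path = Path(path)
    if writer is None and fmt not in ("parquet", "pickle"):
        # Registered format: build the writer from (config, ctx), falling
        # back to an empty RunContext when ctx is omitted.
        factories = registry or {}
        if fmt not in factories:
            raise ValueError(f"unknown format {fmt!r}")
        writer = factories[fmt](config, ctx if ctx is not None else RunContext())
    if writer is not None:
        # An explicit (or freshly built) writer wins; fmt is not consulted.
        writer.write(lob_data, path, **write_kwargs)  # assumed call shape
    elif fmt == "parquet":
        # One file per DataFrame, named after its key (needs a Parquet engine).
        path.mkdir(parents=True, exist_ok=True)
        for stem, df in lob_data.items():
            df.to_parquet(path / f"{stem}.parquet")
    else:
        # fmt == "pickle": the whole dict goes into a single file.
        pd.to_pickle(lob_data, path)
```

Passing the registry in as an argument keeps the sketch testable in isolation; the documented save_data instead consults the module-level registry populated by register_writer.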

register_writer

register_writer(name: str, factory: WriterFactory) -> None

Register a writer factory under name for use with save_data(fmt=name, ctx=...).

The factory is called as factory(config, ctx) and must return a DataWriter. This is what lets format-specific writers (e.g. ob_analytics.lobster.LobsterWriter, which needs trading_date) participate in the registry: they pull their required parameters from the ob_analytics.protocols.RunContext.

list_writers

list_writers() -> list[str]

Return a sorted list of registered writer names.
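A name-to-factory registry of the kind register_writer and list_writers describe can be sketched in a few lines, including a factory that pulls a required parameter out of the run context as the LobsterWriter note describes. All names here are hypothetical illustrations, not the ob_analytics API: the module-level _WRITERS dict, make_writer, DateStampedWriter, lobster_like_factory, and the params mapping on the RunContext stand-in.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


class DataWriter(Protocol):
    def write(self, data: dict[str, Any], path: str) -> None: ...


@dataclass
class RunContext:
    """Hypothetical stand-in; assumes run-level values live in a params dict."""
    params: dict[str, Any] = field(default_factory=dict)


WriterFactory = Callable[[Any, RunContext], DataWriter]
_WRITERS: dict[str, WriterFactory] = {}


def register_writer(name: str, factory: WriterFactory) -> None:
    # Assumption: re-registering a name replaces the previous factory.
    _WRITERS[name] = factory


def list_writers() -> list[str]:
    return sorted(_WRITERS)


def make_writer(name: str, config: Any = None, ctx: RunContext | None = None) -> DataWriter:
    # Hypothetical helper for what save_data(fmt=name, ...) would do internally.
    if name not in _WRITERS:
        raise KeyError(f"no writer registered as {name!r}; known: {list_writers()}")
    return _WRITERS[name](config, ctx if ctx is not None else RunContext())


@dataclass
class DateStampedWriter:
    """Toy writer that records the file names it would produce."""
    trading_date: str
    written: list[str] = field(default_factory=list)

    def write(self, data: dict[str, Any], path: str) -> None:
        self.written.extend(f"{path}/{self.trading_date}_{stem}" for stem in data)


def lobster_like_factory(config: Any, ctx: RunContext) -> DataWriter:
    # The factory, not the save_data caller, knows which run-level
    # parameters its writer needs and fetches them from the context.
    if "trading_date" not in ctx.params:
        raise ValueError("this writer needs ctx.params['trading_date']")
    return DateStampedWriter(trading_date=ctx.params["trading_date"])


register_writer("lobster_like", lobster_like_factory)
```

For example, make_writer("lobster_like", ctx=RunContext({"trading_date": "2024-01-02"})) returns a writer whose write({"events": ...}, "out") records "out/2024-01-02_events", while omitting ctx raises ValueError because the empty default context has no trading_date. Keeping the uniform factory(config, ctx) signature is what allows writers with different constructor needs to share one registry.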