datareactor.dataset module

Classes

Dataset(path_to_dataset) – Read and write metad datasets.

DerivedColumn – Derived column with known lineage.

class datareactor.dataset.Dataset(path_to_dataset)

Bases: object

Read and write metad datasets.

The Dataset object provides methods for reading datasets, adding derived columns, and writing datasets.

Methods

add_columns(columns) – Add the derived columns to the dataset.

export(path_to_dataset) – Write the dataset to disk.

Attributes

path_to_dataset

The path to the dataset files.

metadata

The metad.MetaData object.

tables

A dictionary mapping table names to dataframes.
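To make the three attributes concrete, here is a minimal Python sketch of the state a loaded Dataset holds. It is not the library's implementation: DatasetSketch is a hypothetical stand-in, each table is modeled as a dict of column name to list of values instead of a dataframe, and the metadata is modeled as a plain dict (table name to list of field dicts) instead of a metad.MetaData object. The field keys name and data_type are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for the state held by datareactor.dataset.Dataset."""

    path_to_dataset: str
    # Hypothetical metadata layout: table name -> list of field dicts.
    metadata: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)
    # Each table is modeled as column name -> list of values
    # (the real class stores a dataframe per table).
    tables: Dict[str, Dict[str, List[Any]]] = field(default_factory=dict)

    def column_names(self, table_name: str) -> List[str]:
        """Return the column names of one table, in insertion order."""
        return list(self.tables[table_name])


ds = DatasetSketch(
    path_to_dataset="data/example",  # hypothetical path
    metadata={"users": [{"name": "id", "data_type": "id"},
                        {"name": "age", "data_type": "numerical"}]},
    tables={"users": {"id": [1, 2, 3], "age": [34, 27, 51]}},
)
print(ds.column_names("users"))  # ['id', 'age']
```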

add_columns(columns)

Add the derived columns to the dataset.

This adds the specified derived columns, modifying the tables and metadata in place.

Parameters

columns (list of DerivedColumn) – The derived columns to add.
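The in-place behavior described above can be sketched with plain dicts. This is an illustration under stated assumptions, not datareactor's code: DerivedColumnSketch and add_columns here are hypothetical, tables are dicts of column lists rather than dataframes, and the metadata layout (a fields mapping plus a constraints list) is invented for the example. The row-count check is also a sketch-only addition.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DerivedColumnSketch:
    """Hypothetical mirror of DerivedColumn's documented attributes."""
    table_name: str
    values: List[Any]
    field: Dict[str, Any]  # assumed keys: "name" and "data_type"
    constraint: Optional[Dict[str, Any]] = None


def add_columns(tables: Dict[str, Dict[str, List[Any]]],
                metadata: Dict[str, Any],
                columns: List[DerivedColumnSketch]) -> None:
    """Add each derived column to its table and record it in the metadata, in place.

    tables: table name -> {column name -> list of values} (stand-in for dataframes)
    metadata: {"fields": {table name -> [field dicts]}, "constraints": [...]} (hypothetical)
    """
    for column in columns:
        table = tables[column.table_name]
        # Sketch-only check: a new column must have one value per existing row.
        n_rows = len(next(iter(table.values()))) if table else len(column.values)
        if len(column.values) != n_rows:
            raise ValueError(f"expected {n_rows} values, got {len(column.values)}")
        table[column.field["name"]] = list(column.values)
        metadata["fields"][column.table_name].append(column.field)
        if column.constraint is not None:
            metadata["constraints"].append(column.constraint)


tables = {"users": {"id": [1, 2, 3], "age": [34, 27, 51]}}
metadata = {"fields": {"users": [{"name": "id", "data_type": "id"},
                                 {"name": "age", "data_type": "numerical"}]},
            "constraints": []}
add_columns(tables, metadata, [
    DerivedColumnSketch("users", [True, False, True],
                        {"name": "age_over_30", "data_type": "boolean"}),
])
print(tables["users"]["age_over_30"])  # [True, False, True]
```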

export(path_to_dataset)

Write the dataset to disk.

This writes the dataset to the target directory, saving each table as a CSV file and the metadata as a JSON file.

Parameters

path_to_dataset (str) – The path to the target directory.
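A rough sketch of what writing a dataset to a directory involves, using only the standard library. The file names (<table_name>.csv and metadata.json) and the dict-based table and metadata shapes are assumptions made for this example; the real export method works on dataframes and a metad.MetaData object and may name its files differently.

```python
import csv
import json
import os
import tempfile
from typing import Any, Dict, List


def export(path_to_dataset: str,
           tables: Dict[str, Dict[str, List[Any]]],
           metadata: Dict[str, Any]) -> None:
    """Write each table as <table_name>.csv and the metadata as metadata.json (assumed names)."""
    os.makedirs(path_to_dataset, exist_ok=True)
    for table_name, columns in tables.items():
        header = list(columns)
        # Transpose column lists into row tuples for the CSV writer.
        rows = zip(*(columns[name] for name in header))
        with open(os.path.join(path_to_dataset, f"{table_name}.csv"), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows)
    with open(os.path.join(path_to_dataset, "metadata.json"), "w") as f:
        json.dump(metadata, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    export(tmp, {"users": {"id": [1, 2], "age": [34, 27]}}, {"tables": ["users"]})
    print(sorted(os.listdir(tmp)))  # ['metadata.json', 'users.csv']
```

Writing the tables column by column keeps the sketch free of third-party dependencies; with dataframes, each table would typically be written with a single call instead.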

class datareactor.dataset.DerivedColumn

Bases: object

Derived column with known lineage.

A DerivedColumn represents a derived column that belongs to the table named by table_name.

table_name

The name of the table the column will belong to.

Type: str

values

A list of values that can be added to the table as a new column.

Type: list

field

A field object that specifies the column's name and data type.

Type: dict

constraint

An optional constraint object.

Type: dict or None

constraint = None
field = None
table_name = None
values = None
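The sketch below shows how a derived column with recorded lineage might be assembled. DerivedColumn here is a local stand-in that mirrors only the documented class attributes (all defaulting to None), so the example runs without datareactor installed. The helper derive_length_column, the field keys, and the constraint layout are hypothetical and only illustrate the idea of pointing a derived field back at its source field.

```python
from typing import List


class DerivedColumn:
    """Local stand-in mirroring the documented class attributes (all default to None)."""
    table_name = None
    values = None
    field = None
    constraint = None


def derive_length_column(table_name: str, source_name: str,
                         source_values: List[str]) -> DerivedColumn:
    """Hypothetical helper: build a column holding len() of each string in a source column."""
    column = DerivedColumn()
    column.table_name = table_name
    column.values = [len(value) for value in source_values]
    # Hypothetical field keys: the column name and a data type label.
    column.field = {"name": f"{source_name}.length", "data_type": "numerical"}
    # Hypothetical constraint layout recording which field the column came from.
    column.constraint = {
        "derived_field": f"{source_name}.length",
        "source_field": source_name,
        "table": table_name,
    }
    return column


col = derive_length_column("users", "name", ["Ada", "Grace"])
print(col.values)  # [3, 5]
```

Assigning to the instance leaves the class-level defaults untouched, so an unpopulated DerivedColumn still reports None for each attribute.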