datareactor package

Top-level package for DataReactor.

Classes

Dataset(path_to_dataset)

Read and write metad datasets.

DataReactor([atoms, sieve])

Transform datasets by generating derived columns.

DerivedColumn

Derived column with known lineage.

class datareactor.Dataset(path_to_dataset)[source]

Bases: object

Read and write metad datasets.

The Dataset object provides methods for reading datasets, adding derived columns, and writing datasets.

Methods

add_columns(columns)

Add the derived columns to the dataset.

export(path_to_dataset)

Write the dataset to disk.

path_to_dataset

The path to the dataset files.

metadata

The metad.MetaData object.

tables

A dictionary mapping table names to dataframes.

add_columns(columns)[source]

Add the derived columns to the dataset.

This adds the specified derived columns to the dataset, modifying both the tables and the metadata in place.

Parameters

columns (list of DerivedColumn) – The derived columns.

export(path_to_dataset)[source]

Write the dataset to disk.

This writes the dataset to the target directory. It creates the tables as CSV files and writes the metadata as a JSON file.

Parameters

path_to_dataset (str) – The path to the dataset.
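The on-disk layout described above (one CSV file per table plus a JSON metadata file) can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `export_sketch` is a hypothetical helper, the file names `<table>.csv` and `metadata.json` are assumptions, and tables are represented as lists of row dictionaries rather than dataframes.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_sketch(path_to_dataset, tables, metadata):
    """Write each table as <name>.csv and the metadata as metadata.json.

    The file names are assumptions made for this sketch.
    """
    target = Path(path_to_dataset)
    target.mkdir(parents=True, exist_ok=True)
    for name, rows in tables.items():
        # One CSV file per table; the header comes from the first row's keys.
        with open(target / f"{name}.csv", "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    # A single JSON file describing the dataset.
    with open(target / "metadata.json", "w") as handle:
        json.dump(metadata, handle, indent=2)


tables = {"users": [{"id": 1, "age": 30}, {"id": 2, "age": 41}]}
metadata = {"tables": [{"name": "users"}]}  # assumed shape, for illustration only
out_dir = Path(tempfile.mkdtemp()) / "exported"
export_sketch(out_dir, tables, metadata)
written = sorted(p.name for p in out_dir.iterdir())
```

After the call, `written` lists the files created in the target directory, which is created if it does not already exist.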

class datareactor.DataReactor(atoms=None, sieve=None)[source]

Bases: object

Transform datasets by generating derived columns.

The DataReactor class provides methods for transforming relational datasets by creating derived columns with known lineage.

Methods

transform(source, destination)

Read, transform, and write the dataset.

atoms

A list of Atom objects that are applied to generate derived columns.

sieve

The sieve to use to filter columns.

transform(source, destination)[source]

Read, transform, and write the dataset.

This method reads the dataset from the source location, generates derived columns using the atoms, filters the derived columns using the sieve, and writes the modified dataset to the destination location.

Parameters
  • source (str) – The dataset path to read from.

  • destination (str) – The dataset path to write to.
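The generate, filter, and write steps that transform performs can be sketched in memory. The example below does not import datareactor and makes several assumptions: atoms are modelled as plain callables that yield `(table_name, column_name, values)` triples, the sieve is a predicate over one such triple, and tables are dictionaries of column lists. The real Atom and sieve interfaces are not described in this section, and the real method reads from and writes to paths on disk.

```python
def transform_sketch(tables, atoms, sieve):
    """Return a copy of `tables` extended with the derived columns the sieve keeps."""
    # Generate: every atom proposes (table_name, column_name, values) triples.
    candidates = []
    for atom in atoms:
        candidates.extend(atom(tables))
    # Filter: the sieve decides which candidates survive.
    kept = [candidate for candidate in candidates if sieve(candidate)]
    # Write: here a new dictionary stands in for files at the destination.
    result = {name: dict(columns) for name, columns in tables.items()}
    for table_name, column_name, values in kept:
        result[table_name][column_name] = values
    return result


def row_sum_atom(tables):
    """Hypothetical atom: per-row sum across the columns of each table."""
    for name, columns in tables.items():
        yield name, "row_sum", [sum(row) for row in zip(*columns.values())]


def constant_atom(tables):
    """Hypothetical atom: an uninformative column of zeros."""
    for name, columns in tables.items():
        n_rows = len(next(iter(columns.values())))
        yield name, "always_zero", [0] * n_rows


def drop_constant_sieve(candidate):
    """Hypothetical sieve: keep only columns with more than one distinct value."""
    _, _, values = candidate
    return len(set(values)) > 1


source = {"orders": {"price": [10, 20], "tax": [1, 2]}}
result = transform_sketch(source, [row_sum_atom, constant_atom], drop_constant_sieve)
```

Here the sieve discards `always_zero` and keeps `row_sum`, and `source` itself is left unmodified. With the package installed, the documented entry point is `DataReactor(atoms, sieve).transform(source, destination)`, where both arguments to transform are dataset paths.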

class datareactor.DerivedColumn[source]

Bases: object

Derived column with known lineage.

The DerivedColumn represents a derived column which belongs to the specified table.

table_name

The name of the table the column will belong to.

Type

str

values

A series of values which can be added to the table.

Type

list

field

A field object which specifies the name and data type.

Type

dict

constraint

An optional constraint object.

Type

dict, None

constraint = None
field = None
table_name = None
values = None
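To show how the four attributes fit together, the sketch below builds a stand-in object and attaches it to a table. It does not import datareactor: `DerivedColumnSketch` and `add_columns_sketch` are hypothetical names, the field keys `"name"` and `"data_type"` are assumed rather than taken from the library's schema, and tables are plain dictionaries of column lists instead of dataframes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DerivedColumnSketch:
    """Hypothetical stand-in mirroring the four documented attributes."""
    table_name: Optional[str] = None
    values: Optional[list] = None
    field: Optional[dict] = None
    constraint: Optional[dict] = None


def add_columns_sketch(tables, columns):
    """Attach each derived column to its table, modifying `tables` in place."""
    for column in columns:
        table = tables[column.table_name]
        n_rows = len(next(iter(table.values())))
        if len(column.values) != n_rows:
            raise ValueError("derived column length does not match table")
        table[column.field["name"]] = list(column.values)


tables = {"users": {"id": [1, 2, 3], "age": [30, 41, 25]}}
column = DerivedColumnSketch(
    table_name="users",
    values=[age >= 30 for age in tables["users"]["age"]],
    field={"name": "age_at_least_30", "data_type": "boolean"},  # assumed keys
)
add_columns_sketch(tables, [column])
```

All four attributes default to None, matching the class-level defaults listed above, so the optional constraint can simply be left unset.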