datareactor package
Top-level package for DataReactor.
Classes
Dataset | Read and write metad datasets.
DataReactor | Transform datasets by generating derived columns.
DerivedColumn | Derived column with known lineage.
class datareactor.Dataset(path_to_dataset)
Bases: object
Read and write metad datasets.
The Dataset object provides methods for reading datasets, adding derived columns, and writing datasets.
Methods
add_columns(columns) | Add the derived columns to the dataset.
export(path_to_dataset) | Write the dataset to disk.
path_to_dataset – The path to the dataset files.
metadata – The metad.MetaData object.
tables – A dictionary mapping table names to dataframes.
add_columns(columns)
Add the derived columns to the dataset.
This modifies the tables and metadata in place, adding the specified derived columns.
Parameters
columns (list of DerivedColumn) – The derived columns to add.
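The in-place behavior of add_columns can be illustrated with a small, self-contained sketch. It does not use the datareactor or metad packages: `SketchColumn` and `add_columns_sketch` are hypothetical names, tables are modeled as plain dicts of column lists instead of dataframes, and metadata is modeled as a per-table list of field dicts (the `"name"`/`"data_type"` keys are an assumption, not metad's confirmed schema).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SketchColumn:
    """Hypothetical stand-in mirroring DerivedColumn's documented attributes."""
    table_name: str
    values: List[Any]
    field: Dict[str, Any]  # assumed keys: "name" and "data_type"
    constraint: Optional[Dict[str, Any]] = None


def add_columns_sketch(
    tables: Dict[str, Dict[str, List[Any]]],
    fields: Dict[str, List[Dict[str, Any]]],
    columns: List[SketchColumn],
) -> None:
    """Add each derived column to its table, mutating `tables` and `fields`.

    tables: table name -> {column name -> values}, a stand-in for dataframes.
    fields: table name -> list of field dicts, a stand-in for metad metadata.
    """
    for column in columns:
        table = tables[column.table_name]
        name = column.field["name"]
        # A new column must supply exactly one value per existing row.
        if table:
            n_rows = len(next(iter(table.values())))
            if len(column.values) != n_rows:
                raise ValueError(
                    f"{name}: expected {n_rows} values, got {len(column.values)}"
                )
        table[name] = list(column.values)                       # update the data
        fields[column.table_name].append(dict(column.field))    # update the metadata


tables = {"users": {"age": [20, 35, 50]}}
fields = {"users": [{"name": "age", "data_type": "numerical"}]}
squared = SketchColumn("users", [400, 1225, 2500], {"name": "age_squared", "data_type": "numerical"})
add_columns_sketch(tables, fields, [squared])
print(sorted(tables["users"]))  # ['age', 'age_squared']
```

Because the function returns None and mutates its arguments, callers observe the new column through the same `tables` and `fields` objects they passed in, which is the sense of "in place" used above.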
class datareactor.DataReactor(atoms=None, sieve=None)
Bases: object
Transform datasets by generating derived columns.
The DataReactor class provides methods for transforming relational datasets by creating derived columns with known lineage.
Methods
transform(source, destination) | Read, transform, and write the dataset.
atoms – A list of Atom objects that are applied to generate derived columns.
sieve – The sieve used to filter the derived columns.
transform(source, destination)
Read, transform, and write the dataset.
This method reads the dataset from the source location, generates derived columns using the atoms, filters the derived columns with the sieve, and writes the modified dataset to the destination location.
Parameters
source (str) – The dataset path to read from.
destination (str) – The dataset path to write to.
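The generate, filter, and add order of transform can be sketched in memory. This is not DataReactor's implementation: reading from `source` and writing to `destination` are omitted, atoms and sieves are modeled as plain callables, and `Derived`, `square_atom`, `constant_sieve`, and `transform_sketch` are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List, NamedTuple

Tables = Dict[str, Dict[str, List[Any]]]  # table name -> column name -> values


class Derived(NamedTuple):
    """Hypothetical stand-in for DerivedColumn: target table, column name, values."""
    table_name: str
    name: str
    values: List[Any]


Atom = Callable[[Tables], List[Derived]]
Sieve = Callable[[Tables, List[Derived]], List[Derived]]


def square_atom(tables: Tables) -> List[Derived]:
    """Hypothetical atom: propose the square of every all-numeric column."""
    out = []
    for table_name, columns in tables.items():
        for name, values in columns.items():
            if all(isinstance(v, (int, float)) for v in values):
                out.append(Derived(table_name, f"{name}_squared", [v * v for v in values]))
    return out


def constant_sieve(tables: Tables, columns: List[Derived]) -> List[Derived]:
    """Hypothetical sieve: drop derived columns whose values are all identical."""
    return [c for c in columns if len(set(c.values)) > 1]


def transform_sketch(tables: Tables, atoms: List[Atom], sieve: Sieve) -> Tables:
    # 1. Read: `tables` is already in memory here instead of loaded from `source`.
    # 2. Generate candidates from every atom, all computed on the original tables.
    candidates = [col for atom in atoms for col in atom(tables)]
    # 3. Filter the candidates with the sieve.
    kept = sieve(tables, candidates)
    # 4. Add the surviving columns; writing to `destination` is omitted.
    for col in kept:
        tables[col.table_name][col.name] = col.values
    return tables


tables = {"users": {"age": [20, 35, 50], "flag": [1, 1, 1], "name": ["a", "b", "c"]}}
transform_sketch(tables, [square_atom], constant_sieve)
print(list(tables["users"]))  # ['age', 'flag', 'name', 'age_squared']
```

Collecting all candidates before filtering lets a sieve compare columns against each other; `flag_squared` is proposed by the atom but removed by the sieve because it is constant.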
class datareactor.DerivedColumn
Bases: object
Derived column with known lineage.
A DerivedColumn represents a derived column that belongs to the table named by table_name.
table_name (str) – The name of the table the column will belong to.
values (list) – A series of values that can be added to the table.
field (dict) – A field object specifying the column's name and data type.
constraint (dict or None) – An optional constraint object.
constraint = None
field = None
table_name = None
values = None
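The attribute layout above, where every attribute starts as None and is filled in later, can be mirrored with a dataclass. `DerivedColumnSketch` and its `is_complete` helper are hypothetical and not part of datareactor; the `"name"`/`"data_type"` keys in `field` are an assumption, and the shape of `constraint` is left unspecified because this page does not define it.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DerivedColumnSketch:
    """Hypothetical mirror of DerivedColumn; every attribute defaults to None."""
    table_name: Optional[str] = None
    values: Optional[List[Any]] = None
    field: Optional[Dict[str, Any]] = None       # assumed keys: "name", "data_type"
    constraint: Optional[Dict[str, Any]] = None  # optional; shape not specified here

    def is_complete(self) -> bool:
        """True once every attribute except the optional constraint is set."""
        return None not in (self.table_name, self.values, self.field)


col = DerivedColumnSketch()
print(col.is_complete())  # False
col.table_name = "users"
col.values = [400, 1225, 2500]
col.field = {"name": "age_squared", "data_type": "numerical"}
print(col.is_complete())  # True
```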