datareactor package
Top-level package for DataReactor.
Classes
Dataset | Read and write metad datasets.
DataReactor | Transform datasets by generating derived columns.
DerivedColumn | Derived column with known lineage.
-
class datareactor.Dataset(path_to_dataset)
Bases: object
Read and write metad datasets.
The Dataset object provides methods for reading datasets, adding derived columns, and writing datasets.
Methods
add_columns(columns) | Add the derived columns to the dataset.
export(path_to_dataset) | Write the dataset to disk.
-
path_to_dataset – The path to the dataset files.
-
metadata – The metad.MetaData object.
-
tables – A dictionary mapping table names to dataframes.
-
add_columns(columns)
Add the derived columns to the dataset.
This adds the specified derived columns, modifying the tables and metadata in place.
- Parameters
columns (list of DerivedColumn) – The derived columns.
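The in-place behavior described for add_columns can be illustrated with a minimal sketch. This is not the library's implementation: ColumnSketch and add_columns_sketch are hypothetical stand-ins, tables are modeled as plain dictionaries of lists rather than dataframes, the metadata is modeled as a nested dictionary rather than a metad.MetaData object, and the "name" and "data_type" field keys are assumptions.

```python
class ColumnSketch:
    """Hypothetical stand-in with the four documented DerivedColumn attributes."""
    table_name = None
    values = None
    field = None
    constraint = None


def add_columns_sketch(tables, metadata, columns):
    """Attach each column to its table and record its field, mutating both inputs."""
    for column in columns:
        name = column.field["name"]  # assumed key; the real field layout may differ
        tables[column.table_name][name] = list(column.values)
        metadata[column.table_name]["fields"].append(column.field)
        if column.constraint is not None:
            metadata[column.table_name]["constraints"].append(column.constraint)


# Assumed layout: {table name: {column name: list of values}}.
tables = {"users": {"id": [1, 2, 3]}}
metadata = {"users": {"fields": [{"name": "id", "data_type": "id"}], "constraints": []}}

order_count = ColumnSketch()
order_count.table_name = "users"
order_count.values = [2, 0, 5]
order_count.field = {"name": "order_count", "data_type": "numerical"}

# No return value is used: the caller's tables and metadata are changed directly.
add_columns_sketch(tables, metadata, [order_count])
print(tables["users"])
```

Because the function mutates its arguments, a caller that needs the original tables afterwards would have to copy them first.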
-
-
class datareactor.DataReactor(atoms=None, sieve=None)
Bases: object
Transform datasets by generating derived columns.
The DataReactor class provides methods for transforming relational datasets by creating derived columns with known lineage.
Methods
transform(source, destination) | Read, transform, and write the dataset.
-
atoms – A list of Atom objects to apply to generate columns.
-
sieve – The sieve to use to filter columns.
-
transform(source, destination)
Read, transform, and write the dataset.
This function reads the dataset from the source location, generates derived columns using the atoms, filters the derived columns using a sieve, and writes the modified dataset to the destination location.
- Parameters
source (str) – The dataset path to read from.
destination (str) – The dataset path to write to.
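The generate-then-filter flow that transform describes can be sketched as follows. Reading from source and writing to destination are left out, and everything here is an assumption made for illustration: atoms are modeled as plain callables that return (table name, column name, values) tuples, the sieve as a callable that returns the columns to keep, and tables as dictionaries of lists. The actual Atom and sieve interfaces are not shown on this page.

```python
def transform_sketch(tables, atoms, sieve):
    """Collect candidate columns from every atom, then add those the sieve keeps."""
    candidates = []
    for atom in atoms:
        candidates.extend(atom(tables))
    for table_name, column_name, values in sieve(candidates):
        tables[table_name][column_name] = values
    return tables


def order_count_atom(tables):
    """Hypothetical atom: count the child rows in 'orders' for each user."""
    user_ids = tables["users"]["id"]
    order_owner_ids = tables["orders"]["user_id"]
    counts = [order_owner_ids.count(user_id) for user_id in user_ids]
    return [("users", "order_count", counts)]


def constant_atom(tables):
    """Hypothetical atom that yields an uninformative, constant column."""
    return [("users", "always_zero", [0] * len(tables["users"]["id"]))]


def drop_constant_sieve(columns):
    """Hypothetical sieve: keep only columns with more than one distinct value."""
    return [column for column in columns if len(set(column[2])) > 1]


tables = {
    "users": {"id": [1, 2, 3]},
    "orders": {"user_id": [1, 1, 3]},
}
result = transform_sketch(
    tables, [order_count_atom, constant_atom], drop_constant_sieve
)
print(result["users"])
```

In this sketch the sieve sees all candidates at once, so a filter could also compare columns against each other, for example to drop duplicates.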
-
-
class datareactor.DerivedColumn
Bases: object
Derived column with known lineage.
A DerivedColumn represents a derived column that belongs to the specified table.
-
table_name – The name of the table the column will belong to.
- Type
str
-
values – A series of values which can be added to the table.
- Type
list
-
field – A field object which specifies the name and data type.
- Type
dict
-
constraint – An optional constraint object.
- Type
dict, None
-
constraint = None
-
field = None
-
table_name = None
-
values = None
-