datareactor.atoms package
Classes
AddNumericalAtom | Add a random set of columns.
AggregationAtom | Apply aggregation functions to child rows.
Atom | Generate derived columns for a dataset.
FeatureToolsAtom | Generate derived columns with featuretools.
RowCountAtom | Count the number of child rows.
class datareactor.atoms.AddNumericalAtom
Bases: datareactor.atoms.base.Atom
Add a random set of columns.
The AddNumericalAtom generates a derived column which contains the sum of a random set of numerical columns in the same table.
Methods
derive(dataset, table_name) – Add a column containing random values.
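The behaviour described for AddNumericalAtom (summing a random set of numerical columns from the same table) can be sketched in plain Python. This is an illustration only, not datareactor's implementation: the function name `sum_random_numerical_columns`, the list-of-dicts table representation, and the `seed` parameter are all assumptions introduced for the example.

```python
import random


def sum_random_numerical_columns(rows, seed=None):
    """Return (chosen_column_names, derived_values) for a list of row dicts.

    Hypothetical helper: a table is modelled as a list of dicts.
    """
    if not rows:
        raise ValueError("the table has no rows")
    rng = random.Random(seed)
    # Treat a column as numerical if every value is an int or float (bools excluded).
    numerical = sorted(
        name
        for name in rows[0]
        if all(
            isinstance(row[name], (int, float)) and not isinstance(row[name], bool)
            for row in rows
        )
    )
    if not numerical:
        raise ValueError("the table has no numerical columns")
    # Pick a random, non-empty subset of the numerical columns.
    size = rng.randint(1, len(numerical))
    chosen = sorted(rng.sample(numerical, size))
    # The derived column is the row-wise sum of the chosen columns.
    derived = [sum(row[name] for name in chosen) for row in rows]
    return chosen, derived


users = [
    {"name": "ann", "age": 31, "visits": 4, "balance": 10.5},
    {"name": "bob", "age": 45, "visits": 1, "balance": 0.0},
]
chosen, derived = sum_random_numerical_columns(users, seed=0)
print(chosen, derived)
```

Passing a fixed `seed` makes the random choice reproducible, which is convenient when comparing derived columns across runs.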
class datareactor.atoms.AggregationAtom
Bases: datareactor.atoms.base.Atom
Apply aggregation functions to child rows.
The AggregationAtom generates derived columns which are the result of applying aggregation functions to groups of child rows.
Methods
derive(dataset, table_name) – Apply pandas aggregation functions to groups of rows.
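The idea of aggregating groups of child rows can be illustrated with a small standalone sketch. The documented method uses pandas aggregation functions; the version below deliberately swaps in plain Python (a dictionary of lists and built-in functions) so that it runs without dependencies. The helper `aggregate_child_rows`, its parameters, and the `name(column)` naming of the derived columns are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_child_rows(parent_keys, child_rows, foreign_key, column, functions):
    """Group child rows by foreign key and aggregate one column per parent row."""
    # Collect the values of `column` for each parent key.
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[foreign_key]].append(row[column])

    derived = {}
    for name, func in functions.items():
        # One derived column per aggregation function, aligned with parent_keys.
        # Parents without child rows get None.
        derived[f"{name}({column})"] = [
            func(groups[key]) if key in groups else None for key in parent_keys
        ]
    return derived


user_ids = [1, 2, 3]
transactions = [
    {"user_id": 1, "amount": 10.0},
    {"user_id": 1, "amount": 30.0},
    {"user_id": 2, "amount": 5.0},
]
derived = aggregate_child_rows(
    user_ids, transactions, "user_id", "amount",
    {"sum": sum, "mean": mean, "max": max},
)
print(derived["sum(amount)"])   # [40.0, 5.0, None]
print(derived["mean(amount)"])  # [20.0, 5.0, None]
```

Each derived list has one entry per parent row, so it can be attached to the target table as a new column.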
class datareactor.atoms.Atom
Bases: object
Generate derived columns for a dataset.
Each Atom is responsible for generating one or more derived columns for the target table.
Methods
derive(dataset, table_name) – Generate derived columns for the specified table.
transform(dataset) – Generate derived columns for the dataset.
derive(dataset, table_name)
Generate derived columns for the specified table.
The derive function takes a dataset and the name of the target table. It returns a list of derived columns which can be concatenated to the target table.
- Parameters
dataset (Dataset) – The dataset.
table_name (str) – The name of the target table.
- Returns
The derived columns.
- Return type
list of DerivedColumn
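The derive contract (take a dataset and a table name, return a list of derived columns) can be illustrated with stand-in classes. Everything in this sketch is hypothetical: `SketchAtom`, `RowIndexAtom`, the dict-of-tables dataset, and the `(name, values)` tuples used in place of `DerivedColumn` are not datareactor's types. That `transform` calls `derive` once per table is also an assumption, since this page does not describe how `transform` works.

```python
class SketchAtom:
    """Stand-in for an Atom-like base class (not datareactor's code)."""

    def derive(self, dataset, table_name):
        """Return a list of (column_name, values) pairs for one table."""
        raise NotImplementedError

    def transform(self, dataset):
        # Assumption: visit every table and collect what derive returns.
        return {name: self.derive(dataset, name) for name in dataset}


class RowIndexAtom(SketchAtom):
    """Toy atom that derives a single column holding each row's position."""

    def derive(self, dataset, table_name):
        rows = dataset[table_name]
        return [("row_index", list(range(len(rows))))]


def concatenate(rows, derived_columns):
    """Attach derived columns to copies of the target table's rows."""
    result = [dict(row) for row in rows]
    for name, values in derived_columns:
        for row, value in zip(result, values):
            row[name] = value
    return result


dataset = {"users": [{"id": 1}, {"id": 2}], "orders": [{"id": 10}]}
derived = RowIndexAtom().transform(dataset)
print(concatenate(dataset["users"], derived["users"]))
# [{'id': 1, 'row_index': 0}, {'id': 2, 'row_index': 1}]
```

Keeping `derive` free of side effects and doing the concatenation separately means the original tables are left untouched.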
class datareactor.atoms.FeatureToolsAtom
Bases: datareactor.atoms.base.Atom
Generate derived columns with featuretools.
The FeatureToolsAtom generates derived columns based on the features generated by the featuretools library.
Methods
derive(dataset, table_name) – Generate features with featuretools.
derive(dataset, table_name)
Generate features with featuretools.
Note that featuretools does not support all of the types of relational datasets supported by Metadata.JSON.
- Parameters
dataset (Dataset) – The dataset.
table_name (str) – The name of the target table.
- Returns
The derived columns.
- Return type
list of DerivedColumn
class datareactor.atoms.RowCountAtom
Bases: datareactor.atoms.base.Atom
Count the number of child rows.
The RowCountAtom generates derived columns which specify the number of rows in each of the child tables.
TODO: This causes a segmentation fault on Seznam /AdventureWorks2014.
Methods
derive(dataset, table_name) – Count the number of rows in each child table.
derive(dataset, table_name)
Count the number of rows in each child table.
This function generates a derived column for each child table which contains the number of rows in each group.
For example, if the target table is users, then it might generate a derived column containing the number of rows in the transaction table that belong to each user.
- Parameters
dataset (Dataset) – The dataset.
table_name (str) – The name of the target table.
- Returns
The derived columns.
- Return type
list of DerivedColumn
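The users and transactions example for RowCountAtom can be made concrete with a short sketch. It is a plain-Python illustration of the counting step only; the helper `count_child_rows`, the `user_id` foreign key, and the list-of-dicts tables are assumptions rather than datareactor's API.

```python
from collections import Counter


def count_child_rows(parent_keys, child_rows, foreign_key):
    """Return, for each parent key, the number of child rows that reference it."""
    counts = Counter(row[foreign_key] for row in child_rows)
    # A Counter returns 0 for keys it has never seen, so parents
    # without any child rows get a count of 0.
    return [counts[key] for key in parent_keys]


users = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}]
transactions = [
    {"transaction_id": 100, "user_id": 1},
    {"transaction_id": 101, "user_id": 1},
    {"transaction_id": 102, "user_id": 2},
]

parent_keys = [user["user_id"] for user in users]
nb_transactions = count_child_rows(parent_keys, transactions, "user_id")
print(nb_transactions)  # [2, 1, 0]
```

With several child tables, the same helper would be called once per child table, giving one derived column each.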