datareactor.atoms package
Classes
AddNumericalAtom | Add a random set of columns.
AggregationAtom | Apply aggregation functions to child rows.
Atom | Generate derived columns for a dataset.
FeatureToolsAtom | Generate derived columns with featuretools.
RowCountAtom | Count the number of child rows.
class datareactor.atoms.AddNumericalAtom
Bases: datareactor.atoms.base.Atom
Add a random set of columns.
The AddNumericalAtom generates a derived column which contains the sum of a random set of numerical columns in the same table.
Methods
derive(dataset, table_name) | Add a column containing random values.
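The idea of summing a random set of numerical columns can be sketched with plain pandas. This is only an illustration, not the library's implementation: the helper `add_random_numerical_sum`, its `n_columns` parameter, the derived column name, and the example table are all assumptions made for this sketch.

```python
import random

import pandas as pd


def add_random_numerical_sum(table, rng, n_columns=2):
    """Return a Series with the row-wise sum of randomly chosen numeric columns.

    Hypothetical helper; not part of datareactor.
    """
    # Only numerical columns are candidates for the sum.
    numeric = table.select_dtypes(include="number").columns.tolist()
    if not numeric:
        raise ValueError("table has no numerical columns")
    # Pick a random subset of the candidates (two columns by default here).
    chosen = rng.sample(numeric, k=min(n_columns, len(numeric)))
    derived = table[chosen].sum(axis=1)
    derived.name = "sum(" + ", ".join(chosen) + ")"
    return derived


# Made-up example table.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "age": [30, 41, 25],
    "visits": [4, 0, 7],
    "name": ["ann", "bob", "cy"],
})
column = add_random_numerical_sum(users, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the column choice reproducible, which is convenient when comparing derived datasets.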
class datareactor.atoms.AggregationAtom
Bases: datareactor.atoms.base.Atom
Apply aggregation functions to child rows.
The AggregationAtom generates derived columns which are the result of applying aggregation functions to groups of child rows.
Methods
derive(dataset, table_name) | Apply pandas aggregation functions to groups of rows.
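Applying aggregation functions to groups of child rows might look like the following pandas sketch. The tables, the `user_id` foreign key, the chosen aggregations, and the derived column names are assumptions for illustration; the atom itself may select functions and name columns differently.

```python
import pandas as pd

# Made-up parent and child tables linked by a user_id foreign key.
users = pd.DataFrame({"user_id": [1, 2, 3]})
transactions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "amount": [10.0, 5.0, 7.5],
})

# Group the child rows by the foreign key and apply several aggregations.
aggregated = transactions.groupby("user_id")["amount"].agg(["sum", "mean", "max"])
aggregated.columns = [f"{name}(transactions.amount)" for name in aggregated.columns]

# Align the result with the parent table; parents without children get NaN.
derived = users.join(aggregated, on="user_id")
```

In this example user 3 has no transactions, so its aggregated values are missing rather than zero.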
class datareactor.atoms.Atom
Bases: object
Generate derived columns for a dataset.
Each Atom is responsible for generating one or more derived columns for the target table.
Methods
derive(dataset, table_name) | Generate derived columns for the specified table.
transform(dataset) | Generate derived columns for the dataset.
derive(dataset, table_name)
Generate derived columns for the specified table.
The derive function takes in a dataset and the name of the target table. It returns a list of derived columns which can be concatenated to the target table.
- Parameters
dataset (Dataset) – The dataset.
table_name (str) – The name of the target table.
- Returns
The derived columns.
- Return type
(list of DerivedColumn)
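The derive/transform split described above can be illustrated with stand-in classes. Everything in this sketch is hypothetical: `SketchAtom` and `SketchDerivedColumn` are not the library's classes, the fields of the real DerivedColumn are not documented here, and the assumption that transform calls derive once per table is a guess at one plausible design.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SketchDerivedColumn:
    """Hypothetical stand-in for DerivedColumn; the real class may differ."""
    table_name: str
    name: str
    values: list


class SketchAtom(ABC):
    """Hypothetical stand-in for Atom."""

    @abstractmethod
    def derive(self, dataset, table_name):
        """Return a list of derived columns for one table."""

    def transform(self, dataset):
        # Assumption: call derive once per table and collect the results.
        columns = []
        for table_name in dataset:
            columns.extend(self.derive(dataset, table_name))
        return columns


class RowNumberAtom(SketchAtom):
    """Toy atom: derives the position of each row within its table."""

    def derive(self, dataset, table_name):
        n_rows = len(dataset[table_name])
        return [SketchDerivedColumn(table_name, "row_number", list(range(n_rows)))]


# In this sketch a dataset is simply a dict of table name -> list of row dicts.
dataset = {"users": [{"id": 1}, {"id": 2}], "orders": [{"id": 9}]}
columns = RowNumberAtom().transform(dataset)
```

Under this design a subclass only has to implement derive for a single table, and transform handles the iteration over the whole dataset.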
class datareactor.atoms.FeatureToolsAtom
Bases: datareactor.atoms.base.Atom
Generate derived columns with featuretools.
The FeatureToolsAtom generates derived columns based on the features generated by the featuretools library.
Methods
derive(dataset, table_name) | Generate features with featuretools.
derive(dataset, table_name)
Generate features with featuretools.
Note that featuretools does not support all of the types of relational datasets supported by Metadata.JSON.
- Parameters
dataset (Dataset) – The dataset.
table_name (str) – The name of the target table.
- Returns
The derived columns.
- Return type
(list of DerivedColumn)
class datareactor.atoms.RowCountAtom
Bases: datareactor.atoms.base.Atom
Count the number of child rows.
The RowCountAtom generates derived columns which specify the number of rows in each of the child tables.
TODO: This causes a segmentation fault on Seznam/AdventureWorks2014.
Methods
derive(dataset, table_name) | Count the number of rows in each child table.
derive(dataset, table_name)
Count the number of rows in each child table.
This function generates a derived column for each child table which contains the number of rows in each group.
For example, if the target table is users, then it might generate a derived column containing the number of rows in the transaction table that belong to each user.
- Parameters
dataset (Dataset) – The dataset.
table_name (str) – The name of the target table.
- Returns
The derived columns.
- Return type
(list of DerivedColumn)
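The users and transactions example above can be reproduced with a small pandas sketch. The table contents, the `user_id` foreign key, and the derived column name `nb_rows_in_transactions` are assumptions chosen for illustration, not names taken from the library.

```python
import pandas as pd

# Made-up parent and child tables linked by a user_id foreign key.
users = pd.DataFrame({"user_id": [1, 2, 3]})
transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12],
    "user_id": [1, 1, 3],
})

# Count the child rows for each foreign-key value.
counts = transactions.groupby("user_id").size()

# Align the counts with the parent table; users without transactions get 0.
n_transactions = (
    users["user_id"]
    .map(counts)
    .fillna(0)
    .astype(int)
    .rename("nb_rows_in_transactions")
)
```

Filling missing counts with 0 reflects that a parent row with no children has a well-defined count, unlike an aggregate such as a mean.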