datareactor.atoms package

Classes

AddNumericalAtom

Sum a random set of numerical columns.

AggregationAtom

Apply aggregation functions to child rows.

Atom

Generate derived columns for a dataset.

FeatureToolsAtom

Generate derived columns with featuretools.

RowCountAtom

Count the number of child rows.

class datareactor.atoms.AddNumericalAtom

Bases: datareactor.atoms.base.Atom

Sum a random set of numerical columns.

The AddNumericalAtom generates a derived column which contains the sum of a random set of numerical columns in the same table.
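The idea described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the function name `sum_random_columns`, the list-of-dicts table representation, and the `seed` parameter are all assumptions made for the example.

```python
import random


def sum_random_columns(rows, numerical_columns, seed=0):
    """Hypothetical helper: sum a random subset of numerical columns per row.

    `rows` is a list of dicts standing in for a table; this is not the
    Dataset type used by datareactor.
    """
    rng = random.Random(seed)
    # Choose how many columns to combine (at least one), then which ones.
    k = rng.randint(1, len(numerical_columns))
    chosen = sorted(rng.sample(numerical_columns, k))
    # The derived value for each row is the sum over the chosen columns.
    derived = [sum(row[name] for name in chosen) for row in rows]
    return chosen, derived


users = [
    {"age": 30, "height": 170, "weight": 65},
    {"age": 41, "height": 182, "weight": 80},
]
chosen, derived = sum_random_columns(users, ["age", "height", "weight"], seed=0)
```

The derived list has one value per row, so it can be attached to the same table as a new column.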

Methods

derive(dataset, table_name)

Add a column containing the sum of a random set of numerical columns.

derive(dataset, table_name)

Add a column containing the sum of a random set of numerical columns.

class datareactor.atoms.AggregationAtom

Bases: datareactor.atoms.base.Atom

Apply aggregation functions to child rows.

The AggregationAtom generates derived columns which are the result of applying aggregation functions to groups of child rows.
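As a rough illustration of aggregating child rows, the sketch below groups rows by a foreign key and applies several functions to each group. It uses only the standard library as a stand-in for the pandas aggregation the atom applies; the function name `aggregate_child_rows`, the table layout, and the column names are assumptions for this example.

```python
from collections import defaultdict
from statistics import mean


def aggregate_child_rows(child_rows, foreign_key, value_column, functions):
    """Hypothetical helper: aggregate one child column per parent key."""
    # Group the child values by the parent row they point to.
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[foreign_key]].append(row[value_column])
    # Apply every aggregation function to every group.
    return {
        name: {key: func(values) for key, values in groups.items()}
        for name, func in functions.items()
    }


transactions = [
    {"user_id": 1, "amount": 10.0},
    {"user_id": 1, "amount": 30.0},
    {"user_id": 2, "amount": 5.0},
]
derived = aggregate_child_rows(
    transactions, "user_id", "amount", {"sum": sum, "max": max, "mean": mean}
)
```

Each entry of `derived` maps a parent key to one aggregated value, which corresponds to one derived column on the parent table.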

Methods

derive(dataset, table_name)

Apply pandas aggregation functions to groups of rows.

derive(dataset, table_name)

Apply pandas aggregation functions to groups of rows.

Returns

The derived columns.

Return type

(list of DerivedColumn)

class datareactor.atoms.Atom

Bases: object

Generate derived columns for a dataset.

Each Atom is responsible for generating one or more derived columns for the target table.

Methods

derive(dataset, table_name)

Generate derived columns for the specified table.

transform(dataset)

Generate derived columns for the dataset.

derive(dataset, table_name)

Generate derived columns for the specified table.

The derive function takes in a dataset and the name of the target table. It returns a list of derived columns which can be concatenated to the target table.

Parameters
  • dataset (Dataset) – The dataset.

  • table_name (str) – The name of the target table.

Returns

The derived columns.

Return type

(list of DerivedColumn)

transform(dataset)

Generate derived columns for the dataset.

The transform function takes in a dataset and returns a sequence of derived columns.

Parameters

dataset (Dataset) – The dataset.

Returns

The derived columns.

Return type

(list of DerivedColumn)
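The relationship between derive and transform might look like the sketch below. It does not import datareactor, and everything in it is an assumption made for illustration: the `SketchAtom` and `MissingCountAtom` classes, the dict-of-tables dataset, the tuple used in place of DerivedColumn, and in particular the guess that transform calls derive once per table.

```python
class SketchAtom:
    """Hypothetical stand-in for the Atom interface, not the library class."""

    def derive(self, dataset, table_name):
        # Subclasses return a list of derived columns for one table.
        raise NotImplementedError

    def transform(self, dataset):
        # Assumption: collect the derive() results of every table in order.
        derived_columns = []
        for table_name in dataset:
            derived_columns.extend(self.derive(dataset, table_name))
        return derived_columns


class MissingCountAtom(SketchAtom):
    """Hypothetical atom: count the missing (None) values in each row."""

    def derive(self, dataset, table_name):
        rows = dataset[table_name]
        values = [sum(v is None for v in row.values()) for row in rows]
        # A (table, column name, values) tuple stands in for DerivedColumn.
        return [(table_name, "n_missing", values)]


dataset = {
    "users": [{"id": 1, "age": None}, {"id": 2, "age": 30}],
    "transactions": [{"id": 10, "user_id": 1, "amount": None}],
}
columns = MissingCountAtom().transform(dataset)
```

Under these assumptions, a new atom only needs to implement derive; transform then works for any dataset without further changes.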

class datareactor.atoms.FeatureToolsAtom

Bases: datareactor.atoms.base.Atom

Generate derived columns with featuretools.

The FeatureToolsAtom generates derived columns based on the features generated by the featuretools library.

Methods

derive(dataset, table_name)

Generate features with featuretools.

derive(dataset, table_name)

Generate features with featuretools.

Note that featuretools does not support all of the types of relational datasets supported by Metadata.JSON.

Parameters
  • dataset (Dataset) – The dataset.

  • table_name (str) – The name of the target table.

Returns

The derived columns.

Return type

(list of DerivedColumn)

class datareactor.atoms.RowCountAtom

Bases: datareactor.atoms.base.Atom

Count the number of child rows.

The RowCountAtom generates derived columns which specify the number of rows in each of the child tables.

TODO: This causes a segmentation fault on Seznam / AdventureWorks2014.

Methods

derive(dataset, table_name)

Count the number of rows in each child table.

derive(dataset, table_name)

Count the number of rows in each child table.

This function generates a derived column for each child table which contains the number of rows in each group.

For example, if the target table is users, then it might generate a derived column containing the number of rows in the transaction table that belong to each user.

Parameters
  • dataset (Dataset) – The dataset.

  • table_name (str) – The name of the target table.

Returns

The derived columns.

Return type

(list of DerivedColumn)
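The users and transactions example can be sketched with the standard library. This is an illustration under stated assumptions rather than the atom's actual code: the function name `count_child_rows`, the list-of-dicts tables, and the `user_id` key names are hypothetical.

```python
from collections import Counter


def count_child_rows(parent_rows, child_rows, primary_key, foreign_key):
    """Hypothetical helper: count the child rows that reference each parent."""
    counts = Counter(row[foreign_key] for row in child_rows)
    # Counter returns 0 for missing keys, so parents without children get 0.
    return [counts[row[primary_key]] for row in parent_rows]


users = [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}]
transactions = [{"user_id": 1}, {"user_id": 1}, {"user_id": 3}]

n_transactions = count_child_rows(users, transactions, "user_id", "user_id")
print(n_transactions)  # [2, 0, 1]
```

The result is aligned with the rows of the parent table, so a user with no transactions gets a count of 0 rather than being dropped.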