What is Data Tracer?

The Data Tracer project aims to identify metadata using machine learning. Where did the data come from, where did it go, what transformations were applied, what are its types, and how is it connected? These are the questions we aspire to answer. To that end, we develop several machine learning pipelines and make them easy for end users to apply. Explore our open source libraries, testbeds, and benchmarking frameworks, contribute, and become part of the community. Most of all, try them out and give us feedback.

Resources

Explore docs, papers, videos, and tutorials. Join our community Slack.

Trace

Train a variety of machine learning models and use them to trace where data originated, link related data, and discover how it is connected.

Concepts

Learn about the concepts that underpin Data Tracer, and about its evaluation and usage, through our tutorials.

What is data lineage?

How do you store and interact with metadata about your data?

How do you use machine learning to find data lineage?

How do you evaluate ML-based lineage techniques?

Libraries

Explore our open source libraries, contribute and become part of the community.

Tracer

Data Lineage Tracing Library.

DataReactor

Augmenting relational datasets by generating derived columns with known lineage.
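
To make the idea of a derived column with known lineage concrete, here is a minimal sketch in plain Python. It does not use the DataReactor API; the function `add_derived_column`, the `orders` table, and the layout of the lineage record are all hypothetical names chosen for illustration.

```python
def add_derived_column(rows, name, source_columns, transform):
    """Return new rows with a derived column, plus a record of its lineage.

    Hypothetical helper for illustration only; not part of DataReactor.
    """
    augmented = [
        {**row, name: transform(*(row[column] for column in source_columns))}
        for row in rows
    ]
    # Because we generate the column ourselves, its lineage is known exactly.
    lineage = {
        "derived_column": name,
        "source_columns": list(source_columns),
        "transformation": transform.__name__,
    }
    return augmented, lineage


def total_price(quantity, unit_price):
    return quantity * unit_price


# Hypothetical example table.
orders = [
    {"order_id": 1, "quantity": 2, "unit_price": 5.0},
    {"order_id": 2, "quantity": 3, "unit_price": 1.5},
]

augmented_orders, lineage = add_derived_column(
    orders, "total", ["quantity", "unit_price"], total_price
)
```

Datasets augmented this way can serve as ground truth: a lineage tracing method can be scored on whether it recovers the recorded source columns for each derived column.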

Metadata

Organized representation and validation of metadata.
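
As a rough illustration of what an organized, validated metadata representation could look like, the sketch below describes two tables and one foreign key as a plain dictionary and checks it for internal consistency. The schema layout and the function `validate_metadata` are assumptions made for this example, not the format or API of the Metadata library.

```python
def validate_metadata(metadata):
    """Return a list of problems found; an empty list means the metadata is consistent.

    Hypothetical validator for illustration only.
    """
    errors = []
    fields_by_table = {
        table["name"]: {field["name"] for field in table["fields"]}
        for table in metadata["tables"]
    }
    # Every primary key must name a field of its own table.
    for table in metadata["tables"]:
        key = table.get("primary_key")
        if key is not None and key not in fields_by_table[table["name"]]:
            errors.append(f"{table['name']}: unknown primary key {key!r}")
    # Both ends of every foreign key must name an existing column.
    for fk in metadata.get("foreign_keys", []):
        for table_name, field_name in (
            (fk["table"], fk["field"]),
            (fk["ref_table"], fk["ref_field"]),
        ):
            if field_name not in fields_by_table.get(table_name, set()):
                errors.append(f"foreign key: unknown column {table_name}.{field_name}")
    return errors


# Hypothetical metadata for a two-table dataset.
metadata = {
    "tables": [
        {
            "name": "users",
            "primary_key": "user_id",
            "fields": [{"name": "user_id"}, {"name": "email"}],
        },
        {
            "name": "orders",
            "primary_key": "order_id",
            "fields": [{"name": "order_id"}, {"name": "user_id"}],
        },
    ],
    "foreign_keys": [
        {"table": "orders", "field": "user_id",
         "ref_table": "users", "ref_field": "user_id"},
    ],
}

problems = validate_metadata(metadata)
```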