What is Data Tracer?

The Data Tracer project aims to identify metadata using machine learning. We aspire to identify where data came from, where it went, what transformations were applied to it, and what its types and connections are. To that end, we develop several machine learning pipelines and make them easy for end users to apply. Explore our open source libraries, testbeds, and benchmarking frameworks, contribute, and become part of the community. Most of all, try it and give us feedback.

Resources

Explore docs, papers, videos, and tutorials. Join our community Slack.

Papers

Community

Videos

Tutorials

Developer Docs

User guides

Trace

Train a variety of machine learning models and use them to trace data origins, link related data, and discover the relationships between them.

Detect Keys

Detect the primary keys and foreign keys in your tables.
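
To make the task concrete, here is a minimal sketch of key detection in plain Python. It is a simple rule-based baseline, not the machine learning approach Data Tracer uses and not the library's API: a column is a primary key candidate if its values are unique and non-null, and a foreign key candidate if all of its values appear in a parent key. The function names and the sample tables are hypothetical.

```python
def candidate_primary_keys(rows, columns):
    """Return columns whose values are all non-null and unique."""
    keys = []
    for col in columns:
        values = [row[col] for row in rows]
        if None not in values and len(set(values)) == len(values):
            keys.append(col)
    return keys


def candidate_foreign_keys(child_rows, child_columns, parent_rows, parent_key):
    """Return child columns whose non-null values all appear in the parent key."""
    parent_values = {row[parent_key] for row in parent_rows}
    matches = []
    for col in child_columns:
        values = {row[col] for row in child_rows if row[col] is not None}
        if values and values <= parent_values:
            matches.append(col)
    return matches


# Hypothetical sample tables, represented as lists of dicts.
users = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": "US"},
    {"user_id": 3, "country": "FR"},
]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 3},
    {"order_id": 12, "user_id": 3},
]

user_keys = candidate_primary_keys(users, ["user_id", "country"])
order_links = candidate_foreign_keys(orders, ["order_id", "user_id"], users, "user_id")
print(user_keys)    # ['user_id']
print(order_links)  # ['user_id']
```

Rules like these break down on real data (for example, a unique timestamp column is not necessarily a key), which is the kind of ambiguity learned models are meant to handle.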

Map Column Lineage

Identify which source columns a given column was derived from.
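
As an illustration of what column lineage means, the sketch below searches for a pair of columns whose row-wise sum reproduces a target column. It is a brute-force example under stated assumptions (the derived column is an exact sum of two others), not Data Tracer's method or API. The function name `find_sum_lineage` and the sample table are hypothetical.

```python
from itertools import combinations
from math import isclose


def find_sum_lineage(table, target, max_sources=2):
    """Return the first tuple of columns whose row-wise sum equals `target`, or None."""
    candidates = [col for col in table if col != target]
    for size in range(2, max_sources + 1):
        for combo in combinations(candidates, size):
            sums = (sum(vals) for vals in zip(*(table[col] for col in combo)))
            if all(isclose(s, t) for s, t in zip(sums, table[target])):
                return combo
    return None


# Hypothetical table stored column-wise; "total" was derived as price + tax.
table = {
    "price": [10.0, 20.0, 5.0],
    "tax": [1.0, 2.0, 0.5],
    "shipping": [3.0, 3.0, 3.0],
    "total": [11.0, 22.0, 5.5],
}

lineage = find_sum_lineage(table, "total")
print(lineage)  # ('price', 'tax')
```

An exhaustive search like this grows quickly with the number of columns and covers only one kind of transformation, so it is useful mainly as a way to picture the problem.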

Metadata

Capture all the metadata about your data in a structured way.

Create datasets

Create benchmarking datasets for evaluating ML-based lineage techniques.

Benchmarking

A comprehensive benchmarking framework to assess different ML-based lineage techniques.

Concepts

Learn the concepts that underpin Data Tracer, along with its evaluation and usage, through our tutorials.

What is data lineage?

How do you store and interact with metadata about your data?

How do you use machine learning to find data lineage?

How do you evaluate ML-based lineage techniques?

Libraries

Explore our open source libraries, contribute, and become part of the community.

Tracer

Data Lineage Tracing Library.

DataReactor

Augmenting relational datasets by generating derived columns with known lineage.

Metadata

Organized representation and validation of metadata.

Copyright © 2021 Data to AI Laboratory, Massachusetts Institute of Technology