What is it about?

We introduce the idea that metadata, including project information, data labels, data characteristics and indications of valuable use, can be propagated through a data processing lineage graph. Further, finding examples of significant co-occurrence of propagated and original metadata gives us the basis of an interesting kind of search engine, one that recommends useful data given a problem statement, even in a near cold-start situation.
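To make the idea concrete, here is a minimal sketch of propagation followed by co-occurrence counting. It is an illustration under stated assumptions, not the method from the paper: the dataset names, the tags, and the helper functions `propagate_tags` and `cooccurrence_counts` are all hypothetical, tags are pushed only in the downstream direction, and raw counts stand in for whatever significance test a real system would apply.

```python
from collections import Counter, deque

def propagate_tags(edges, tags):
    """Copy each dataset's tags to every dataset reachable downstream of it.

    edges: dict mapping a dataset to the datasets derived from it (hypothetical lineage).
    tags:  dict mapping a dataset to its own, original metadata tags.
    Returns a dict mapping each dataset to the tags it inherited from upstream.
    """
    propagated = {node: set() for node in tags}
    for source, source_tags in tags.items():
        seen = {source}
        queue = deque(edges.get(source, []))
        while queue:  # breadth-first walk along lineage edges
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            propagated.setdefault(node, set()).update(source_tags)
            queue.extend(edges.get(node, []))
    return propagated

def cooccurrence_counts(tags, propagated):
    """Count how often a propagated tag lands on a dataset carrying an original tag."""
    counts = Counter()
    for node, own_tags in tags.items():
        for inherited in propagated.get(node, ()):
            for own in own_tags:
                if inherited != own:
                    counts[(inherited, own)] += 1
    return counts

# Toy lineage (assumed for illustration): two sources feed one feature table.
edges = {
    "raw_clicks": ["sessions"],
    "sessions": ["churn_features"],
    "crm_export": ["churn_features"],
}
tags = {
    "raw_clicks": {"web-logs"},
    "sessions": {"sessionized"},
    "crm_export": {"customer-records"},
    "churn_features": {"churn-model"},
}

propagated = propagate_tags(edges, tags)
counts = cooccurrence_counts(tags, propagated)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

In this toy graph the pair ("web-logs", "churn-model") co-occurs once, which hints that web logs were useful for churn modeling even though nobody documented that on the source data. With a realistic number of lineage graphs, pairs that co-occur far more often than chance would become the candidates for recommendation.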

Why is it important?

Finding useful data is one of the hardest and squishiest tasks in data science, since it requires something beyond the literal description of data and projects. Essentially, it requires that a system understand what data really means and correlate that with what a user, such as a data scientist, actually intends to build. The meaning of data, however, is often undocumented. Even where it is documented, it is normally documented only from the point of view of the producer of the data, not the consumer. This is a problem since, to misquote Wittgenstein, the meaning of data is in its use. We show how such meaning through use can emerge even with very simple analysis and only peripheral use of a (not very) large language model.

Perspectives

This is yet another example where very simple techniques can give what appear to be very advanced results. This comes about because the methods chosen can consume many kinds of metadata omnivorously, which is often difficult if an "advanced" but inflexible method is chosen ahead of time. Even though we show practical results in this work, there is a meta lesson that cannot be learned too often: simple methods often produce strong results with minimal mechanism. As such, these simple methods may well be the most practical choice.

Ted Dunning
Hewlett Packard Enterprise Co

Read the Original

This page is a summary of: From Roots to Fruits: Exploring Lineage for Dataset Recommendations, June 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3600046.3600053.