TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data

Daniel Kang; John Guibas; Peter D. Bailis; Tatsunori Hashimoto; Matei Zaharia

doi:10.1145/3514221.3517897

What is it about?

Analytics over unstructured data (videos, images, text, audio) increasingly relies on machine learning (ML). Unfortunately, deploying ML is expensive. To reduce the cost of such queries, many recent systems (e.g., BlazeIt, NoScope, Tahoma, SUPG) train a proxy model for each query to approximate an expensive target labeler, such as a costly ML model or a human labeling service. In this work, we present TASTI, a trainable semantic index that removes the need to train a separate proxy model for every query. Once the index is constructed, TASTI can generate high-quality proxy models that are used downstream to accelerate queries such as aggregation and selection over large datasets. TASTI's design is motivated by the observation that many queries are highly correlated and share underlying semantic information. For instance, the work done to answer a query that counts cars should also help answer a different query that finds red cars. Prior work does not exploit this property, because it trains a new proxy model from scratch for each query.
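The summary above does not spell out how the index is built or queried, so the following is only a toy sketch of the reuse idea: label a few records once, then derive proxy scores for several different queries from those same labels. The names `build_index`, `proxy_scores`, and `expensive_labeler`, the evenly spaced choice of representatives, and the hand-made embeddings are all assumptions made for illustration, not TASTI's actual method or API.

```python
import math


def build_index(embeddings, target_labeler, step):
    """Run the expensive labeler once on every `step`-th record (assumed policy)."""
    rep_ids = list(range(0, len(embeddings), step))
    rep_labels = {i: target_labeler(i) for i in rep_ids}
    return rep_ids, rep_labels


def proxy_scores(embeddings, rep_ids, rep_labels, score_fn, k=1):
    """Score every record from the cached labels of its k nearest representatives.

    `score_fn` turns a cached label into a query-specific number, so a new
    query needs a new `score_fn` but no new labeler calls and no training.
    """
    scores = []
    for emb in embeddings:
        nearest = sorted(rep_ids, key=lambda r: math.dist(emb, embeddings[r]))[:k]
        # Inverse-distance weights; the small constant avoids division by zero.
        weights = [1.0 / (math.dist(emb, embeddings[r]) + 1e-9) for r in nearest]
        values = [score_fn(rep_labels[r]) for r in nearest]
        scores.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return scores


# Toy dataset: each record is the list of car colors in one frame.
true_labels = [["red"], ["blue"], ["red", "blue"], [], ["red", "red"], ["blue", "blue"]] * 5
# Hand-made stand-in for learned embeddings (assumption): similar frames sit close together.
embeddings = [(float(len(cars)), float(cars.count("red"))) for cars in true_labels]

labeler_calls = []


def expensive_labeler(i):
    """Hypothetical target labeler; records each call so the cost is visible."""
    labeler_calls.append(i)
    return true_labels[i]


rep_ids, rep_labels = build_index(embeddings, expensive_labeler, step=5)

# Two different queries share the same index and the same cached labels.
car_counts = proxy_scores(embeddings, rep_ids, rep_labels, score_fn=len)
red_counts = proxy_scores(
    embeddings, rep_ids, rep_labels, score_fn=lambda cars: cars.count("red")
)

print(len(labeler_calls), round(sum(car_counts)), round(sum(red_counts)))
```

In this toy setup the labeler runs on only 6 of the 30 records, and both the "count cars" and "count red cars" queries are served from those 6 cached labels. Real embeddings would be learned rather than derived from the labels, so real proxy scores would be approximate rather than exact.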

This page is a summary of: TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data, June 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3514221.3517897.

Contributors

The following have contributed to this page

Daniel Kang
Stanford University
