What is it about?
Spark is currently the framework of choice for running data analytics workloads. However, its default configuration has been shown to be significantly sub-optimal, and with so many tunable parameters, finding the optimal configuration for a workload is non-trivial. To overcome this challenge, we employ a Random Forests model to identify the parameters that matter most for a given application and then tune those parameters using Bayesian Optimization. We also seed the search with well-performing configurations from previous tuning sessions to speed it up. Our evaluation with an extensive set of analytics workloads demonstrates that ROBOTune finds configurations that perform better on average while significantly reducing search cost and search time.
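As a rough illustration (not the authors' implementation), the Python sketch below shows the two-stage idea using scikit-learn: a Random Forest ranks parameters by importance, and a Gaussian-process Bayesian Optimization loop tunes only the top-ranked ones, reusing earlier measurements to warm-start the search. The parameter names and the run_workload() benchmark hook are hypothetical stand-ins, not part of ROBOTune.

```python
# Minimal sketch of RF-based parameter selection + Bayesian Optimization.
# Assumptions: parameters are normalized to [0, 1]; run_workload() is a
# hypothetical hook that runs the Spark job and returns runtime in seconds.
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Hypothetical Spark parameters (illustrative subset).
PARAMS = ["spark.executor.memory", "spark.executor.cores",
          "spark.sql.shuffle.partitions", "spark.memory.fraction",
          "spark.default.parallelism", "spark.shuffle.compress"]

def run_workload(config: np.ndarray) -> float:
    """Stand-in for executing the workload and measuring its runtime."""
    # Toy surrogate: only the first three parameters really matter here.
    return float(100 + 50 * (config[0] - 0.6) ** 2
                 + 30 * (config[1] - 0.4) ** 2
                 + 20 * np.sin(3 * config[2]) + rng.normal(0, 0.5))

# --- Stage 1: rank parameters with a Random Forest ------------------------
X = rng.uniform(size=(40, len(PARAMS)))        # randomly sampled configs
y = np.array([run_workload(x) for x in X])     # measured runtimes
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
top_k = np.argsort(rf.feature_importances_)[::-1][:3]   # keep top-3 params
print("Tuning only:", [PARAMS[i] for i in top_k])

# --- Stage 2: Bayesian Optimization over the reduced space ----------------
def expected_improvement(gp, X_cand, y_best):
    """Expected Improvement acquisition for runtime minimization."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

X_obs = X[:, top_k].copy()          # reuse earlier samples as a warm start
y_obs = y.copy()
base = np.median(X, axis=0)         # fix unimportant params to a default

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True).fit(X_obs, y_obs)
    cand = rng.uniform(size=(500, len(top_k)))
    best = cand[np.argmax(expected_improvement(gp, cand, y_obs.min()))]
    full = base.copy()
    full[top_k] = best              # embed tuned params into full config
    X_obs = np.vstack([X_obs, best])
    y_obs = np.append(y_obs, run_workload(full))

print("Best runtime found:", y_obs.min())
```

In this sketch, restricting the Gaussian process to the few parameters the Random Forest flags as important keeps the search space small, which is what makes the optimization tractable at high dimensionality.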
Why is it important?
Deploying analytics workloads with a near-optimal configuration saves substantial cluster time and resources, so finding well-performing configurations, especially for recurring applications, is important. Our work aims to reduce tuning cost and time while suggesting well-performing configurations, which is critical for the real-world adoption of automated tuning of analytics workloads.
Read the Original
This page is a summary of: ROBOTune: High-Dimensional Configuration Tuning for Cluster-Based Data Analytics, August 2021, ACM (Association for Computing Machinery). DOI: 10.1145/3472456.3472518.