Towards increasing reliability of Amazon EC2 spot instances with a fault-tolerant multi-agent architecture

José Pergentino Araújo Neto; Donald M. Pianto; Célia Ghedini Ralha

doi:10.3233/mgs-190312

What is it about?

Cloud providers have recently begun offering their unused resources as transient instances. Amazon sells idle cloud resources as spot instances, priced through an auction-based market mechanism, at reduced cost but without any availability guarantee. Dynamically and autonomously managing cloud resources so that user applications run with greater reliability on cheaper spot instances therefore remains an open problem. In this context, we propose a fault-tolerant multi-agent architecture that acts as middleware between cloud providers and users. It mediates access to a wide range of heterogeneous resources and provides a resilient application execution environment with a dynamic, flexible fault-tolerance mechanism based on adaptive checkpointing. Our architecture combines a case-based reasoning model with a survival analysis model to predict failure events and to refine fault-tolerance plans with adequate parameters, increasing reliability while optimizing total execution time and cost. We evaluated the proposed architecture with real historical data on Amazon EC2 price changes, comprising approximately 21 million records, from which we generated 1,362,816 scenarios stored in our case knowledge database. For the time to revocation, the results achieved a high level of accuracy (98%), with a gain of up to 74.48% in total execution time and a lower total cost compared to other approaches in the literature.
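To make the survival analysis idea concrete, here is a minimal sketch of a Kaplan-Meier estimator applied to instance lifetimes. It is not the model used in the paper, whose details are not given in this summary; the function names `kaplan_meier` and `survival_at` and the sample data are hypothetical. Each observation is a running time plus a flag saying whether the run ended in a revocation (otherwise the observation is treated as censored).

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], revoked: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps of the Kaplan-Meier curve.

    durations: observed running time of each instance (hypothetical data).
    revoked:   True if the run ended in a revocation, False if censored.
    """
    samples = sorted(zip(durations, revoked))
    at_risk = len(samples)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(samples):
        t = samples[i][0]
        events = 0   # revocations observed exactly at time t
        leaving = 0  # all instances leaving the risk set at time t
        while i < len(samples) and samples[i][0] == t:
            if samples[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Estimated probability that an instance is still running after time t."""
    prob = 1.0
    for step_time, step_prob in curve:
        if step_time > t:
            break
        prob = step_prob
    return prob


if __name__ == "__main__":
    hours = [1.0, 2.0, 2.0, 3.0, 4.0]
    was_revoked = [True, True, False, True, False]
    curve = kaplan_meier(hours, was_revoked)
    print(curve)
    print(survival_at(curve, 2.5))
```

A scheduler could query `survival_at` for the expected remaining run time of a job and use the result when choosing fault-tolerance parameters.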


Why is it important?

We propose a fault-tolerant multi-agent architecture that acts as middleware between cloud providers and users, mediating access to a wide range of heterogeneous resources. It provides a resilient application execution environment with a dynamic, flexible fault-tolerance mechanism based on adaptive checkpointing.
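As a rough illustration of how a checkpoint interval can adapt to a failure prediction, the sketch below uses Young's classic first-order approximation, interval = sqrt(2 × checkpoint cost × mean time to failure). This is a well-known heuristic swapped in for illustration and is not the adaptive mechanism of the paper; the function name and the numbers are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mean_time_to_revocation_s: float) -> float:
    """Checkpoint interval in seconds from Young's first-order approximation.

    checkpoint_cost_s:          time needed to write one checkpoint.
    mean_time_to_revocation_s:  predicted mean time until the instance is revoked.
    """
    if checkpoint_cost_s <= 0 or mean_time_to_revocation_s <= 0:
        raise ValueError("both arguments must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mean_time_to_revocation_s)


if __name__ == "__main__":
    # Recompute the interval whenever the revocation prediction changes.
    for predicted in (6000.0, 1500.0):
        print(predicted, young_interval(30.0, predicted))
```

With a 30-second checkpoint cost, shortening the predicted time to revocation from 6000 to 1500 seconds halves the interval from 600 to 300 seconds, so checkpoints become more frequent as a revocation becomes more likely.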

Perspectives

To contribute to cloud computing research.
Pergentino Araujo
Universidade de Brasilia

This page is a summary of: Towards increasing reliability of Amazon EC2 spot instances with a fault-tolerant multi-agent architecture, Multiagent and Grid Systems, October 2019, IOS Press,
DOI: 10.3233/mgs-190312.

Resources

Project
BRA2Cloud
This project investigates the application of agent-based architectures to create a resilient environment that uses unsecured transient servers to offer trusted services or run applications on idle cloud computing resources. Exploiting idle resources is an efficient way to save energy and money (e.g., reusing unused CPU and memory to provide services and run applications). The BRA2Cloud architecture combines machine learning with a statistical model to predict instance survival time and to help refine fault-tolerance parameters, providing trusted services at reduced monetary cost. The model compiles and analyses historical price change data for Amazon EC2 Spot Instances to predict revocation events. Our agents pursue efficient usage of Spot Instances by providing a novel resilient environment between users and cloud resources that uses machine learning to predict revocation events and to define suitable fault-tolerance mechanisms with their respective parameters. This is a key step toward successful and efficient usage of these instances to provide trusted services with minimal interruptions at the lowest prices. Experiments indicate that the model can be used under realistic working conditions and makes better use of idle resources.
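The following sketch shows one way a price history could be turned into a time-to-revocation sample. It assumes the simple auction rule that an instance is revoked as soon as the spot price rises above the user's bid, and it assumes the instance is running at launch time. The function name, the record layout, and the prices are hypothetical and are not taken from the BRA2Cloud code.

```python
from typing import List, Optional, Tuple


def time_to_revocation(
    history: List[Tuple[float, float]], bid: float, launch: float
) -> Optional[float]:
    """Seconds from launch until the spot price first exceeds the bid.

    history: (timestamp in seconds, price) records sorted by timestamp.
    Returns None when no revocation is observed (a censored sample).
    """
    for timestamp, price in history:
        if timestamp >= launch and price > bid:
            return timestamp - launch
    return None


if __name__ == "__main__":
    prices = [(0.0, 0.10), (100.0, 0.12), (250.0, 0.20), (400.0, 0.11)]
    print(time_to_revocation(prices, bid=0.15, launch=50.0))
    print(time_to_revocation(prices, bid=0.15, launch=300.0))
```

Repeating this over many launch times and bids would yield the kind of lifetime samples, revoked or censored, that a survival model needs as input.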

Contributors

The following have contributed to this page

Pergentino Araujo
Universidade de Brasilia
