What is it about?

Cloud applications are often implemented as distributed services that call each other, creating complex application call graphs. Tracking such call graphs is crucial for diagnosing and resolving performance issues. This paper presents DyMonD, a holistic framework that dynamically monitors the software layer of the cloud network to track dependencies between application components and derive performance metrics. It adapts a deep learning model to identify the service type of each component and visualizes all information in the form of a call graph. Our evaluation results confirm that DyMonD can infer the correct call graph and identify the services at run time with acceptable overhead and good accuracy.
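
To make the idea of deriving a call graph from observed connections concrete, here is a minimal Python sketch. It is not DyMonD's implementation. It assumes a hypothetical input of (client, server) endpoint pairs, one per observed TCP connection, plus a simple heuristic that attributes an outgoing connection to the single service listening on the client's host. The function name `build_call_graph` and the sample addresses are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


def build_call_graph(
    conns: Iterable[Tuple[Endpoint, Endpoint]],
) -> Dict[str, Dict[str, int]]:
    """Return {caller: {callee: connection_count}} from observed (client, server) pairs."""
    conns = list(conns)

    # Every endpoint that accepted a connection is treated as a service component.
    services: Dict[str, Set[str]] = defaultdict(set)
    for _client, (sip, sport) in conns:
        services[sip].add(f"{sip}:{sport}")

    graph: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for (cip, _cport), (sip, sport) in conns:
        local = services.get(cip, set())
        # Hypothetical heuristic: if the client host runs exactly one service,
        # attribute the outgoing call to that service; otherwise use the bare host.
        caller = next(iter(local)) if len(local) == 1 else cip
        graph[caller][f"{sip}:{sport}"] += 1

    # Convert nested defaultdicts to plain dicts for printing and comparison.
    return {caller: dict(callees) for caller, callees in graph.items()}


if __name__ == "__main__":
    observed = [
        (("192.168.1.7", 50311), ("10.0.0.2", 80)),   # external user -> web server
        (("10.0.0.2", 41822), ("10.0.0.3", 3306)),    # web server -> MySQL
        (("10.0.0.2", 41823), ("10.0.0.3", 3306)),    # web server -> MySQL
        (("10.0.0.2", 52000), ("10.0.0.4", 11211)),   # web server -> memcached
    ]
    for caller, callees in build_call_graph(observed).items():
        print(caller, "->", callees)
```

With the sample connections, the web server node "10.0.0.2:80" ends up with edges to MySQL (two connections) and memcached (one connection), while the external user appears only under its bare IP address because it hosts no observed service.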

Why is it important?

The increasing complexity of distributed, service-oriented cloud applications requires advanced application monitoring. This includes showing how the different components invoke each other in the form of a call graph, identifying the service type each component provides (e.g., a MySQL database or a web service), and providing component-level performance metrics such as throughput. Having such a global view of the application at run time can help detect performance bottlenecks, understand how the current workload impacts inter-component dependencies, or discover misconfigurations that might cause poor performance or outages (e.g., a caching service that is called less often than expected).

In this paper, we demonstrate DyMonD, a dynamic, network-based framework for call graph discovery and visualization. It uses the software layer of the network to observe the connections between application components and automatically determines their communication structure. DyMonD starts and stops monitoring dynamically on demand, does not require platform or application instrumentation, and has very low overhead. It employs a novel deep learning model that classifies service types from packet data without relying on static configurations or deep knowledge of the services. In contrast to other deep-learning-based solutions, DyMonD provides excellent prediction results even if messages are encrypted or if monitoring starts some time after connection setup. For unencrypted web services, DyMonD can optionally provide additional information through deep packet inspection and natural language processing. Furthermore, DyMonD reports application-level performance metrics such as response time and throughput.
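
As an illustration of how response time and throughput can be derived purely from observed traffic, the following sketch processes a time-ordered list of packets on one connection. It assumes, hypothetically, that each packet is already labeled as client-to-server ("req") or server-to-client ("resp") and that the protocol does not pipeline requests, so a change of direction marks the end of a request/response exchange. The names `compute_metrics` and `Metrics` are illustrative and not taken from DyMonD.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Metrics:
    requests: int            # completed request/response exchanges
    throughput_rps: float    # completed exchanges per second of observation
    mean_response_s: float   # mean time from first request packet to first response packet


def compute_metrics(packets: List[Tuple[float, str]], window_s: float) -> Metrics:
    """Derive metrics from time-ordered (timestamp, direction) packets on one connection.

    direction is "req" (client -> server) or "resp" (server -> client).
    Assumes no request pipelining: consecutive "req" packets belong to one request,
    and the first "resp" packet after them ends the exchange.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    latencies: List[float] = []
    pending_since: Optional[float] = None  # first packet of an unanswered request
    for ts, direction in packets:
        if direction == "req":
            if pending_since is None:
                pending_since = ts
        elif direction == "resp" and pending_since is not None:
            latencies.append(ts - pending_since)
            pending_since = None  # later "resp" packets are continuations; ignore them
    mean = sum(latencies) / len(latencies) if latencies else 0.0
    return Metrics(len(latencies), len(latencies) / window_s, mean)


if __name__ == "__main__":
    trace = [
        (0.000, "req"), (0.010, "resp"),   # exchange 1: 10 ms
        (0.500, "req"), (0.501, "req"),    # exchange 2: request spans two packets
        (0.530, "resp"), (0.531, "resp"),  # response starts 30 ms after the request
    ]
    print(compute_metrics(trace, window_s=1.0))
```

Timing from the first request packet to the first response packet keeps the sketch independent of payload contents, which matters when messages are encrypted; a real system would also need to handle pipelining, retransmissions, and concurrent connections.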

Perspectives

While several other approaches track the call graph of a distributed application at run time through software instrumentation or static configuration of the individual services, DyMonD offers similar functionality as a service, i.e., without any knowledge of the application, the platform, or the services running. Service identification is rather straightforward for call graph monitoring tools that use software instrumentation, but it is more challenging for DyMonD, which only has access to the message flows between application components. DyMonD therefore employs a novel deep learning model based on a bidirectional LSTM (BiLSTM) to perform network-based service identification over the captured network flows.
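
The following pure-Python sketch shows the forward pass of a bidirectional LSTM classifier over packet bytes, the general kind of model named above. It is a hypothetical, untrained toy: the input encoding (one byte per time step, scaled to [0, 1]), the hidden size, the class labels, and the random weights are all assumptions, and DyMonD's actual architecture and input features are described in the paper, not here. A real model would be trained with a deep learning framework.

```python
import math
import random
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LSTMCell:
    """Single-layer LSTM with scalar input (one scaled byte per time step)."""

    def __init__(self, hidden: int, rng: random.Random):
        self.hidden = hidden
        # One row per gate unit: [input weight, recurrent weights..., bias].
        # Rows are grouped as input, forget, output, candidate gates.
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 2)]
                  for _ in range(4 * hidden)]

    def run(self, xs: Sequence[float]) -> List[float]:
        H = self.hidden
        h, c = [0.0] * H, [0.0] * H
        for x in xs:
            # Pre-activations use the previous hidden state h.
            z = [row[0] * x + sum(w * hv for w, hv in zip(row[1:-1], h)) + row[-1]
                 for row in self.w]
            i = [sigmoid(v) for v in z[0:H]]
            f = [sigmoid(v) for v in z[H:2 * H]]
            o = [sigmoid(v) for v in z[2 * H:3 * H]]
            g = [math.tanh(v) for v in z[3 * H:]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


class BiLSTMClassifier:
    """Hypothetical toy: BiLSTM over payload bytes, then linear layer and softmax."""

    def __init__(self, labels: Sequence[str], hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.labels = list(labels)
        self.fwd = LSTMCell(hidden, rng)
        self.bwd = LSTMCell(hidden, rng)
        self.out = [[rng.uniform(-0.5, 0.5) for _ in range(2 * hidden + 1)]
                    for _ in self.labels]

    def predict_proba(self, payload: bytes) -> Dict[str, float]:
        xs = [b / 255.0 for b in payload]
        # Concatenate final states of the forward and the reversed-sequence pass.
        feat = self.fwd.run(xs) + self.bwd.run(xs[::-1])
        logits = [sum(w * f for w, f in zip(row[:-1], feat)) + row[-1]
                  for row in self.out]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return {label: e / total for label, e in zip(self.labels, exps)}


if __name__ == "__main__":
    clf = BiLSTMClassifier(["mysql", "http", "memcached"], hidden=4, seed=1)
    # Untrained weights: the probabilities are meaningless, only the shape matters.
    print(clf.predict_proba(b"GET /index.html HTTP/1.1\r\n"))
```

Concatenating the final state of the forward pass with the final state of the backward pass gives the classifier a summary of the byte sequence read from both ends, which is the main reason to choose a bidirectional rather than a unidirectional LSTM.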

Mona Elsaadawy
McGill University

Read the Original

This page is a summary of: DyMonD, December 2021, ACM (Association for Computing Machinery), DOI: 10.1145/3491086.3492471.