What is it about?
Cybersecurity researchers rely heavily on system and security logs to study attacks, develop detection systems, and evaluate security tools. However, obtaining realistic security logs is challenging because real logs often contain sensitive information, are difficult to share, and may not cover a wide variety of attack scenarios. This work introduces LogGenAgent, an AI-powered framework that generates realistic synthetic security logs. The system focuses on multi-host, multi-stage environments where events occur across multiple machines and different phases of an attack.

Unlike traditional approaches that generate logs in a single step, LogGenAgent follows an agent-based workflow. It first generates logs using a large language model (LLaMA 3.1) and then evaluates the generated logs using several complementary quality checks. The framework measures semantic similarity, statistical properties, structural consistency, workflow correctness, and distinguishability from real logs. Based on the evaluation results, the system automatically refines its generation strategy and produces improved logs in subsequent iterations. This evaluation-driven refinement process helps the generated logs better match the characteristics of real-world security data.

Experiments using authentication and system logs collected from a controlled laboratory environment show that LogGenAgent produces synthetic logs that are closer to real logs than traditional one-shot generation methods. The generated logs preserve important characteristics such as event distributions, workflow patterns, and log templates while remaining difficult to distinguish from real data. The proposed framework can help researchers, educators, and security practitioners obtain realistic cybersecurity datasets without exposing sensitive operational information.
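
The generate, evaluate, refine cycle described above can be illustrated with a minimal Python sketch. This is not the LogGenAgent implementation: the names `Strategy`, `refine_loop`, `stub_generate`, the check names, the 0.8 threshold, and the log line format are all assumptions made for illustration, and a stub function stands in for the LLaMA 3.1 call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Strategy:
    """Hypothetical generation settings adjusted between rounds."""
    hints: List[str] = field(default_factory=list)


def refine_loop(
    generate: Callable[[Strategy], List[str]],
    checks: Dict[str, Callable[[List[str]], float]],
    threshold: float = 0.8,   # assumed pass mark, not taken from the paper
    max_rounds: int = 5,
) -> Tuple[List[str], Dict[str, float], Strategy]:
    """Generate logs, score them, and feed failing checks back as hints."""
    strategy = Strategy()
    logs: List[str] = []
    scores: Dict[str, float] = {}
    for _ in range(max_rounds):
        logs = generate(strategy)
        scores = {name: check(logs) for name, check in checks.items()}
        failing = [name for name, score in scores.items() if score < threshold]
        if not failing:
            break  # every check passed, stop refining
        for name in failing:
            if name not in strategy.hints:
                strategy.hints.append(name)
    return logs, scores, strategy


def stub_generate(strategy: Strategy) -> List[str]:
    """Stand-in for an LLM call; emits one made-up log line."""
    line = "auth: login ok user=alice"
    if "has_host" in strategy.hints:
        line = "host1 " + line
    if "has_timestamp" in strategy.hints:
        line = "2024-01-01T00:00:00Z " + line
    return [line]


# Toy structural checks; real checks would cover semantics, statistics, etc.
checks = {
    "has_host": lambda logs: float(all("host1" in l for l in logs)),
    "has_timestamp": lambda logs: float(all(l.startswith("2024-") for l in logs)),
}

logs, scores, strategy = refine_loop(stub_generate, checks)
print(logs, scores, strategy.hints)
```

In this toy run the first round fails both checks, the failures become hints, and the second round passes. A fuller version would replace the stub with a model call and turn each failing check into prompt guidance.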
Why is it important?
High-quality cybersecurity datasets are essential for developing intrusion detection systems, threat hunting tools, and security analytics platforms. Unfortunately, real security logs are often confidential, difficult to share, and expensive to collect. This creates a major challenge for researchers attempting to reproduce experiments and compare new security techniques. LogGenAgent addresses this problem by providing a systematic approach for generating realistic synthetic security logs. The framework combines large language models with an agentic feedback loop that continuously evaluates and improves generated outputs. Rather than relying solely on text similarity, the system considers statistical behavior, workflow consistency, structural patterns, and machine-learning-based distinguishability tests. The resulting logs are more realistic and better preserve the characteristics of real-world environments. This work contributes toward creating privacy-preserving cybersecurity datasets that can support research, education, benchmarking, and future intrusion detection studies. As AI-generated cybersecurity data becomes increasingly important, LogGenAgent offers a practical and scalable solution for generating realistic multi-host, multi-stage security logs.
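
As one example of a check that goes beyond text similarity, the event-type distributions of real and synthetic logs can be compared directly. The sketch below uses Jensen-Shannon divergence, which is an assumption chosen for illustration; the summary does not say which statistical measure LogGenAgent uses. The function names and the event labels are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def event_distribution(events: Iterable[str]) -> Dict[str, float]:
    """Turn a sequence of event-type labels into relative frequencies."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    keys = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the union of event types.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: Dict[str, float]) -> float:
        # KL(a || m); m[k] > 0 whenever a[k] > 0, so the ratio is defined.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Made-up event labels standing in for parsed log templates.
real = event_distribution(["login", "login", "sudo", "logout"])
synthetic = event_distribution(["login", "sudo", "sudo", "logout"])

print(js_divergence(real, synthetic))
```

A value near 0 suggests the synthetic logs reproduce the real event mix, while a value near 1 means the two sets share almost no event types.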
Perspectives
While working on cybersecurity datasets, we repeatedly encountered the same challenge: realistic multi-host security logs are difficult to obtain and even harder to share. Many publicly available datasets are outdated, incomplete, or generated from simplified scenarios. This motivated us to explore whether recent advances in large language models could help generate realistic security logs while preserving important behavioral patterns. Developing LogGenAgent taught us that realistic log generation requires more than producing text that looks correct. The generated logs must also preserve workflows, event distributions, structural templates, and the relationships between activities on different hosts. This realization led us to design an agentic framework that continuously evaluates and refines its own outputs. We believe synthetic cybersecurity data will play an increasingly important role in security research, education, and system evaluation. Our long-term goal is to use realistic synthetic multi-host logs to support intrusion detection research and help the community build more robust security systems.
Jatin Mudiraj
Indian Institute of Technology Ropar
Read the Original
This page is a summary of: Poster - LogGenAgent: An Agentic Synthetic Security Log Generation System, June 2026, ACM (Association for Computing Machinery),
DOI: 10.1145/3812835.3814865.