What is it about?

Modern organisations depend on complex digital systems, from office IT networks to industrial control systems that run factories, power grids, and water plants. Keeping these systems secure requires regular vulnerability assessments and penetration testing (VAPT), where defenders deliberately try to find and exploit weaknesses before attackers do. Today, much of this work is slow, manual, and difficult to scale, especially as systems grow larger and more interconnected. This research reviews recent advances in Agentic AI (AI systems that can plan and act autonomously) and Generative AI (such as large language models that can reason over text, logs, and configurations). By analysing 72 studies published between 2020 and 2025, the paper shows that each approach has strengths but also clear limits when used alone. Agentic AI is good at deciding “what to test next” and adapting based on results, while generative AI excels at understanding unstructured data, explaining findings, and producing reports. The paper brings these ideas together by proposing a unified framework where AI agents make decisions, generative models interpret evidence, tools are accessed through controlled interfaces, and governance mechanisms ensure safety and accountability. In simple terms, it explains how future cyber security testing tools could behave more like skilled human testers—while remaining auditable, policy‑constrained, and safe to use even in sensitive industrial environments.

Why is it important?

This work is timely because cyber attacks are increasingly automated, fast‑moving, and targeted at critical infrastructure. Traditional security testing struggles to keep up, particularly outside standard IT networks. What is unique about this paper is that it moves beyond isolated AI “point solutions” and instead defines a whole-system view of how autonomous cyber security testing could work responsibly. The paper introduces:

- A four‑layer reference model that clearly separates decision‑making, reasoning, tool execution, and governance.
- A benchmark for measuring not just whether AI finds vulnerabilities, but whether it acts safely, follows rules, and can explain its decisions.
- A strong focus on industrial and cyber‑physical systems, where unsafe testing could cause real‑world harm.

Together, these contributions help researchers, practitioners, and regulators think more clearly about how AI can be used to improve cyber security without creating new risks.

Perspectives

Working on this paper highlighted how fragmented current research on AI‑driven security testing still is. Many impressive tools exist, but they rarely connect into a trustworthy end‑to‑end system. A key motivation for this work was to bridge computer science, engineering, and governance perspectives—especially for operational technology and industrial systems, where safety and accountability matter as much as technical performance. I hope this paper helps shift the discussion from “Can AI do penetration testing?” to “How can we design AI‑driven testing that people can trust, audit, and safely deploy in the real world?” If it encourages closer collaboration between AI researchers, security engineers, and policy makers, it will have achieved its goal.

Prof Tatiana Kalganova
Brunel University

Read the Original

This page is a summary of: Agentic and Generative AI for Intelligent Autonomous Vulnerability Assessment and Penetration Testing: A Systematic Analysis, January 2026, Elsevier,
DOI: 10.2139/ssrn.6804400.
