What is it about?
Phishing is a type of online scam that uses fake websites to trick people into disclosing personal information, such as passwords, credit card details, or login credentials. While security tools powered by artificial intelligence (AI) have become better at identifying phishing webpages, attackers are also growing more sophisticated, using advanced techniques to fool even the best detection systems. In this study, we introduce a tool called PhishOracle, which generates realistic, deceptive phishing webpages. These webpages are carefully crafted to mimic legitimate ones while mixing in common signs of phishing, making them convincing to both humans and AI models. We tested PhishOracle against three popular phishing detection models, and all of them struggled to correctly detect the newly generated phishing webpages. Although a multimodal large language model (MLLM) performed better, it still missed some threats. To understand how real people might respond to these webpages, we also conducted a user study, which showed that many users were fooled into flagging the phishing webpages as legitimate. Our research highlights the urgent need for better, more adaptive security solutions that can stay ahead of cybercriminals.
Why is it important?
Our work is important because phishing continues to be one of the most common and dangerous cybersecurity threats. While detection models have improved, they can still be evaded by well-crafted phishing webpages. Our research shows that adversaries can now bypass these models by generating deceptive webpages that blend in with legitimate ones. PhishOracle randomly selects content-based and visual-based phishing features and embeds them into a given legitimate webpage; because the selection is random, the tool can generate many different adversarial versions of the same webpage. By using PhishOracle, we were able to expose weaknesses in leading phishing detection models, including those that use deep learning and advanced visual analysis. Understanding these vulnerabilities is critical for building stronger, more reliable security tools. Our study provides both a method to test the limits of current technology and a warning about how attackers may soon operate. As phishing attacks continue to evolve, this research can help the cybersecurity community stay one step ahead.
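The random feature-selection idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual PhishOracle implementation: the feature names and the helper `generate_variant` are hypothetical placeholders, and the real tool applies each selected feature as a transformation of the webpage's HTML and visuals.

```python
import random

# Hypothetical examples of the two feature categories the tool draws from;
# the actual feature set used in the study is larger.
CONTENT_FEATURES = ["add_login_form", "obfuscate_links", "insert_iframe"]
VISUAL_FEATURES = ["replace_logo", "alter_favicon", "blur_background"]

def generate_variant(legitimate_page: str, k: int = 2) -> dict:
    """Randomly pick k content and k visual phishing features for one page."""
    chosen = random.sample(CONTENT_FEATURES, k) + random.sample(VISUAL_FEATURES, k)
    # In the real tool, each chosen feature would transform the page itself;
    # here we only record which features were applied to which page.
    return {"base_page": legitimate_page, "applied_features": chosen}

variant = generate_variant("https://example.com/login")
print(variant["applied_features"])
```

Because each call samples features independently, repeated calls on the same legitimate page yield different adversarial variants, which is what lets the tool probe a detector from many directions.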
Perspectives
From my perspective, what makes this work personally rewarding is how it bridges the gap between AI-based security models and real-world threats. Many detection models perform well, but this research puts them to the test under more realistic and challenging conditions. We didn't want to just build another detection system; we wanted to stress-test the ones already out there and see how they handle adaptive attacks. The results are sobering but necessary. Even the best models faltered, and many users were still tricked. This project has made me rethink how we evaluate the effectiveness of cybersecurity tools, and I hope it inspires others to test AI systems not just for accuracy, but for resilience against adversaries who are always evolving.
Mr. Aditya Kulkarni
Indian Institute of Technology Dharwad
This page is a summary of: From ML to LLM: Evaluating the Robustness of Phishing Web Page Detection Models against Adversarial Attacks, Digital Threats: Research and Practice, June 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3737295.