Look before you leap: Detecting phishing web pages by exploiting raw URL and HTML characteristics

Chidimma Opara; Yingke Chen; Bo Wei

doi:10.1016/j.eswa.2023.121183

What is it about?

Phishing websites are fake webpages designed to trick people into sharing sensitive information such as passwords, banking details, or personal data. Many existing tools detect these websites by looking for known bad web addresses or by using manually selected warning signs. However, attackers constantly change their methods, which means these warning signs can quickly become outdated. This publication presents WebPhish, an AI-based model that looks directly at two important parts of a webpage: its web address, known as the URL, and the HTML code behind the page. Instead of relying on hand-crafted rules, the model learns patterns from the webpage itself. The aim is to detect phishing pages early, before users submit any personal information. By combining information from both the URL and the HTML content, WebPhish can make stronger decisions than methods that look at only one part of the webpage.
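To make the idea of learning from the raw URL and raw HTML more concrete, here is a minimal sketch of how the two text sources could be turned into one numeric input for a learning model. It is an illustration only, not the WebPhish implementation: the function names, the sequence lengths, and the character-to-ID mapping are all assumptions chosen for this example, and the summary above does not specify how the paper encodes its inputs.

```python
from typing import List

# Hypothetical sizes chosen for illustration only; not taken from the paper.
URL_LEN = 32
HTML_LEN = 64
PAD = 0  # ID reserved for padding positions


def encode_chars(text: str, max_len: int) -> List[int]:
    """Map each character to an integer ID, then truncate or pad to max_len."""
    # Shift IDs by 1 so that 0 stays free for padding (assumed scheme).
    ids = [(ord(ch) % 255) + 1 for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def encode_page(url: str, html: str) -> List[int]:
    """Concatenate the URL and HTML encodings into one fixed-length input."""
    return encode_chars(url, URL_LEN) + encode_chars(html, HTML_LEN)


# Made-up example page; no hand-crafted features are extracted from it.
example = encode_page(
    "http://example.test/login",
    "<form><input type='password'></form>",
)
```

A model trained on sequences like `example` would have to learn its own patterns from the characters, which is the contrast with manually selected warning signs drawn in the paragraph above.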

Why is it important?

This work is distinctive because it combines both the raw URL and the raw HTML content of a webpage in one deep learning model. Many phishing detection methods rely on manually designed features, blacklist databases, or only one component of the webpage. WebPhish reduces this dependence by learning directly from webpage data. The work is timely because phishing remains one of the most common and damaging cyber threats. Attackers can now create convincing fake websites quickly, and traditional blacklists may not detect new phishing pages fast enough. In practical terms, the work provides a route towards faster and more adaptable phishing detection that could support real-time protection. The model also shows that using URL and HTML information together gives a fuller picture of whether a webpage is suspicious, which could help security teams, browser tools, and anti-phishing systems protect users more effectively.

Perspectives

For me, this publication was about moving phishing detection closer to how real users experience risk online. A person does not usually see just a URL or just the webpage content; they encounter both together. That is why it was important to explore whether combining these signals could improve detection. What I find especially meaningful about this work is its practical focus. Phishing harms individuals, organisations, and communities, often before victims realise what has happened. This paper reflects my wider interest in building AI-based cyber security tools that are not only accurate, but also useful in real-world settings where threats are constantly changing.
Dr Chidimma Opara
Teesside University

This page is a summary of: Look before you leap: Detecting phishing web pages by exploiting raw URL and HTML characteristics, Expert Systems with Applications, February 2024, Elsevier, DOI: 10.1016/j.eswa.2023.121183.
Contributors

The following have contributed to this page

Dr Chidimma Opara
Teesside University
