HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis

Chidimma Opara; Bo Wei; Yingke Chen

doi:10.1109/ijcnn48605.2020.9207707

What is it about?

Phishing websites are fake webpages designed to trick people into giving away sensitive information such as passwords, banking details, or personal data. Many existing detection tools rely on known bad web addresses, manually selected warning signs, or rules that need constant updating as attackers change their methods. This publication presents HTMLPhish, an AI-based approach that looks directly at the HTML code behind a webpage. Instead of depending on a fixed list of suspicious features, the system learns patterns from the page itself, including its words, characters, links, tables, images, and structure. This allows it to recognise signs of phishing even when a webpage has not been seen before. The work shows that deep learning can help detect phishing webpages accurately, quickly, and in a way that could be used on the client side, such as in a browser-based tool. Because the method focuses on HTML structure rather than human language alone, it can also support phishing detection across webpages written in different languages.

Why is it important?

This work is unique because it shows that phishing webpages can be detected using only the raw HTML content of the page, without relying heavily on manually designed features or known phishing URLs. The approach combines word-level and character-level patterns, helping the model recognise both the visible and the hidden structures that attackers use when building fake websites. The work is timely because phishing remains one of the most common and damaging cyber threats, and attackers can now create convincing fake webpages quickly and cheaply. Traditional blacklist-based tools can struggle with new or fast-changing phishing sites, especially those that have not yet been reported. The practical contribution is a foundation for faster, more adaptable, and potentially browser-based phishing detection. This could help protect users before they enter sensitive information, while also reducing the need for security teams to keep redesigning hand-crafted detection rules.
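To make the idea of combining word-level and character-level patterns more concrete, the sketch below turns raw HTML into two integer sequences, one per view, of the kind a deep learning model could take as input. It is a minimal illustration and not the authors' implementation: the tokenisation rule, the function names (word_tokens, char_tokens, build_vocab, encode), the vocabulary size, and the sequence lengths are all assumptions made for this example, since this summary does not specify them.

```python
import re
from collections import Counter

# Assumed tokenisation rule (not taken from the paper): runs of letters or
# digits become one token, and every other non-space character (such as
# '<', '/', '=', '"') becomes its own token, so markup structure is kept.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")

def word_tokens(html):
    """Word-level view: tag names, attribute names, words, punctuation."""
    return TOKEN_RE.findall(html.lower())

def char_tokens(html):
    """Character-level view: every character of the raw HTML."""
    return list(html.lower())

def build_vocab(sequences, max_size):
    """Map the most frequent tokens to integer ids (0 = pad, 1 = unknown)."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, _ in counts.most_common(max_size - 2):
        vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab, length):
    """Truncate or right-pad a token list to a fixed-length id sequence."""
    ids = [vocab.get(tok, 1) for tok in tokens[:length]]
    return ids + [0] * (length - len(ids))

if __name__ == "__main__":
    # Two made-up pages; example.test is a placeholder domain.
    pages = [
        '<form action="http://example.test/login"><input type="password"></form>',
        "<p>Welcome back</p>",
    ]
    word_vocab = build_vocab([word_tokens(p) for p in pages], max_size=100)
    char_vocab = build_vocab([char_tokens(p) for p in pages], max_size=100)
    for page in pages:
        word_ids = encode(word_tokens(page), word_vocab, length=16)
        char_ids = encode(char_tokens(page), char_vocab, length=32)
        print(word_ids, char_ids)
```

Neither view depends on a hand-picked list of suspicious features: the word-level sequence keeps tag and attribute names intact, while the character-level sequence still carries a signal when a token has never been seen before, which is one plausible reason to combine the two.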

Perspectives

For me, this publication was about tackling a real-world cyber security problem in a more practical and adaptable way. I wanted to explore whether deep learning could move phishing detection beyond manually chosen warning signs and instead learn directly from the webpage itself. What I find especially important about this work is its focus on deployability. Phishing affects everyday users, not just technical experts, so detection methods need to be accurate, fast, and capable of working in real-world settings. This paper helped shape my wider research interest in using AI to build security tools that are not only technically strong, but also useful, scalable, and relevant to the people they are designed to protect.
Dr Chidimma Opara
Teesside University

This page is a summary of: HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis, July 2020, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/ijcnn48605.2020.9207707.