Towards benchmark datasets for machine learning based website phishing detection: An experimental study

Abdelhakim Hannousse; Salima Yahiouche

doi:10.1016/j.engappai.2021.104347

What is it about?

The increasing popularity of the Internet led to a substantial growth of e-commerce. However, such activities have main security challenges primary caused by cyberfraud and identity theft. Therefore, checking the legitimacy of visited web pages is a crucial task to secure costumers’ identities and prevent phishing attacks. The use of machine learning is widely recognized as a promising solution. The literature is rich with studies that use machine learning techniques for website phishing detection. However, their findings are dataset dependent and are far away from generalization. Two main reasons for this unfortunate state are the impracticable replication and absence of appropriate benchmark datasets for fair evaluation of systems. Moreover, phishing tactics are continuously evolving and proposed systems are not following those rapid changes. In this paper, we present a general scheme for building reproducible and extensible datasets for website phishing detection. The aim is to (1) enable comparison of systems adopting different features, (2) overtake the short-lived nature of phishing websites, and (3) keep track of the evolution of phishing tactics. For experimenting the proposed scheme, we start by adopting a refined categorization of website phishing features and we systematically select a total of 87 commonly recognized ones, we categorize them, and we made them subjects for relevance and runtime analysis. We use the collected set of features to build a dataset in light of the proposed scheme. Thereafter, we use a conceptual replication approach to check the genericity of former findings for the built dataset. Specifically, we evaluate the performance of classifiers on individual and combined categories of features, we investigate different combinations of models, and we explore the effects of filter and wrapper methods on the selection of discriminative features.

This page is a summary of: Towards benchmark datasets for machine learning based website phishing detection: An experimental study, Engineering Applications of Artificial Intelligence, September 2021, Elsevier,
DOI: 10.1016/j.engappai.2021.104347.
You can read the full text:

Read

Contributors

The following have contributed to this page

Pr. Abdelhakim Hannousse
Universite 8 Mai 1945 Guelma

Towards benchmark datasets for machine learning based website phishing detection: An experimental study

What is it about?

Contributors

Discover more

Medical Research

Life Sciences

Physical Sciences

Technology and Engineering

Environmental Research

Arts and Humanities

Social Sciences

Business and Management

Towards benchmark datasets for machine learning based website phishing detection: An experimental study

What is it about?

Featured Image

Read the Original

Contributors

Share this page:

Discover more

Medical Research

Life Sciences

Physical Sciences

Technology and Engineering

Environmental Research

Arts and Humanities

Social Sciences

Business and Management