What is it about?

The growing number of cyber threats, together with the rapid growth in the number of web pages, has made websites a frequent target, so effective techniques for detecting and mitigating malicious web pages are needed. To detect such threats, this work proposes an online system, implemented in Python 3.7, that uses the Naive Bayesian algorithm to identify and categorize potentially harmful web pages. The algorithm combines statistical analysis with machine learning principles to analyze features and attributes of web content and estimate how likely each page is to be malicious. The system is built in several steps. First, a comprehensive dataset of both malicious and legitimate web pages is constructed. Second, relevant features are extracted from these pages, mainly from the URL (for example, spelling mistakes, unusual characters, and strange domain names). Third, these features are used to train a Naive Bayesian classifier that learns the patterns and characteristics of malicious web pages. In the classification stage, the model examines the features of unseen web pages, analyzing the URL carefully to decide whether each page is benign or malicious. To evaluate the system, an experiment was conducted on 7000 real web pages: 3800 were detected as malicious and 3200 were classified as benign. The accuracy of the proposed system was also compared with that of an existing method; the proposed system reached a detection accuracy of about 99.5% while remaining computationally efficient. In conclusion, the Naive Bayesian algorithm is robust and effective in detecting cyber threats posed by web pages.
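The feature-extraction step can be illustrated with a minimal sketch. It assumes a hypothetical feature set inspired by the examples named above: unusual characters, strange domain names (approximated by an IP-address host or an uncommon top-level domain), and spelling mistakes (approximated as near-misses of well-known domains). The constants SUSPICIOUS_TLDS and KNOWN_DOMAINS and the function extract_url_features are illustrative assumptions, not the authors' implementation.

```python
import re
from difflib import get_close_matches
from urllib.parse import urlparse

# Hypothetical lists used only for illustration; a real system would curate these.
SUSPICIOUS_TLDS = {"xyz", "top", "tk", "zip"}
KNOWN_DOMAINS = ["paypal.com", "google.com", "amazon.com"]

# Characters allowed in URLs by RFC 3986; anything else counts as "unusual".
UNUSUAL_CHAR = re.compile(r"[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]")
IPV4_HOST = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def extract_url_features(url):
    """Return simple lexical features of a URL (an illustrative feature set)."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = (parsed.hostname or "").lower()
    labels = host.split(".")
    registered = ".".join(labels[-2:])  # naive guess at the registered domain
    host_is_ip = bool(IPV4_HOST.fullmatch(host))
    # A near-miss of a well-known domain (e.g. "paypa1.com") suggests a spelling trick.
    looks_like_typo = (
        not host_is_ip
        and registered not in KNOWN_DOMAINS
        and bool(get_close_matches(registered, KNOWN_DOMAINS, n=1, cutoff=0.8))
    )
    return {
        "length": len(url),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_sign": "@" in url,
        "unusual_chars": len(UNUSUAL_CHAR.findall(url)),
        "num_subdomains": 0 if host_is_ip else max(len(labels) - 2, 0),
        "host_is_ip": host_is_ip,
        "suspicious_tld": labels[-1] in SUSPICIOUS_TLDS,
        "looks_like_typo": looks_like_typo,
    }
```

For example, extract_url_features("http://paypa1.com/login") sets looks_like_typo to True because paypa1.com is a close but inexact match for paypal.com. The sketch uses only the Python standard library, so it also runs on Python 3.7.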


Why is it important?

With the increasing reliance on online services and the rapid growth of the internet, malicious activity and cyber attacks have become a significant concern. Malicious web pages pose a severe risk to the internet and to networks, exposing users to malware, identity theft, and phishing. A malicious website is a site that contains malware. Because the internet is growing so rapidly, websites have become a main channel for hackers and intruders. "Malicious" is a general term for anything that disrupts computer systems, for example stealing bank account numbers, gaining access to secure and personal information, or tricking victims into believing they are on a benign web page. Many intruders exploit websites by embedding malicious content in them; for example, attackers may inject hidden links into a website in order to track the activities of a victim's computer while the victim browses the site. As the number of malicious web pages grows, so does the number of attacks on computer systems. In this paper, a classifier system was built with the Naïve Bayesian algorithm in Python 3.7 to detect and categorize malicious web pages by analyzing the URL of any website. The Naive Bayesian algorithm is based on Bayes' theorem and assumes that features are independent of one another, which makes it computationally efficient and well suited to text classification tasks. The main objective of the proposed system is to develop an accurate and robust system that can identify and classify malicious web pages effectively.
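The decision rule behind this description can be written out as a worked equation. Using Bayes' theorem with the naive independence assumption, a page described by features x1, ..., xn (for example, URL features) is assigned to the class C with the largest posterior probability. The notation is ours; the specific features used in the paper are not reproduced here.

```latex
P(C \mid x_1, \dots, x_n) = \frac{P(C)\,\prod_{i=1}^{n} P(x_i \mid C)}{P(x_1, \dots, x_n)},
\qquad C \in \{\text{benign}, \text{malicious}\}

\hat{C} = \arg\max_{C} \Big[ \log P(C) + \sum_{i=1}^{n} \log P(x_i \mid C) \Big]
```

Because the denominator is the same for both classes, it can be dropped when comparing them, and taking logarithms turns the product of per-feature probabilities into a sum, which is what keeps the method computationally cheap.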


The proposed online classifier system addresses the widespread use of malicious web pages for criminal activities. It analyzes the URL of any requested web page with the Naïve Bayesian algorithm, a machine learning algorithm, to distinguish benign from malicious pages. The classifier is implemented in Python 3.7.
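Classifying a requested URL can be sketched with a small, self-contained multinomial Naive Bayes classifier written from scratch. This is an assumption about one common way to apply the algorithm to URLs, not the paper's exact method: it treats the alphanumeric tokens of a URL as features, uses Laplace (add-alpha) smoothing, and is trained on a tiny hypothetical set of eight URLs. The class NaiveBayesURLClassifier, the helper tokenize, and the training data are all illustrative.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", url.lower())


class NaiveBayesURLClassifier:
    """Hypothetical multinomial Naive Bayes over URL tokens with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()             # number of training URLs per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()

    def fit(self, urls, labels):
        for url, label in zip(urls, labels):
            tokens = tokenize(url)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, url):
        """Return log P(class) + sum of log P(token | class) for each class."""
        total_urls = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in tokenize(url) if t in self.vocab]
        scores = {}
        for label, n_urls in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_urls / total_urls)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, url):
        scores = self.log_scores(url)
        return max(scores, key=scores.get)


# Tiny hypothetical training set, for illustration only (not the paper's data).
TRAIN_URLS = [
    "http://paypal-login-verify.xyz/account/update",
    "http://secure-bank-update.top/login.php",
    "http://free-gift-card-winner.tk/claim",
    "http://192.168.4.7/verify/account",
    "https://www.wikipedia.org/wiki/Bayes",
    "https://docs.python.org/3/library/urllib.parse.html",
    "https://www.bbc.com/news",
    "https://github.com/python/cpython",
]
TRAIN_LABELS = ["malicious"] * 4 + ["benign"] * 4

classifier = NaiveBayesURLClassifier().fit(TRAIN_URLS, TRAIN_LABELS)
print(classifier.predict("http://account-verify-login.xyz/update"))  # malicious
print(classifier.predict("https://www.python.org/downloads"))  # benign
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. In an online setting, predict would be called on each requested URL after the model has been trained offline on a labeled dataset.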

M.Sc. Lecturer Mohammed Fakhrulddin Abdulqader
University of Kirkuk

Read the Original

This page is a summary of: Detect Malicious Web Pages Using Naive Bayesian Algorithm to Detect Cyber Threats, Wireless Personal Communications, August 2023, Springer Science + Business Media,
DOI: 10.1007/s11277-023-10713-9.


