URL Shield: Detecting Malicious URLs Using Machine Learning Techniques

Mohammed Masood Ullah Fouzan Mohammed Khan · Muaaz Hamad · Shoaib Mohammed Basheer; Ms. Priyarika Sagar

Authors

Mohammed Masood Ullah Fouzan Mohammed Khan · Muaaz Hamad · Shoaib Mohammed Basheer, BTech Students, Department of Computer Science and Engineering, Lords Institute of Engineering and Technology, Hyderabad, India
Ms. Priyarika Sagar, Assistant Professor, Department of Computer Science and Engineering, Lords Institute of Engineering and Technology, Hyderabad, India

Keywords:

Malicious URL Detection · Machine Learning · Gradient Boosting · Feature Engineering · Phishing Detection · Cybersecurity · Flask · URL Classification · Ensemble Learning · Random Forest · SVM · Neural Network · SQLite

Abstract

The proliferation of cyber threats through malicious URLs has become one of the most significant challenges in
cybersecurity, with phishing attacks alone causing over $10.3 billion in losses globally in 2022. This paper presents
URLShield, a machine learning-based web application that classifies URLs as Legitimate or Malicious using 28
engineered features extracted directly from the URL string, without loading the target page, querying DNS, or using
WHOIS data, and achieves sub-100 ms classification latency. The system implements a comparative analysis of eight
machine learning algorithms: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN),
Support Vector Machine (SVM), Naive Bayes, Gradient Boosting, and a Multi-Layer Perceptron Neural Network. All
eight are trained on a balanced synthetic dataset of 10,000 URLs (5,000 legitimate and 5,000 malicious) with 8% label
noise introduced intentionally to simulate real-world classification ambiguity. Gradient Boosting achieves the highest
accuracy, 92.35%, with 93.10% recall, and serves as the production model for URLShield. The 28 features span four
categories: character count features (13), binary flag features (7), structural features (5), and ratio features (3).
Together they capture URL length, special characters, HTTPS usage, IP-based addresses, URL shorteners, suspicious
TLDs, and phishing keyword counts. The Flask web application provides user sessions authenticated with
PBKDF2-SHA256 password hashing, real-time URL prediction with confidence scores, prediction history stored in
SQLite, a 12-chart EDA gallery, an interactive Chart.js model comparison dashboard, role-based admin access, and
Docker deployment. The system demonstrates that feature-only URL analysis achieves over 92% accuracy, competitive
with content-based approaches at 100× lower latency and without the exposure that comes from loading a potentially
malicious page.
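
To make the string-only approach concrete, the following is a minimal sketch of how a few features from each of the four categories might be computed. It is not the authors' implementation: the function name `extract_features`, the feature names, and the shortener, TLD, and keyword lists are all assumptions chosen for illustration, and only 14 of the 28 features are shown.

```python
import re
from urllib.parse import urlsplit

# Hypothetical lookup lists; the abstract does not enumerate the real ones.
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}
SUSPICIOUS_TLDS = {"tk", "ml", "ga", "cf", "gq", "xyz", "top"}
PHISHING_KEYWORDS = ("login", "verify", "secure", "account", "update", "bank")
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url: str) -> dict:
    """Compute lexical features from the URL string alone (no network access)."""
    # Add a scheme if missing so urlsplit can locate the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    lowered = url.lower()
    length = len(url)
    digits = sum(ch.isdigit() for ch in url)
    letters = sum(ch.isalpha() for ch in url)
    is_ip = bool(IPV4_RE.match(host))
    return {
        # Character count features
        "url_length": length,
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_at": url.count("@"),
        "num_digits": digits,
        # Binary flag features
        "has_https": int(parts.scheme == "https"),
        "has_ip_host": int(is_ip),
        "is_shortener": int(host in SHORTENERS),
        "suspicious_tld": int(host.rsplit(".", 1)[-1] in SUSPICIOUS_TLDS),
        # Structural features
        "num_subdomains": 0 if is_ip else max(host.count(".") - 1, 0),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "keyword_count": sum(lowered.count(k) for k in PHISHING_KEYWORDS),
        # Ratio features
        "digit_ratio": digits / length if length else 0.0,
        "letter_ratio": letters / length if length else 0.0,
    }
```

Calling `extract_features("http://192.168.0.1/secure-login/verify.php")` under these assumptions sets `has_ip_host` to 1 and `keyword_count` to 3. Because nothing is fetched or resolved, the cost per URL is a handful of string scans, which is consistent with the low latency the abstract reports.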
