ML Phishing Detector
Security Pal Hackathon (Skill Lab)
1st Runner Up
Security Pal Hackathon placement
<100ms
Classification latency
Client-side
Privacy-preserving ML inference
Vulnerable users
Designed for non-technical users
Overview
This project was built during the Security Pal Hackathon organized by Skill Lab, where it won 1st Runner Up. The challenge was to create a tool that makes the internet safer for everyday users, and I focused on real-time phishing detection through a browser extension powered by machine learning.
Phishing attacks remain one of the most prevalent cybersecurity threats globally. The Anti-Phishing Working Group (APWG) has documented a consistent increase in phishing attacks year over year, and traditional blocklist-based approaches struggle to keep up as attackers spin up and tear down phishing pages within hours. Verizon's Data Breach Investigations Report has estimated that 36 percent of data breaches involve phishing, which makes it one of the most common attack vectors.
The extension was designed with a focus on protecting vulnerable users — those who may not have the technical knowledge to identify phishing attempts manually, including elderly users, first-time internet users, and those in regions where digital literacy education is still nascent.
The Problem
Traditional phishing detection relies on blocklists of known malicious URLs. These lists are inherently reactive: a new phishing URL must be discovered, reported, and added to the list before users are protected. Because attackers can create and deploy a new phishing page in minutes, users are exposed for the entire gap between a page going live and its appearance on a blocklist. Vulnerable populations, including elderly users and those with limited digital literacy, are disproportionately affected because they lack the technical awareness to evaluate a website's authenticity on their own.
My Role
Developer
I developed the end-to-end solution during the hackathon, from ML model training to browser extension implementation. This included feature engineering for URL and page content analysis, model selection and training, and building the browser extension interface that communicates detection results to users in clear, non-technical language.
The Approach
The detection approach combined URL analysis features (domain age, SSL certificate validity, URL entropy, suspicious patterns in the URL structure) with page content analysis (form field analysis, brand impersonation signals, visual similarity to known legitimate sites). These features were fed into a gradient boosted tree model (XGBoost) trained on a labeled dataset of phishing and legitimate URLs.
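As a rough illustration of the URL-side features described above, the sketch below computes URL entropy and a few structural signals using only the Python standard library. It is not the project's actual feature extractor: the function names, the feature names, and the `SUSPICIOUS_KEYWORDS` list are assumptions made for this example, and features such as domain age or SSL certificate validity are left out because they need external lookups.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical keyword list, for illustration only (not taken from the project).
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")


def shannon_entropy(text: str) -> float:
    """Return Shannon entropy in bits per character (0.0 for an empty string)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def url_features(url: str) -> dict:
    """Extract simple lexical features from a URL string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "url_entropy": shannon_entropy(url),
        # Many dots in the host often indicate stacked subdomains.
        "host_dot_count": host.count("."),
        # A raw IPv4 address in place of a domain name.
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        # "@" can hide the real host behind a fake-looking prefix.
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        "keyword_hits": sum(k in lowered for k in SUSPICIOUS_KEYWORDS),
    }
```

A feature dictionary like this would be flattened into a numeric vector before being passed to a tree model such as XGBoost.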
The browser extension architecture prioritized low latency and privacy. Feature extraction and inference both run client-side, so browsing data is never sent to external servers. The ML model was optimized for inference speed, achieving sub-100 millisecond classification on typical hardware. Users receive clear, color-coded warnings when a page is classified as suspicious.
Special attention was given to the user interface for non-technical users. Instead of showing risk scores or technical details, the extension presents simple "Safe," "Suspicious," or "Dangerous" verdicts with plain-language explanations of why a page was flagged.
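One way to turn a model score into the three verdicts described above is a simple threshold mapping with a lookup table of plain-language reasons. The sketch below is illustrative only: the thresholds, the colors, the signal names, and the wording of each explanation are assumptions made for this example, not values from the project.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical thresholds on a 0..1 phishing probability (assumed, not from the project).
SUSPICIOUS_THRESHOLD = 0.5
DANGEROUS_THRESHOLD = 0.8

# Hypothetical mapping from triggered signals to plain-language reasons.
REASONS = {
    "host_is_ip": "The address uses numbers instead of a website name.",
    "brand_mismatch": "This page copies a well-known brand but is not its real site.",
    "password_form": "This page asks for a password in an unusual way.",
}


@dataclass
class Verdict:
    label: str        # "Safe", "Suspicious", or "Dangerous"
    color: str        # assumed color coding for the warning banner
    explanation: str  # shown to the user instead of a raw risk score


def to_verdict(score: float, triggered: Sequence[str]) -> Verdict:
    """Map a phishing probability and triggered signals to a user-facing verdict."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < SUSPICIOUS_THRESHOLD:
        return Verdict("Safe", "green", "No warning signs were found on this page.")
    label, color = (
        ("Dangerous", "red") if score >= DANGEROUS_THRESHOLD else ("Suspicious", "yellow")
    )
    reasons = [REASONS[name] for name in triggered if name in REASONS]
    explanation = " ".join(reasons) or "This page looks similar to known scam pages."
    return Verdict(label, color, explanation)
```

Keeping the explanations in a lookup table separates the wording from the model, so the text can be reviewed and simplified without retraining anything.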
Key Features
What I built
Real-Time URL Analysis
Machine learning classification of URLs based on domain characteristics, SSL validity, URL entropy, and pattern matching against known phishing structures.
Page Content Analysis
Detection of brand impersonation, suspicious form fields, and visual similarity to known legitimate sites to catch sophisticated phishing attempts.
User-Friendly Warnings
Clear, color-coded alerts with plain-language explanations designed for non-technical users, avoiding jargon and technical risk scores.
Privacy-First Architecture
Client-side feature extraction and inference to protect user browsing data — no URL history is sent to external servers.
Key Lessons
What I took away from this project
The best security tools are those that protect users without requiring them to understand the threat
Client-side ML inference is viable and preferable when privacy matters
Hackathons force you to ruthlessly prioritize — the scope constraint is a feature
Designing for the least technical user often produces a better experience for everyone