04 — Project

Cybersecurity

ML systems for detecting threats across email and malware.

Machine Learning · Security · NLP

Overview

Two machine learning systems applied to two of the most common attack vectors — phishing emails and malicious files. Trained, stress-tested, and deployed as interactive tools that classify real-world threats.

Role: Builder
Timeline: 2026
Stack: Python, scikit-learn, pandas, HuggingFace, SMOTE, Streamlit
Status: Deployed (AI 311, UTK)

The Problem

Phishing emails slip past filters every day. The harder problem isn't fitting a model — it's building one that generalizes to unseen, real-world emails instead of memorizing the training set.

Process

Trained and evaluated three models — Logistic Regression, SVM (LinearSVC), and Random Forest — on 82,486 emails (48% legitimate / 52% phishing) sourced from Kaggle.
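
A minimal sketch of how such a three-way comparison might be set up in scikit-learn. The TF-IDF features, the hyperparameters, and the tiny toy corpus are assumptions made for illustration; the page does not state how the emails were vectorized or how the models were tuned.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for the 82,486 Kaggle emails.
phishing = [
    "verify your account now or it will be suspended",
    "urgent: confirm your password at this link",
    "you have won a prize, click here to claim",
    "your bank account is locked, login to restore access",
    "final notice: update your billing details immediately",
    "claim your free gift card by confirming your account",
    "security alert: click the link to verify your identity",
    "your password expires today, reset it at this link",
]
legitimate = [
    "meeting moved to 3pm, see updated agenda attached",
    "here are the lecture notes from this week",
    "lunch on friday? let me know what works",
    "the quarterly report draft is ready for review",
    "thanks for the feedback on the project proposal",
    "reminder: team standup is at 10am tomorrow",
    "attached is the invoice we discussed on the call",
    "can you review my pull request when you get a chance",
]
texts = phishing + legitimate
labels = [1] * len(phishing) + [0] * len(legitimate)  # 1 = phishing, 0 = legitimate

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# The three model families named above; settings are illustrative defaults.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (LinearSVC)": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, clf in models.items():
    # Same vectorizer in front of every classifier so the comparison is like for like.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, pipeline.predict(X_test))

for name, acc in results.items():
    print(f"{name}: {acc:.2%}")
```

Wrapping the vectorizer and classifier in one pipeline means the vocabulary is learned from the training split only, which keeps the held-out score honest.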

Figure: Model comparison

Extended the evaluation with 18,634 unseen emails to test generalization, then retrained on the combined set to harden the model against drift.
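
The generalization check and the retraining step could look roughly like the sketch below. Logistic Regression is used only as a stand-in, since the page does not say which of the three models became the final one; the `build_model` helper and the toy emails are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import make_pipeline


def build_model():
    """Hypothetical helper: a fresh, unfitted TF-IDF + Logistic Regression pipeline."""
    return make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))


# Toy stand-ins for the original training set and the separately collected unseen set.
train_texts = [
    "verify your account now or it will be suspended",
    "urgent: confirm your password at this link",
    "you have won a prize, click here to claim",
    "your bank account is locked, login to restore access",
    "meeting moved to 3pm, see updated agenda attached",
    "here are the lecture notes from this week",
    "lunch on friday? let me know what works",
    "the quarterly report draft is ready for review",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = phishing, 0 = legitimate

unseen_texts = [
    "confirm your account password now to avoid suspension",
    "click here to claim your prize today",
    "agenda for the review meeting is attached",
    "notes from the project lunch this week",
]
unseen_labels = [1, 1, 0, 0]

# Step 1: fit on the original data, then score on emails the model never saw.
model = build_model()
model.fit(train_texts, train_labels)
unseen_pred = model.predict(unseen_texts)
unseen_acc = accuracy_score(unseen_labels, unseen_pred)
cm = confusion_matrix(unseen_labels, unseen_pred, labels=[0, 1])
print(f"Accuracy on unseen emails: {unseen_acc:.2%}")
print("Confusion matrix (rows = true, columns = predicted):")
print(cm)

# Step 2: retrain from scratch on the combined data.
combined_texts = train_texts + unseen_texts
combined_labels = train_labels + unseen_labels
final_model = build_model()
final_model.fit(combined_texts, combined_labels)
```

Once the unseen emails are folded into training they no longer count as held-out data, so any later generalization estimate needs a fresh set, such as a separate stress test.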

Figure: Confusion matrices

Applied SMOTE to balance the classes by oversampling the minority class with synthetic examples, then deployed the final model through a Streamlit interface for live classification.
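
To illustrate the idea behind SMOTE, here is a simplified pure-Python sketch of its core step: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is not the project's code. The function name `smote_oversample` and the 2-D toy points are hypothetical, and a real project would more likely use the `SMOTE` class from the imbalanced-learn package.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D feature vectors: six majority samples, three minority samples.
majority = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.1)]
minority = [(2.0, 2.1), (2.2, 1.9), (1.9, 2.3)]

# Generate just enough synthetic minority points to match the majority count.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Oversampling is normally applied to the training split only, so that synthetic points never leak into the data used for evaluation.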

Figure: Deployed Streamlit app

Results

The final model reached 98.37% accuracy on the combined dataset and 97.31% on the generalization test with previously unseen emails. On a real-world stress test, its score improved from 70% to 80% after retraining.

98.37%: Accuracy on combined set
97.31%: Generalization, unseen emails
70% → 80%: Real-world stress test
