Cybersecurity
ML systems for detecting threats across email and malware.
Two machine learning systems applied to two of the most common attack vectors — phishing emails and malicious files. Trained, stress-tested, and deployed as interactive tools that classify real-world threats.
- Role: Builder
- Timeline: 2026
- Stack: Python, scikit-learn, pandas, HuggingFace, SMOTE, Streamlit
- Status: Deployed (AI 311, UTK)
Phishing emails slip past filters every day. The harder problem isn't fitting a model — it's building one that generalizes to unseen, real-world emails instead of memorizing the training set.
Trained and evaluated three models — Logistic Regression, SVM (LinearSVC), and Random Forest — on 82,486 emails (48% legitimate / 52% phishing) sourced from Kaggle.
Model comparison
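A minimal sketch of how a three-way comparison like this could be set up in scikit-learn. The TF-IDF features, the toy corpus, the split ratio, and the hyperparameters are all assumptions for illustration; the source does not state how the emails were vectorized or how the models were tuned.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for the 82,486 Kaggle emails.
phishing = [
    "verify your account now or it will be suspended",
    "urgent: confirm your password at this link",
    "you won a prize, click here to claim",
    "your bank account is locked, login to restore access",
    "final notice: update your payment details immediately",
    "security alert: unusual sign-in, verify identity now",
    "claim your free gift card by entering your details",
    "your mailbox is full, click to upgrade storage",
]
legitimate = [
    "meeting moved to 3pm, see updated agenda attached",
    "here are the lecture notes from today's class",
    "can you review my pull request when you have time",
    "lunch on friday? let me know what works",
    "quarterly report draft is ready for your comments",
    "thanks for the feedback, I pushed the fixes",
    "reminder: project presentations are next week",
    "the library book you requested is available",
]
texts = phishing + legitimate
labels = [1] * len(phishing) + [0] * len(legitimate)  # 1 = phishing

# Stratified split keeps the class ratio the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (LinearSVC)": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, clf in models.items():
    # Fitting the vectorizer inside the pipeline avoids leaking test vocabulary.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, pipeline.predict(X_test))
    print(f"{name}: {results[name]:.2%}")
```

Wrapping the vectorizer and classifier in one pipeline means each model sees features learned only from its training split, which keeps the comparison fair.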
Extended the evaluation with 18,634 unseen emails to test generalization, then retrained on the combined set to harden the model against drift.
Confusion matrices
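The generalization check and retraining step described above might look roughly like the following. This is a sketch under stated assumptions: the TF-IDF plus LinearSVC pipeline and the tiny example emails are placeholders, since the source does not say which of the three models was retrained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-ins for the original training set and the unseen set.
train_texts = [
    "verify your account now or it will be suspended",
    "urgent: confirm your password at this link",
    "you won a prize, click here to claim",
    "meeting moved to 3pm, see updated agenda attached",
    "here are the lecture notes from today's class",
    "can you review my pull request when you have time",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = phishing, 0 = legitimate

unseen_texts = [
    "your account is suspended, confirm your password",
    "claim your prize at this link now",
    "agenda for tomorrow's meeting attached",
    "notes from the review session",
]
unseen_labels = [1, 1, 0, 0]

# Step 1: fit on the original data only.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)

# Step 2: score on emails the model has never seen.
unseen_pred = model.predict(unseen_texts)
unseen_acc = accuracy_score(unseen_labels, unseen_pred)
# Rows are true classes, columns are predicted classes, ordered [0, 1].
cm = confusion_matrix(unseen_labels, unseen_pred, labels=[0, 1])
print(f"Unseen accuracy: {unseen_acc:.2%}")
print(cm)

# Step 3: retrain a fresh pipeline on the combined data.
combined_texts = train_texts + unseen_texts
combined_labels = train_labels + unseen_labels
retrained = make_pipeline(TfidfVectorizer(), LinearSVC())
retrained.fit(combined_texts, combined_labels)
```

Once the unseen emails are folded into training, they no longer measure generalization, so any later evaluation needs a separate held-out set.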
Used SMOTE to oversample the minority class so the classifier does not skew toward the majority class, and deployed the final model through a Streamlit interface for live classification.
Deployed Streamlit app
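To make the SMOTE step mentioned above concrete, here is a simplified NumPy reimplementation of its core idea: each synthetic minority sample is a random point on the line between a real minority sample and one of its nearest minority neighbours. In practice imbalanced-learn's `SMOTE` class would be the usual choice; the function name `smote_sketch`, its parameters, and the toy data are hypothetical and not taken from the project.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=3, seed=0):
    """Return n_new synthetic rows interpolated between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                       # pick a real minority row
        neighbour = neighbours[base, rng.integers(k)]  # pick one of its k neighbours
        gap = rng.random()                           # position along the segment
        synthetic[i] = X_min[base] + gap * (X_min[neighbour] - X_min[base])
    return synthetic


# Toy imbalanced data: 20 majority rows, 5 minority rows (illustrative only).
data_rng = np.random.default_rng(1)
X_major = data_rng.normal(0.0, 1.0, size=(20, 2))
X_minor = data_rng.normal(3.0, 1.0, size=(5, 2))

new_rows = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_bal = np.vstack([X_major, X_minor, new_rows])
y_bal = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(new_rows)))
print(np.bincount(y_bal))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic rows never end up in the data used for evaluation.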
The final model hit 98.37% accuracy on the combined dataset and 97.31% accuracy on previously unseen emails, and its accuracy on a real-world stress test improved from 70% to 80% after retraining.
- 98.37%: Accuracy on combined set
- 97.31%: Generalization, unseen
- 70 → 80%: Real-world stress test