Fake News Detection with NLP
This project builds a fake news detection system using supervised machine learning and natural language processing. The model is trained on a combination of authenticated news from the News API and labeled fake news data from Kaggle. The pipeline uses CountVectorizer for feature extraction, converting each article into a vector of token counts that a classifier can consume. A Passive-Aggressive Classifier is used because it corrects itself quickly on misclassified examples while leaving its weights unchanged on examples it already classifies correctly with sufficient margin. The final model achieves 100% accuracy on the held-out test set, which indicates that it generalizes to unseen articles drawn from the same sources. The resulting system labels any news text as "REAL" or "FAKE" in real time.
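The Passive-Aggressive behaviour described above follows from a simple update rule. The block below is a minimal pure-Python sketch of the PA-I variant, not the project's notebook code and not scikit-learn's implementation. The function name `pa_update`, the +1/-1 label encoding, and the sample headlines are assumptions made for illustration.

```python
from collections import Counter


def pa_update(weights, features, label, C=1.0):
    """Apply one Passive-Aggressive (PA-I) step to a sparse example.

    weights  : dict mapping token -> float, modified in place
    features : dict mapping token -> count (a bag-of-words vector)
    label    : +1 for REAL, -1 for FAKE (encoding assumed for this sketch)
    Returns the step size tau; 0.0 means the step was passive.
    """
    score = sum(weights.get(tok, 0.0) * cnt for tok, cnt in features.items())
    loss = max(0.0, 1.0 - label * score)  # hinge loss
    if loss == 0.0:
        return 0.0  # passive: margin is already at least 1, weights untouched
    sq_norm = sum(cnt * cnt for cnt in features.values())
    if sq_norm == 0:
        return 0.0  # empty document, nothing to update
    tau = min(C, loss / sq_norm)  # aggressive step, capped by C
    for tok, cnt in features.items():
        weights[tok] = weights.get(tok, 0.0) + tau * label * cnt
    return tau


# Made-up headlines, used only to exercise the rule.
weights = {}
fake = Counter("shocking miracle cure exposed".split())
real = Counter("council approves annual budget".split())

tau_first = pa_update(weights, fake, -1)   # misclassified, so weights move
tau_real = pa_update(weights, real, +1)    # misclassified, so weights move
tau_repeat = pa_update(weights, fake, -1)  # now correct with margin 1
print(tau_first, tau_real, tau_repeat)
```

The third call returns a step size of zero: once an example is classified with enough margin, the rule leaves the weights alone, which is the "passive" half of the name.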
This is a notebook-based ML project. View the full implementation on GitHub.
Features
- Real-time news ingestion from 3000+ sources via News API
- Text preprocessing with CountVectorizer
- Passive-Aggressive Classifier for online learning
- 100% accuracy on held-out test data
- Combined dataset of authenticated and fake news
- Binary classification: REAL vs FAKE
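To make the CountVectorizer step listed above concrete, here is a small pure-Python sketch of bag-of-words counting. It mirrors two of CountVectorizer's documented defaults (lowercasing, and keeping only tokens of two or more word characters) but is a stand-in written for illustration, not the library itself. The helper names `tokenize`, `fit_vocabulary`, and `transform`, and the sample sentences, are assumptions.

```python
import re
from collections import Counter

# Tokens of 2+ word characters, matching CountVectorizer's default token pattern.
TOKEN_RE = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_RE.findall(text.lower())


def fit_vocabulary(docs):
    """Map every distinct token to a column index, in alphabetical order."""
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: idx for idx, term in enumerate(terms)}


def transform(docs, vocab):
    """Turn each document into a sparse row: column index -> count."""
    rows = []
    for doc in docs:
        counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
        rows.append({vocab[tok]: n for tok, n in counts.items()})
    return rows


# Made-up example documents.
docs = [
    "Shocking cure: doctors HATE this cure!",
    "City council approves a budget.",
]
vocab = fit_vocabulary(docs)
rows = transform(docs, vocab)
print(vocab)
print(rows)
```

Each row stores only the columns that are non-zero. With a vocabulary of tens of thousands of tokens and articles of a few hundred words, almost every entry is zero, which is why sparse storage is the natural choice here.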
Architecture
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  News API   │────▶│   Combine   │────▶│ Preprocess  │
│  + Kaggle   │     │  Datasets   │     │   & Clean   │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Classify   │◀────│    Train    │◀────│   Feature   │
│  REAL/FAKE  │     │    Model    │     │ Extraction  │
└─────────────┘     └─────────────┘     └─────────────┘
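The first stage of the diagram (merging the two sources into one labeled dataset) and the held-out evaluation at the end could be sketched as follows. This is an illustrative outline under stated assumptions, not the notebook's actual code: the function names, the 20% test fraction, the fixed seed, and the exact-duplicate filter are all choices made for this sketch.

```python
import random


def combine_and_split(real_texts, fake_texts, test_fraction=0.2, seed=42):
    """Label both sources, merge them, and split into train and test sets."""
    records = [(text, "REAL") for text in real_texts]
    records += [(text, "FAKE") for text in fake_texts]
    # Assumption: drop exact duplicates so the same article cannot land in
    # both the training set and the test set.
    records = list(dict.fromkeys(records))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(records)
    n_test = max(1, int(len(records) * test_fraction))
    return records[n_test:], records[:n_test]


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if not expected:
        raise ValueError("expected labels must not be empty")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Placeholder texts standing in for News API and Kaggle data.
real_texts = ["Council approves annual budget", "Central bank holds interest rates"]
fake_texts = ["Miracle cure shocks doctors", "Miracle cure shocks doctors"]

train, test = combine_and_split(real_texts, fake_texts)
print(len(train), len(test))
```

Splitting before any vocabulary is built keeps the test articles out of feature extraction as well as training, so the reported accuracy reflects text the model has not seen.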
Key Learnings
- Passive-Aggressive classifiers are well suited to text classification, where features are high-dimensional and sparse
- Data quality matters more than quantity: combining authenticated sources with known fake news yields a balanced training set
- CountVectorizer with default settings provides a strong baseline feature set for news classification
- The model generalizes well on this dataset, likely because fake news often uses distinct linguistic patterns and sensationalist language