NLP Notebook Project

Fake News Detection with NLP

This project builds a fake news detection system using supervised machine learning and natural language processing. The model is trained on a combination of authenticated news retrieved through the News API and labeled fake news data from Kaggle. The pipeline uses CountVectorizer for feature extraction, converting each article into a vector of token counts that a classifier can consume. The Passive-Aggressive Classifier is chosen because it changes its weights only when an example is misclassified or falls inside the margin, so it adapts quickly to new patterns while staying unchanged on examples it already classifies correctly. The final model reaches 100% accuracy on the held-out test set, that is, on articles it did not see during training. The trained system can label any news text as "REAL" or "FAKE" in real time.
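To make the "passive" and "aggressive" behavior concrete, here is a minimal sketch of the PA-I update rule on sparse token counts. It is illustrative only and is not the notebook's code, which relies on scikit-learn's implementation. The helper names `score` and `pa_update`, the label encoding (+1 for REAL, -1 for FAKE), and the cap `C=1.0` are assumptions made for this example.

```python
from collections import Counter


def score(weights, features):
    """Dot product between a sparse weight dict and sparse token counts."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in features.items())


def pa_update(weights, features, label, C=1.0):
    """Apply one PA-I update in place; label is +1 (REAL) or -1 (FAKE).

    Returns the step size tau; 0.0 means the model stayed passive.
    """
    loss = max(0.0, 1.0 - label * score(weights, features))  # hinge loss
    if loss == 0.0:
        return 0.0  # passive: the margin is already satisfied
    sq_norm = sum(cnt * cnt for cnt in features.values())
    if sq_norm == 0:
        return 0.0  # empty document, nothing to update
    tau = min(C, loss / sq_norm)  # aggressive step, capped by C
    for tok, cnt in features.items():
        weights[tok] = weights.get(tok, 0.0) + tau * label * cnt
    return tau


weights = {}
doc = Counter("shocking miracle cure exposed".split())
first = pa_update(weights, doc, label=-1)   # misclassified: weights move
second = pa_update(weights, doc, label=-1)  # now correct with margin: no change
print(first, second)  # 0.25 0.0
```

The first call moves the weights just far enough to classify the document with a margin of 1; the second call sees zero loss and leaves the model untouched.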

Key Metrics

Accuracy: 100% (on test data)
Sources: 3000+ news sources
Model: Passive-Aggressive
Output: Binary (REAL / FAKE)

This is a notebook-based ML project. View the full implementation on GitHub.

Features

Real-time news ingestion from 3000+ sources via News API
Text preprocessing with CountVectorizer
Passive-Aggressive Classifier for online learning
100% accuracy on held-out test data
Combined dataset of authenticated and fake news
Binary classification: REAL vs FAKE
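The ingestion and dataset-combination features could look roughly like the sketch below. It is not taken from the notebook. It assumes the News API `top-headlines` endpoint and its usual JSON shape (an `articles` list whose items carry `title` and `description`), and the helper names `build_request_url`, `articles_to_rows`, and `combine` are hypothetical. The sketch makes no network call; it parses a hand-written sample payload instead.

```python
import json
import random
from urllib.parse import urlencode

NEWS_API_URL = "https://newsapi.org/v2/top-headlines"  # assumed endpoint


def build_request_url(api_key, country="us", page_size=100):
    """Build the request URL; the caller is responsible for the HTTP GET."""
    query = urlencode({"country": country, "pageSize": page_size, "apiKey": api_key})
    return f"{NEWS_API_URL}?{query}"


def articles_to_rows(payload, label):
    """Turn a decoded JSON response into (text, label) rows, skipping empty text."""
    rows = []
    for article in payload.get("articles", []):
        parts = [article.get("title") or "", article.get("description") or ""]
        text = " ".join(p.strip() for p in parts if p.strip())
        if text:
            rows.append((text, label))
    return rows


def combine(real_rows, fake_rows, seed=0):
    """Merge both sources and shuffle so the classes are interleaved."""
    rows = list(real_rows) + list(fake_rows)
    random.Random(seed).shuffle(rows)
    return rows


# Hand-written sample standing in for a live News API response.
sample = json.loads(
    '{"status": "ok", "articles": ['
    '{"title": "Council approves budget", "description": "Vote passed 7-2."},'
    '{"title": null, "description": null}]}'
)
real_rows = articles_to_rows(sample, "REAL")
fake_rows = [("Miracle pill melts fat overnight", "FAKE")]  # stand-in for Kaggle rows
dataset = combine(real_rows, fake_rows)
```

Keeping URL construction and payload parsing separate from the HTTP call makes both steps easy to test without an API key.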

Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  News API   │────▶│   Combine   │────▶│   Preprocess│
│  + Kaggle   │     │   Datasets  │     │   & Clean   │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Classify   │◀────│   Train     │◀────│   Feature   │
│  REAL/FAKE  │     │   Model     │     │  Extraction │
└─────────────┘     └─────────────┘     └─────────────┘
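The feature extraction, training, and classification stages in the diagram can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the notebook's implementation: the eight texts are invented toy data standing in for the combined News API and Kaggle dataset, and the split ratio and `random_state` values are arbitrary choices for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in for the combined dataset (invented examples).
texts = [
    "City council approves annual budget after public hearing",
    "Central bank holds interest rates steady this quarter",
    "Local school district reports rise in graduation rates",
    "Health agency publishes updated vaccination schedule",
    "Shocking secret cure doctors do not want you to know",
    "You will not believe what this celebrity did to lose weight",
    "Miracle pill melts fat overnight, experts stunned",
    "Secret government plot exposed by anonymous insider",
]
labels = ["REAL"] * 4 + ["FAKE"] * 4

# Hold out a stratified test split so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Feature extraction and model training chained in one estimator.
model = make_pipeline(
    CountVectorizer(),
    PassiveAggressiveClassifier(random_state=0),
)
model.fit(X_train, y_train)

# Evaluate on the held-out split, then classify a new headline.
predictions = model.predict(X_test)
test_accuracy = accuracy_score(y_test, predictions)
print(test_accuracy)
print(model.predict(["Scientists stunned by secret miracle cure"]))
```

Wrapping the vectorizer and classifier in one pipeline ensures the vocabulary is learned from the training split only, which keeps the test score honest.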

Tech Stack

Python, Scikit-learn, NLTK, News API, Pandas, CountVectorizer, Passive-Aggressive Classifier

Key Learnings

Passive-Aggressive classifiers excel at text classification with high-dimensional sparse features

Data quality matters more than quantity: combining authenticated sources with known fake news creates a balanced training set

CountVectorizer with default settings (lowercased single-word counts, no stop-word removal) provides strong baseline features for news classification

The model generalizes well because fake news often uses distinct linguistic patterns and sensationalist language
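To show what the default CountVectorizer features look like, the sketch below approximates them in plain Python: text is lowercased, tokens are runs of two or more word characters, and each document becomes a row of counts over an alphabetically sorted vocabulary. The function names are hypothetical, and the real CountVectorizer returns a sparse matrix instead of nested lists.

```python
import re
from collections import Counter

# Matches CountVectorizer's default token pattern: 2+ word characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and keep tokens of two or more word characters."""
    return TOKEN_PATTERN.findall(text.lower())


def fit_vocabulary(docs):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(vocab)}


def transform(docs, vocabulary):
    """Return one row of token counts per document (dense, for readability)."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts.get(tok, 0) for tok in vocabulary])
    return rows


docs = [
    "SHOCKING: a cure they hide!",
    "Council approves a budget, a modest budget",
]
vocabulary = fit_vocabulary(docs)
matrix = transform(docs, vocabulary)
print(list(vocabulary))
# ['approves', 'budget', 'council', 'cure', 'hide', 'modest', 'shocking', 'they']
print(matrix)
# [[0, 0, 0, 1, 1, 0, 1, 1], [1, 2, 1, 0, 0, 1, 0, 0]]
```

Capitalization and punctuation are discarded and the one-letter word "a" is dropped, so sensationalist styling such as all-caps or exclamation marks does not reach the classifier under the defaults; only word choice does.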
