RAG Chatbot with DeepSeek & LlamaIndex
This project implements a production-ready RAG (Retrieval-Augmented Generation) system that allows users to chat with their documents. Instead of relying solely on the LLM's training data, the system retrieves relevant context from uploaded documents to provide accurate, grounded responses. The chatbot uses LlamaIndex for document processing and retrieval, with DeepSeek as the underlying LLM for generation. Documents are chunked, embedded, and stored in a vector database for efficient similarity search. Key capabilities include multi-document support, source citation for every response, conversation memory for follow-up questions, and streaming responses for better UX.
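The chunk-embed-retrieve-generate flow described above can be sketched in plain Python. This is an illustration only, not the project's code: the real system uses LlamaIndex, ChromaDB, and the DeepSeek API, and every function here (the bag-of-words "embedding" especially) is a toy stand-in.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 30, overlap: int = 5) -> list[str]:
    """Split text into overlapping word windows (toy stand-in for a token splitter)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system calls an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the LLM by prepending retrieved context to the question."""
    ctx = "\n---\n".join(context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

docs = ("LlamaIndex chunks documents and stores embeddings in ChromaDB. "
        "DeepSeek generates grounded answers from retrieved context.")
top = retrieve("grounded answers", chunk(docs, size=8, overlap=2))
prompt = build_prompt("grounded answers", top)
```

In the real pipeline the prompt (plus conversation memory) would be sent to DeepSeek for generation; here it is just assembled as a string.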
Features
- Upload and index PDF, TXT, and MD documents
- Semantic search across document corpus
- Source citations with every response
- Conversation memory for context-aware follow-ups
- Streaming responses for real-time feedback
- Configurable chunk size and overlap
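The streaming feature can be illustrated with a plain generator that yields tokens as they are produced, so the UI can render incrementally. This is a sketch, not the project's FastAPI/DeepSeek code; `fake_llm_stream` is a hypothetical stand-in for the model's streamed output.

```python
from typing import Iterator

def fake_llm_stream(answer: str) -> Iterator[str]:
    """Hypothetical stand-in for a streamed LLM response; yields one token at a time."""
    for token in answer.split():
        yield token + " "

def stream_chat(question: str) -> Iterator[str]:
    """Yield response chunks as they arrive instead of waiting for the full answer.

    In the real backend, a generator like this would be wrapped in a
    streaming HTTP response so the frontend sees tokens immediately.
    """
    for piece in fake_llm_stream(f"Echo: {question}"):
        yield piece

received = "".join(stream_chat("hello"))
```

The perceived-latency win comes from the first token arriving in milliseconds, even when the full answer takes seconds to generate.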
Architecture
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   React     │────▶│   FastAPI   │────▶│ LlamaIndex  │
│  Frontend   │     │   Backend   │     │   Engine    │
└─────────────┘     └─────────────┘     └─────────────┘
                           │                   │
                           ▼                   ▼
                    ┌─────────────┐     ┌─────────────┐
                    │  ChromaDB   │     │  DeepSeek   │
                    │  (Vectors)  │     │     API     │
                    └─────────────┘     └─────────────┘
```
Tech Stack
- React frontend
- FastAPI backend
- LlamaIndex for document processing and retrieval
- ChromaDB as the vector store
- DeepSeek API for generation
Key Learnings
- Chunk size significantly impacts retrieval quality: 512 tokens with a 50-token overlap worked best
- Hybrid search (semantic + keyword) outperforms pure vector search for technical documents
- Streaming responses dramatically improve perceived latency
- Document metadata (filename, page number) is essential for useful citations
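The hybrid-search learning can be made concrete with score fusion: a weighted sum of a vector-similarity score and a keyword-match score. This is a toy sketch; production hybrid retrieval would typically fuse BM25 and embedding scores, and the 50/50 weight here is an arbitrary assumption.

```python
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear verbatim in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(query: str, doc: str, vector_score: float, alpha: float = 0.5) -> float:
    """Weighted fusion of a semantic (vector) score and a keyword score."""
    return alpha * vector_score + (1 - alpha) * keyword_score(query, doc)

# Technical documents are full of exact identifiers (error codes, API names)
# that keyword matching catches even when embedding similarity is mediocre.
docs_scores = [
    ("retry on HTTP_502 gateway errors", 0.40),  # low vector score, exact term hit
    ("general networking tips", 0.55),           # higher vector score, no term hit
]
ranked = sorted(docs_scores, key=lambda p: hybrid_score("HTTP_502", p[0], p[1]), reverse=True)
```

With pure vector search the generic document would win (0.55 vs 0.40); the keyword component flips the ranking in favor of the exact-identifier match.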
Want to see more AI projects?
Check out the rest of my AI Lab or get in touch to discuss AI/ML collaboration.