Building Narayan: A RAG System for Medical Research

The Problem

I kept noticing the same frustration across jobs. Everyone is drowning in documents.

Doctors juggle patient histories, research papers, guidelines, and old scan reports. A radiologist has to mentally cross-reference an MRI with similar cases from years ago. Researchers spend days skimming PDFs for a single finding. Clinicians dig through years of records hunting for patterns. And it’s not just medicine. Compliance officers, lawyers, insurance adjusters, and engineers all face the same bottleneck. The information exists, but finding it eats hours, days, sometimes weeks. Teams end up duplicating work because the person who cracked the problem last quarter has already moved on.

What if you could just ask a normal question and the system would search your entire knowledge base, pull the exact pieces that matter, and show you precisely where the answer came from?

That idea became Narayan. I built it under my proprietorship at Upperture Interactive. Not just for doctors, but for anyone buried in document stacks.

The hard part is that documents are messy. Medical papers love two-column layouts. Legal docs are full of dense jargon and cross-references. Compliance guides are exhaustively detailed. The system has to handle all of it without choking. And because people make real decisions with the answers, every response needs to be traceable, auditable, and grounded in the actual text. No hallucinations. No confidence without evidence.

The Architecture

If you’ve built RAG systems before, the basic pattern is familiar: extract text, chunk it, embed it, store vectors, retrieve, then generate. The devil is in the details, especially with real documents.

Document Ingestion: Getting the Text Right

Standard PDF parsers like PyPDF massacre two-column layouts. Text from the left column gets interleaved with the right, and your embeddings end up encoding pure garbage.

I switched to PyMuPDF (fitz) because it actually respects reading order. It pulls blocks top to bottom, left to right within columns. The code is simple:

# Each block is (x0, y0, x1, y1, text, block_no, block_type); block_type 0 keeps
# only text blocks, and the length filter drops headers and page-number stubs.
blocks = page.get_text("blocks", sort=True)
text_blocks = [b[4].strip() for b in blocks if b[6] == 0 and len(b[4].strip()) > 20]
full_text = "\n\n".join(text_blocks)

Those double newlines are intentional. When RecursiveCharacterTextSplitter hits them, it respects paragraph boundaries first. So medical sentences stay intact instead of being sliced mid-thought.

Chunking and Vectorization

Chunk size is everything. Too small and you lose context. Too big and the LLM gets distracted by irrelevant noise.

I landed on 1000-character chunks with 200-character overlap. Roughly one dense medical paragraph. The overlap keeps context from vanishing at boundaries.
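
A minimal sketch of that configuration, assuming LangChain's RecursiveCharacterTextSplitter; the parameter values come straight from the paragraphs above, the rest is boilerplate:

from langchain_text_splitters import RecursiveCharacterTextSplitter

# "\n\n" is tried first, so the double newlines added during extraction
# become the preferred split points; smaller separators are fallbacks.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""],
)
chunks = splitter.split_text(full_text)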

For embeddings I used Hugging Face’s nomic-embed-text via sentence-transformers. It’s lightweight, runs locally, and was trained on scientific text so medical terminology doesn’t throw it off. Everything goes into ChromaDB: simple, local, no external service. It plays nicely with LangChain. Every chunk carries stable metadata: filename, page number, document ID, and chunk index.
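
A sketch of how that wiring might look with LangChain's Chroma and Hugging Face integrations. The model ID, collection name, filenames, and metadata keys here are illustrative guesses, not lifted from Narayan's source:

from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1.5",  # illustrative model ID
    model_kwargs={"trust_remote_code": True},
)
vectorstore = Chroma(
    collection_name="narayan",
    embedding_function=embeddings,
    persist_directory="./chroma_db",  # local, no external service
)

# Stable per-chunk metadata: filename, page number, document ID, chunk index.
# In practice the page number is tracked during extraction; hard-coded here.
vectorstore.add_texts(
    texts=chunks,  # the chunk strings from the splitter sketch above
    metadatas=[
        {"filename": "trial.pdf", "page": 1, "doc_id": "trial-001", "chunk_index": i}
        for i in range(len(chunks))
    ],
)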

The Query Pipeline: Retrieve, Rerank, Generate

When a question comes in (sketched in code after the list):

  1. Embed the question with the same model
  2. Pull top candidates from ChromaDB using cosine similarity
  3. Rerank (right now just by similarity score; cross-encoder is on the todo list)
  4. Format the top 3 chunks as numbered sources with filename plus page
  5. Feed the question plus sources to the LLM with a strict system prompt
  6. Return the answer with inline citations
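
A stripped-down sketch of steps 1 through 4, reusing the vectorstore from the ingestion sketch; the function and variable names are mine, not Narayan's:

def retrieve_sources(question: str, k: int = 3):
    # Chroma embeds the question with the same model and returns
    # (chunk, score) pairs, with relevance scores in [0, 1].
    hits = vectorstore.similarity_search_with_relevance_scores(question, k=10)
    # Naive rerank: sort by score and keep the top k.
    top = sorted(hits, key=lambda pair: pair[1], reverse=True)[:k]
    # Number the sources so the LLM can cite them as [Source N].
    sources = "\n\n".join(
        f"[Source {i + 1}] ({doc.metadata['filename']}, p. {doc.metadata['page']})\n"
        f"{doc.page_content}"
        for i, (doc, _score) in enumerate(top)
    )
    return sources, top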

The system prompt is non-negotiable. It forces the LLM to do the following (an illustrative version follows the list):

  • Use only the provided sources
  • Cite with [Source N]
  • Say “I don’t have enough information” if the chunks don’t help
  • Skip all fluff
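
The exact wording lives in the repo; the gist is something like this (an illustrative paraphrase, not the actual prompt):

SYSTEM_PROMPT = (
    "Answer using ONLY the numbered sources provided below. "
    "Cite every claim as [Source N]. "
    "If the sources do not contain the answer, reply \"I don't have enough information.\" "
    "No filler, no speculation."
)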

Hallucination Mitigation

Three layers keep things honest:

  1. The prompt explicitly forbids invention
  2. The LLM only ever sees the top 3 chunks
  3. Every answer ships with full metadata: filename, page, relevance score (0 to 1), token usage, and a 300-character preview of the source text so users can judge for themselves (the response shape is sketched below)
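
Roughly that response shape, sketched as Pydantic models on the FastAPI side; the field names are my guesses, not the repo's:

from pydantic import BaseModel

class SourceCitation(BaseModel):
    filename: str
    page: int
    relevance_score: float  # 0 to 1, from the vector search
    preview: str            # first 300 characters of the chunk

class QueryResponse(BaseModel):
    answer: str             # includes inline [Source N] citations
    sources: list[SourceCitation]
    tokens_used: int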

The Frontend: Simple, Readable, Usable

React plus Vite. Deliberately minimal. Drag-and-drop PDF upload. Backend reports chunk count. Sidebar lists every document. You can query everything or scope to a single doc. Results show the answer, numbered sources, and expandable evidence cards. Click [Source 1] and you see the original snippet. Trust is built in.

Technical Stack

Backend:

  • FastAPI (REST API that is actually pleasant to work with)
  • LangChain for orchestration
  • ChromaDB (local vector store)
  • PyMuPDF for extraction
  • Sentence-transformers plus nomic-embed-text
  • Ollama (llama3.1 locally; swap to OpenAI, Claude, or Gemini with one line if you want; see the sketch below)

Frontend:

  • React 19 plus Vite
  • Plain CSS (no framework bloat)
  • Fetch API only

Everything runs locally. Your PDFs never leave your machine.
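
The one-line model swap mentioned in the stack list looks roughly like this, assuming LangChain's provider packages:

from langchain_ollama import ChatOllama
# from langchain_openai import ChatOpenAI        # hosted alternative
# from langchain_anthropic import ChatAnthropic  # hosted alternative

llm = ChatOllama(model="llama3.1")  # local by default; swap the class to switch providers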

Lessons Learned

  1. Two-column layouts matter more than you think. PyPDF made the first version useless. PyMuPDF plus block sorting was a 10x jump.

  2. Overlap is not optional. Without the 200-character overlap, context regularly vanished across chunk boundaries. “Treatment efficacy” questions would split the drug description from the results.

  3. Stop chasing the perfect reranker for v1. Naive similarity sorting works fine. A cross-encoder can come later.

  4. Metadata is free value. Filename plus page number turns “the system said X” into “here is exactly where X lives in the PDF.”

  5. Scoped queries are underrated. Letting users search just one document (or a folder) is trivial to implement and incredibly powerful in practice; see the sketch below.
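
For lesson 5, scoping is just a metadata filter on the vector search. The doc_id key mirrors whatever metadata was attached at ingestion (a sketch, not the exact call in the repo):

question = "What adverse events were reported?"
# Same retrieval path, restricted to a single document's chunks.
hits = vectorstore.similarity_search(question, k=3, filter={"doc_id": "trial-001"})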

What’s Next

v1 shipped mid-January 2026. Phase 2 includes:

  • Cross-encoder reranking
  • OCR support for scanned PDFs
  • Admin panel for bulk management
  • Query analytics
  • Easy API key integration for external LLMs
  • More formats (Word, HTML, spreadsheets)

I’m also eyeing domain-specific embeddings for medical versus legal versus compliance use cases.

Why This Matters

Narayan sits in the sweet spot between generic chatbots and overly specialized tools. Researchers get faster literature reviews. Clinicians query their own knowledge base without flipping through files. Lawyers, compliance teams, engineers. Anyone who lives in documents gains hours back.

The real win is traceability. Every answer points back to the exact page. In medicine, law, or compliance, that’s not a nice-to-have. It’s essential. But even in lower-stakes domains, transparency builds trust.


Want to try it? Clone the repo at https://github.com/om-wani/Narayan. Follow the README, drop in some PDFs, and start asking questions. Break it, improve it, make it yours. That’s the whole point.