By Andy Mundell | Published: 08 September, 2025
RAG to Make Bots Smarter (Without the Fuss)
Cleva.Bot's goal isn't just to answer questions; it's to understand context. The magic behind that is something called Retrieval-Augmented Generation, or RAG. It's like giving your chatbot a librarian sidekick who reads and remembers everything, but only shares what truly matters for a particular conversation.
Imagine you stroll into a library. Instead of endless piles of books, each book (or chunk of text) sits on a neat "shelf" in a multidimensional space. Books that echo related themes, say, space rockets and astronomy, end up as neighbours on the same shelf. These chunks of knowledge are grouped by their semantic similarity.
When a user asks a question like "What's the best cake recipe?", that question gets turned into its own vector: a numeric code pointing to the relevant "shelf neighbours". That's how the chatbot fetches the most helpful bits of info (from your web pages, docs, PDFs, etc.).
This is RAG's brilliance: instead of relying only on static training data, your chatbot retrieves tailored, on-point info right when you ask for it. The result is contextually aware responses with greater detail and accuracy.
How It Works
Upload and Slice
We dissect your content (web pages, files, etc.) into bite-sized chunks (around 600 tokens each), so they're easy for our system to process.
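As a rough sketch of the slicing step (tokens are approximated here by whitespace-separated words; a real pipeline would count tokens with the model's actual tokenizer):

```python
def chunk_text(text, max_tokens=600):
    """Split text into chunks of roughly max_tokens each.

    Tokens are approximated by whitespace-separated words;
    production systems use the embedding model's tokenizer.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

# A 1,500-word document becomes three chunks: 600 + 600 + 300 words.
chunks = chunk_text("word " * 1500)
print(len(chunks))  # 3
```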
Vector Magic
Each chunk is turned into a number-list (a vector) based on meaning. So “happy” and “joyful” plot closely on this magical map, while “happy” and “basket” are worlds apart.
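To make that concrete, here's a toy illustration with made-up three-dimensional vectors (real embeddings come from an embedding model and have hundreds or thousands of dimensions):

```python
import math

# Hand-picked toy vectors; in practice an embedding model produces these.
vectors = {
    "happy":  [0.90, 0.80, 0.10],
    "joyful": [0.85, 0.75, 0.15],
    "basket": [0.10, 0.20, 0.90],
}

def cosine(a, b):
    """Cosine similarity: near 1.0 means very similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(vectors["happy"], vectors["joyful"]))  # close to 1.0
print(cosine(vectors["happy"], vectors["basket"]))  # much lower
```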
Smart Storage
These vectors, along with their original text, go into a vector database: a hyper-efficient structure that keeps semantically similar neighbours close together for fast lookup.
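In miniature, a vector database can be sketched as a store that keeps each vector next to its original text and returns the closest matches (real vector databases use approximate nearest-neighbour indexes to do this at scale):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class TinyVectorStore:
    """In-memory stand-in for a real vector database."""

    def __init__(self):
        self._entries = []  # list of (vector, original_text) pairs

    def add(self, vector, text):
        self._entries.append((vector, text))

    def top_k(self, query_vector, k=2):
        """Return the k chunks whose vectors are closest to the query."""
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine(query_vector, entry[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]

store = TinyVectorStore()
store.add([0.9, 0.1], "Our return policy lasts 30 days.")
store.add([0.1, 0.9], "We ship worldwide within 5 days.")
print(store.top_k([0.8, 0.2], k=1))  # the return-policy chunk wins
```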
User Query
When a user asks something, we vectorize their query and match it instantly against the stored vectors. We pull the most relevant chunks and feed them, plus the ongoing conversation and any guiding "personality prompts and rules", into the LLM (ChatGPT).
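Putting that turn together might look like the sketch below. The function name and message shape are illustrative (the system/user role format is the common chat-completion convention), not Cleva.Bot's actual internals:

```python
def build_llm_input(question, retrieved_chunks, history, personality_prompt):
    """Assemble everything the LLM sees for one conversational turn.

    retrieved_chunks come from the vector search; history is the
    ongoing conversation; personality_prompt carries the guiding rules.
    """
    context = "\n\n".join(retrieved_chunks)
    system = f"{personality_prompt}\n\nAnswer using this context:\n{context}"
    return [
        {"role": "system", "content": system},
        *history,
        {"role": "user", "content": question},
    ]

messages = build_llm_input(
    question="What's your return policy?",
    retrieved_chunks=["Our return policy lasts 30 days."],
    history=[],
    personality_prompt="You are a friendly support bot. Always cite sources.",
)
print(messages[0]["role"])   # system
print(messages[-1]["role"])  # user
```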
Polish and Respond
The LLM blends those knowledge bits with its world-class language skills, refines them via our custom behavior prompts (like “always include sources” or “sign off with something friendly”), and hands you a crisp, reliable response in a tidy chat window.
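Those behaviour prompts are, in essence, just instructions prepended to every request. A hypothetical example (these exact rules are illustrative, not Cleva.Bot's real prompts):

```python
# Illustrative behaviour rules; real deployments tune these per bot.
BEHAVIOUR_PROMPT = "\n".join([
    "Always include the source document for any fact you state.",
    "If the retrieved context doesn't cover the question, say so honestly.",
    "Sign off with something friendly.",
])

def with_behaviour(base_system_prompt):
    """Append the behaviour rules to a bot's base system prompt."""
    return f"{base_system_prompt}\n\n{BEHAVIOUR_PROMPT}"

print(with_behaviour("You are a helpful assistant."))
```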
What This Achieves
Context Rather Than Memory:
It's not learning everything—just retrieving what matters when it matters.
Precision with Your Own Stuff:
It answers from your own content and follows your conversation flow (e.g., if you talk about "new phones," then "cameras," it stays in that thread).
Efficient Wizardry:
No need for massive retraining cycles—just good architecture and clever retrieval.
Supercharged Accuracy:
Fewer hallucinations, more facts—because it fetches rather than guesses.
Think of RAG as the best of both worlds: the robustness of LLMs combined with the precision of real-time retrieval. It super-powers chatbots to be both genius and grounded in your actual data. Cleva.Bot rides this tech train, turning data into conversational gold without overcharging or overcomplicating.
We build and train your bot for free, then help you fine-tune and deploy it.