Overview

The HolyDocs AI Assistant is a RAG (Retrieval-Augmented Generation) chatbot embedded directly in your documentation site. It answers reader questions by searching your documentation content, retrieving relevant passages, and generating accurate responses grounded in your actual docs.

How It Works

  1. Content Indexing: After each production deployment, HolyDocs chunks your documentation into semantic segments and generates vector embeddings using text-embedding-3-small. These embeddings are stored in Cloudflare Vectorize.

  2. Query Processing: When a reader asks a question, the assistant converts the query into a vector embedding and performs a hybrid search, combining vector similarity search with keyword matching, to find the most relevant content chunks.

  3. Agentic RAG: The assistant uses tool calling with three tools: search_docs (semantic search), get_page (fetch a specific page), and list_pages (browse the navigation). It can perform up to 3 rounds of retrieval before generating its final response.

  4. Streaming Response: The response is streamed to the reader via Server-Sent Events (SSE), so answers appear incrementally as they are generated.

How RAG Works (Technical Detail)

The Retrieval-Augmented Generation pipeline behind the assistant ensures answers are grounded in your actual documentation rather than the LLM's general training data:

During indexing, each page is split into chunks of approximately 500 tokens with 50-token overlap between adjacent chunks. Chunks preserve their source page path and section heading as metadata. This ensures the assistant can cite specific sections when answering.
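
Below is a minimal sketch of this style of chunking in TypeScript. The actual HolyDocs chunker is internal; here "tokens" are approximated by whitespace-split words, and the Chunk shape is illustrative:

```typescript
// Illustrative only: HolyDocs' real chunker is internal and token-aware.
// "Tokens" are approximated by whitespace-split words in this sketch.

interface Chunk {
  pagePath: string;   // source page path, kept for citations
  heading: string;    // nearest section heading
  chunkIndex: number;
  text: string;
}

function chunkPage(
  pagePath: string,
  heading: string,
  text: string,
  size = 500,    // ~500 tokens per chunk
  overlap = 50,  // 50-token overlap between adjacent chunks
): Chunk[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: Chunk[] = [];
  for (let start = 0, i = 0; start < tokens.length; start += size - overlap, i++) {
    chunks.push({
      pagePath,
      heading,
      chunkIndex: i,
      text: tokens.slice(start, start + size).join(" "),
    });
    if (start + size >= tokens.length) break; // final window reached
  }
  return chunks;
}
```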

Chunks are embedded using OpenAI's text-embedding-3-small (1536 dimensions). The same model is used for both indexing and query-time embedding to ensure vector space alignment. Embeddings are stored in Cloudflare Vectorize with deterministic IDs (projectId:pagePath:chunkIndex).
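
The write path could look roughly like the sketch below, assuming a Cloudflare Worker with a Vectorize-style binding (the minimal VectorStore interface is an assumption standing in for the real binding type) and the OpenAI embeddings endpoint. It reuses the Chunk shape from the chunking sketch above:

```typescript
// Minimal shape of the Vectorize binding's upsert() as used in this sketch.
interface VectorStore {
  upsert(vectors: {
    id: string;
    values: number[];
    metadata?: Record<string, string>;
  }[]): Promise<unknown>;
}

async function embed(text: string, apiKey: string): Promise<number[]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
  });
  const json = (await res.json()) as { data: { embedding: number[] }[] };
  return json.data[0].embedding; // 1536-dimensional vector
}

async function indexChunk(
  store: VectorStore,
  apiKey: string,
  projectId: string,
  chunk: Chunk,
): Promise<void> {
  await store.upsert([{
    // Deterministic ID: re-indexing the same chunk overwrites in place
    id: `${projectId}:${chunk.pagePath}:${chunk.chunkIndex}`,
    values: await embed(chunk.text, apiKey),
    metadata: { pagePath: chunk.pagePath, heading: chunk.heading },
  }]);
}
```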

At query time, the assistant runs both a vector similarity search (top 10 chunks by cosine similarity) and a keyword search (top 10 results by BM25-style scoring). Results are merged using Reciprocal Rank Fusion (RRF) with k=60, producing a unified ranking that benefits from both semantic understanding and exact term matching.
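
RRF itself is little more than score accumulation: each appearance of a chunk contributes 1/(k + rank) to its fused score. A sketch over two ranked lists of chunk IDs:

```typescript
// Reciprocal Rank Fusion: merge two ranked ID lists into one.
// score(id) = sum of 1 / (k + rank) over each list, rank starting at 1.

function rrfMerge(vectorIds: string[], keywordIds: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [vectorIds, keywordIds]) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([id]) => id);
}

// Example: a chunk ranked 3rd by vector search and 1st by keyword search
// scores 1/63 + 1/61 ≈ 0.0323, beating a chunk that tops only one list
// (1/61 ≈ 0.0164). Chunks found by both retrievers rise to the top.
```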

The top 5 merged results are injected into the LLM prompt as context. Each chunk includes its source page title, path, and section heading. The system prompt instructs the model to answer only from the provided context and to cite sources.
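
The exact prompt template is internal to HolyDocs, but its shape is roughly the following sketch, which formats the top 5 chunks with their metadata and prepends the grounding instructions (the wording here is illustrative):

```typescript
// Assemble retrieved chunks into a grounded system prompt.
function buildSystemPrompt(chunks: Chunk[]): string {
  const context = chunks
    .slice(0, 5) // top 5 merged results
    .map((c, i) => `[${i + 1}] ${c.pagePath} (${c.heading})\n${c.text}`)
    .join("\n\n---\n\n");
  return (
    "Answer ONLY from the documentation context below. " +
    "Cite the source path for every claim. If the context is " +
    "insufficient, say you could not find the information.\n\n" +
    context
  );
}
```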

If the initial context is insufficient, the assistant can call tools to retrieve more information. It supports up to 3 rounds of tool calling before generating its final answer. This agentic loop allows it to navigate from a general query to a specific answer across multiple pages.
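
Conceptually the loop looks like the sketch below, where callModel and runTool are hypothetical stand-ins for the real LLM call and tool dispatch:

```typescript
interface ModelTurn {
  toolCall?: { name: string; args: Record<string, unknown> };
  answer?: string;
}

// Stand-ins for the real LLM and tool plumbing (not a HolyDocs API).
declare function callModel(transcript: string[]): Promise<ModelTurn>;
declare function runTool(
  name: string,
  args: Record<string, unknown>,
): Promise<string>;

async function answerWithTools(question: string): Promise<string> {
  const transcript = [question];
  for (let round = 0; round < 3; round++) { // at most 3 retrieval rounds
    const turn = await callModel(transcript);
    if (!turn.toolCall) return turn.answer ?? "";
    // Append the tool result and let the model decide the next step
    transcript.push(await runTool(turn.toolCall.name, turn.toolCall.args));
  }
  // Tool budget exhausted: force a final answer from what was gathered
  const final = await callModel([...transcript, "Answer now without more tools."]);
  return final.answer ?? "";
}
```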

Assistant Tools

The assistant has access to three tools for retrieving information:

| Tool | Parameters | Description |
|------|------------|-------------|
| search_docs | query: string | Performs hybrid search across your documentation. Returns the top 5 matching chunks with page path, section heading, and content snippet. Use this for broad questions. |
| get_page | path: string | Fetches the full content of a specific page by its path (e.g., /quickstart). Use this when the assistant needs the complete context of a known page. |
| list_pages | none | Returns the full navigation tree of your documentation: all page titles, paths, and their group/tab hierarchy. Useful when the assistant needs to orient itself within your docs structure. |
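
To the model, these tools would be presented as function-calling schemas along the following lines (illustrative; the exact schemas HolyDocs sends are internal):

```typescript
const tools = [
  {
    name: "search_docs",
    description: "Hybrid search across the docs. Returns the top 5 chunks.",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "get_page",
    description: "Fetch the full content of a page by path, e.g. /quickstart.",
    parameters: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
  {
    name: "list_pages",
    description: "Return the full navigation tree of the documentation.",
    parameters: { type: "object", properties: {} },
  },
];
```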

Example Tool Usage

When a reader asks "How do I set up a custom domain?", the assistant:

  1. Calls search_docs with query "set up custom domain"
  2. Receives chunks from the /custom-domains and /projects pages
  3. Calls get_page with path "/custom-domains" to get the full page content
  4. Generates a response citing the specific steps from the custom domains page

Enabling the Assistant

Add the assistant configuration to your docs.json:

```json
{
  "assistant": {
    "enabled": true,
    "name": "AI Helper",
    "greeting": "Hi! I can help you find information in the docs. What would you like to know?",
    "suggestedQuestions": [
      "How do I set up a custom domain?",
      "What AI features are available?",
      "How do I configure navigation tabs?"
    ],
    "position": "bottom-right",
    "theme": {
      "accentColor": "#FBBF24",
      "icon": "sparkle"
    }
  }
}
```

Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enabled | boolean | false | Enable the chat widget |
| name | string | "AI Assistant" | Name shown in the chat header |
| greeting | string | | Initial message displayed when the chat opens |
| model | string | Claude Sonnet | Override the default LLM model |
| suggestedQuestions | string[] | | Up to 4 suggested questions shown in the chat |
| position | bottom-right \| bottom-left | bottom-right | Widget position on the page |
| theme.accentColor | string | Primary color | Color for the chat button and header |
| theme.icon | sparkle \| chat \| bot \| search | sparkle | Icon on the chat trigger button |

Chat Widget

The assistant appears as a floating button on your docs pages. Readers can:

  • Click the button or press Cmd+I (Mac) / Ctrl+I (Windows) to open the chat
  • Ask natural language questions about your documentation
  • Click suggested questions for quick answers
  • View source links to the relevant documentation pages

Note that the chat widget's Cmd+I / Ctrl+I shortcut is distinct from the search modal's Cmd+K / Ctrl+K; the assistant and search are separate features.

Usage Limits

AI assistant messages are limited per organization per month based on your plan:

| Plan | Messages/Month |
|------|----------------|
| Free | 0 (not available) |
| Starter | 0 (not available) |
| Pro | 250 |
| Business | 500 |
| Enterprise | Unlimited |

Usage is tracked per organization across all projects. You can view current usage in the dashboard under AI Features > Usage.

Contact Support Fallback

Optionally add a "Contact support" link when the assistant cannot answer:

```json
{
  "assistant": {
    "enabled": true,
    "contactSupport": {
      "enabled": true,
      "email": "support@yourcompany.com",
      "label": "Contact our team"
    }
  }
}
```

API Endpoints

The assistant is also available via API for custom integrations:

```bash
# Send a message and receive an SSE stream
curl -X POST "https://api.holydocs.com/api/v1/assistant/PROJECT_ID/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "How do I configure custom domains?",
    "visitorId": "visitor-123"
  }'
```

The response is an SSE stream with data events containing the assistant's response tokens.
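
Because EventSource only supports GET requests, a custom client typically POSTs with fetch and reads the stream from the response body. A browser-side sketch follows; it assumes each event carries a response token after a data: prefix, so adjust the parsing to the payload your project actually receives:

```typescript
async function streamChat(
  projectId: string,
  message: string,
  onToken: (token: string) => void,
): Promise<void> {
  const res = await fetch(
    `https://api.holydocs.com/api/v1/assistant/${projectId}/chat`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message, visitorId: "visitor-123" }),
    },
  );
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE events are separated by blank lines; keep any trailing partial
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";
    for (const event of events) {
      for (const line of event.split("\n")) {
        if (line.startsWith("data: ")) onToken(line.slice("data: ".length));
      }
    }
  }
}
```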

Reindexing

If your AI search results seem stale, you can trigger a manual reindex from the dashboard:

  1. Go to AI Features in your project sidebar
  2. Click Reindex Content
  3. Wait for the indexing job to complete (typically 30-60 seconds)

Content indexing uses differential checksums (SHA-256) — only changed chunks are re-embedded, making subsequent indexing fast.
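
The differential check amounts to comparing a per-chunk SHA-256 digest against the previous run. A sketch using the Web Crypto API (available in Workers and browsers), with the stored checksums represented by a plain Map and the Chunk shape from the earlier sketches:

```typescript
async function sha256(text: string): Promise<string> {
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(text),
  );
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function chunksToReembed(
  chunks: Chunk[],
  previous: Map<string, string>, // chunk ID -> checksum from the last run
  projectId: string,
): Promise<Chunk[]> {
  const changed: Chunk[] = [];
  for (const chunk of chunks) {
    const id = `${projectId}:${chunk.pagePath}:${chunk.chunkIndex}`;
    const checksum = await sha256(chunk.text);
    if (previous.get(id) !== checksum) changed.push(chunk); // new or edited
    previous.set(id, checksum);
  }
  return changed; // only these go back through the embedding model
}
```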

Customizing Responses

You can influence the assistant's behavior through several configuration options:

System Prompt Override

Add a custom system prompt to guide the assistant's tone, scope, or behavior:

```json
{
  "assistant": {
    "enabled": true,
    "systemPrompt": "You are a helpful assistant for the Acme API. Always include code examples in your answers. If asked about pricing, direct users to the pricing page."
  }
}
```

Suggested Questions

Provide up to 4 suggested questions that appear when the chat opens. Choose questions that highlight your most important or most-searched content:

```json
{
  "assistant": {
    "suggestedQuestions": [
      "How do I authenticate API requests?",
      "What are the rate limits?",
      "How do I handle webhook events?",
      "Can I use the SDK with TypeScript?"
    ]
  }
}
```

Model Selection

Override the default model (Claude Sonnet) with any model available through OpenRouter:

```json
{
  "assistant": {
    "model": "anthropic/claude-sonnet-4"
  }
}
```

Custom model selection is available on Business and Enterprise plans. Pro plans use the default model.

Debugging

If the assistant is giving poor answers, use these approaches to diagnose and fix the issue:

Go to AI Features in the dashboard and verify that the latest deployment was indexed successfully. If the index is stale, click Reindex Content.

Use the search API to test whether your content is being retrieved correctly. If search_docs returns irrelevant chunks for a query, the issue is in your content structure rather than the LLM.

```bash
curl "https://api.holydocs.com/api/v1/docs/PROJECT_ID/search/semantic?q=your+test+query"
```

If the assistant gives partial or truncated answers, your content may be splitting at awkward chunk boundaries. Adding clear section headings and keeping related content together helps the chunker produce better segments.

If the assistant says "I could not find information about X", the content may genuinely be missing. Check the zero-result queries in your analytics dashboard to identify gaps.

If you are using a custom systemPrompt, ensure it does not overly restrict the assistant. A prompt like "Only answer questions about authentication" will prevent it from answering questions on any other topic.
