Overview

The HolyDocs AI Assistant is a RAG (Retrieval-Augmented Generation) chatbot embedded directly in your documentation site. It answers reader questions by searching your documentation content, retrieving relevant passages, and generating accurate responses grounded in your actual docs.

How It Works

  1. Content Indexing: After each production deployment, HolyDocs chunks your documentation into semantic segments and generates vector embeddings using text-embedding-3-small. These embeddings are stored in Cloudflare Vectorize.

  2. Query Processing: When a reader asks a question, the assistant converts the query into a vector embedding and performs a hybrid search, combining vector similarity search with keyword matching, to find the most relevant content chunks.

  3. Agentic RAG: The assistant uses tool calling with three tools: search_docs (semantic search), get_page (fetch a specific page), and list_pages (browse the navigation). It can perform up to 3 rounds of retrieval before generating its final response.

  4. Streaming Response: The response is streamed to the reader via Server-Sent Events (SSE), so answers appear incrementally as they are generated.

How RAG Works (Technical Detail)

The Retrieval-Augmented Generation pipeline behind the assistant ensures answers are grounded in your actual documentation rather than the LLM's general training data:

During indexing, each page is split into chunks of approximately 500 tokens with 50-token overlap between adjacent chunks. Chunks preserve their source page path and section heading as metadata. This ensures the assistant can cite specific sections when answering.
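
Below is a minimal sketch of this style of chunking in TypeScript. The actual HolyDocs chunker is internal; here "tokens" are approximated by whitespace-split words, and the Chunk shape is illustrative:

```typescript
// Illustrative only: HolyDocs' real chunker is internal and token-aware.
// "Tokens" are approximated by whitespace-split words in this sketch.

interface Chunk {
  pagePath: string;   // source page path, kept for citations
  heading: string;    // nearest section heading
  chunkIndex: number;
  text: string;
}

function chunkPage(
  pagePath: string,
  heading: string,
  text: string,
  size = 500,    // ~500 tokens per chunk
  overlap = 50,  // 50-token overlap between adjacent chunks
): Chunk[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  const chunks: Chunk[] = [];
  for (let start = 0, i = 0; start < tokens.length; start += size - overlap, i++) {
    chunks.push({
      pagePath,
      heading,
      chunkIndex: i,
      text: tokens.slice(start, start + size).join(" "),
    });
    if (start + size >= tokens.length) break; // final window reached
  }
  return chunks;
}
```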

Chunks are embedded using OpenAI's text-embedding-3-small (1536 dimensions). The same model is used for both indexing and query-time embedding to ensure vector space alignment. Embeddings are stored in Cloudflare Vectorize with deterministic IDs (projectId:pagePath:chunkIndex).
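
The write path could look roughly like the sketch below, assuming a Cloudflare Worker with a Vectorize-style binding (the minimal VectorStore interface is an assumption standing in for the real binding type) and the OpenAI embeddings endpoint. It reuses the Chunk shape from the chunking sketch above:

```typescript
// Minimal shape of the Vectorize binding's upsert() as used in this sketch.
interface VectorStore {
  upsert(vectors: {
    id: string;
    values: number[];
    metadata?: Record<string, string>;
  }[]): Promise<unknown>;
}

async function embed(text: string, apiKey: string): Promise<number[]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
  });
  const json = (await res.json()) as { data: { embedding: number[] }[] };
  return json.data[0].embedding; // 1536-dimensional vector
}

async function indexChunk(
  store: VectorStore,
  apiKey: string,
  projectId: string,
  chunk: Chunk,
): Promise<void> {
  await store.upsert([{
    // Deterministic ID: re-indexing the same chunk overwrites in place
    id: `${projectId}:${chunk.pagePath}:${chunk.chunkIndex}`,
    values: await embed(chunk.text, apiKey),
    metadata: { pagePath: chunk.pagePath, heading: chunk.heading },
  }]);
}
```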

At query time, the assistant runs both a vector similarity search (top 10 chunks by cosine similarity) and a keyword search (top 10 results by BM25-style scoring). Results are merged using Reciprocal Rank Fusion (RRF) with k=60, producing a unified ranking that benefits from both semantic understanding and exact term matching.
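
RRF itself is little more than score accumulation: each appearance of a chunk contributes 1/(k + rank) to its fused score. A sketch over two ranked lists of chunk IDs:

```typescript
// Reciprocal Rank Fusion: merge two ranked ID lists into one.
// score(id) = sum of 1 / (k + rank) over each list, rank starting at 1.

function rrfMerge(vectorIds: string[], keywordIds: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [vectorIds, keywordIds]) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([id]) => id);
}

// Example: a chunk ranked 3rd by vector search and 1st by keyword search
// scores 1/63 + 1/61 ≈ 0.0323, beating a chunk that tops only one list
// (1/61 ≈ 0.0164). Chunks found by both retrievers rise to the top.
```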

The top 5 merged results are injected into the LLM prompt as context. Each chunk includes its source page title, path, and section heading. The system prompt instructs the model to answer only from the provided context and to cite sources.
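
The exact prompt template is internal to HolyDocs, but its shape is roughly the following sketch, which formats the top 5 chunks with their metadata and prepends the grounding instructions (the wording here is illustrative):

```typescript
// Assemble retrieved chunks into a grounded system prompt.
function buildSystemPrompt(chunks: Chunk[]): string {
  const context = chunks
    .slice(0, 5) // top 5 merged results
    .map((c, i) => `[${i + 1}] ${c.pagePath} (${c.heading})\n${c.text}`)
    .join("\n\n---\n\n");
  return (
    "Answer ONLY from the documentation context below. " +
    "Cite the source path for every claim. If the context is " +
    "insufficient, say you could not find the information.\n\n" +
    context
  );
}
```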

If the initial context is insufficient, the assistant can call tools to retrieve more information. It supports up to 3 rounds of tool calling before generating its final answer. This agentic loop allows it to navigate from a general query to a specific answer across multiple pages.
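
Conceptually the loop looks like the sketch below, where callModel and runTool are hypothetical stand-ins for the real LLM call and tool dispatch:

```typescript
interface ModelTurn {
  toolCall?: { name: string; args: Record<string, unknown> };
  answer?: string;
}

// Stand-ins for the real LLM and tool plumbing (not a HolyDocs API).
declare function callModel(transcript: string[]): Promise<ModelTurn>;
declare function runTool(
  name: string,
  args: Record<string, unknown>,
): Promise<string>;

async function answerWithTools(question: string): Promise<string> {
  const transcript = [question];
  for (let round = 0; round < 3; round++) { // at most 3 retrieval rounds
    const turn = await callModel(transcript);
    if (!turn.toolCall) return turn.answer ?? "";
    // Append the tool result and let the model decide the next step
    transcript.push(await runTool(turn.toolCall.name, turn.toolCall.args));
  }
  // Tool budget exhausted: force a final answer from what was gathered
  const final = await callModel([...transcript, "Answer now without more tools."]);
  return final.answer ?? "";
}
```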

Assistant Tools

The assistant has access to three tools for retrieving information:

| Tool | Parameters | Description |
|------|------------|-------------|
| search_docs | query: string | Performs hybrid search across your documentation. Returns the top 5 matching chunks with page path, section heading, and content snippet. Use this for broad questions. |
| get_page | path: string | Fetches the full content of a specific page by its path (e.g., /quickstart). Use this when the assistant needs the complete context of a known page. |
| list_pages | none | Returns the full navigation tree of your documentation: all page titles, paths, and their group/tab hierarchy. Useful when the assistant needs to orient itself within your docs structure. |
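
To the model, these tools would be presented as function-calling schemas along the following lines (illustrative; the exact schemas HolyDocs sends are internal):

```typescript
const tools = [
  {
    name: "search_docs",
    description: "Hybrid search across the docs. Returns the top 5 chunks.",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "get_page",
    description: "Fetch the full content of a page by path, e.g. /quickstart.",
    parameters: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
  {
    name: "list_pages",
    description: "Return the full navigation tree of the documentation.",
    parameters: { type: "object", properties: {} },
  },
];
```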

Example Tool Usage

When a reader asks "How do I set up a custom domain?", the assistant:

  1. Calls search_docs with query "set up custom domain"
  2. Receives chunks from the /custom-domains and /projects pages
  3. Calls get_page with path "/custom-domains" to get the full page content
  4. Generates a response citing the specific steps from the custom domains page

Enabling the Assistant

Add the assistant configuration to your docs.json:

```json
{
  "assistant": {
    "enabled": true,
    "name": "AI Helper",
    "greeting": "Hi! I can help you find information in the docs. What would you like to know?",
    "suggestedQuestions": [
      "How do I set up a custom domain?",
      "What AI features are available?",
      "How do I configure navigation tabs?"
    ],
    "position": "bottom-right",
    "theme": {
      "accentColor": "#FBBF24",
      "icon": "sparkle"
    }
  }
}
```

Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| enabled | boolean | false | Enable the chat widget |
| name | string | "AI Assistant" | Name shown in the chat header |
| greeting | string | | Initial message displayed when the chat opens |
| model | string | Claude Sonnet | Override the default LLM model |
| suggestedQuestions | string[] | | Up to 4 suggested questions shown in the chat |
| position | bottom-right \| bottom-left | bottom-right | Widget position on the page |
| theme.accentColor | string | Primary color | Color for the chat button and header |
| theme.icon | sparkle \| chat \| bot \| search | sparkle | Icon on the chat trigger button |

Chat Widget

The assistant appears as a floating button on your docs pages. Readers can:

  • Click the button or press Cmd+I (Mac) / Ctrl+I (Windows) to open the chat
  • Ask natural language questions about your documentation
  • Click suggested questions for quick answers
  • View source links to the relevant documentation pages

Note that the chat widget's Cmd+I / Ctrl+I shortcut is distinct from the search modal's Cmd+K / Ctrl+K; the assistant and search are separate features.

Usage Limits

AI assistant messages are limited per organization per month based on your plan:

| Plan | Messages/Month |
|------|----------------|
| Free | 0 (not available) |
| Starter | 0 (not available) |
| Pro | 250 |
| Business | 500 |
| Enterprise | Unlimited |

Usage is tracked per organization across all projects. You can view current usage in the dashboard under AI Features > Usage.

Contact Support Fallback

Optionally add a "Contact support" link when the assistant cannot answer:

```json
{
  "assistant": {
    "enabled": true,
    "contactSupport": {
      "enabled": true,
      "email": "support@yourcompany.com",
      "label": "Contact our team"
    }
  }
}
```

API Endpoints

The assistant is also available via API for custom integrations:

```bash
# Send a message and receive an SSE stream
curl -X POST "https://api.holydocs.com/api/v1/assistant/PROJECT_ID/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "How do I configure custom domains?",
    "visitorId": "visitor-123"
  }'
```

The response is an SSE stream with data events containing the assistant's response tokens.
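
Because EventSource only supports GET requests, a custom client typically POSTs with fetch and reads the stream from the response body. A browser-side sketch follows; it assumes each event carries a response token after a data: prefix, so adjust the parsing to the payload your project actually receives:

```typescript
async function streamChat(
  projectId: string,
  message: string,
  onToken: (token: string) => void,
): Promise<void> {
  const res = await fetch(
    `https://api.holydocs.com/api/v1/assistant/${projectId}/chat`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message, visitorId: "visitor-123" }),
    },
  );
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE events are separated by blank lines; keep any trailing partial
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";
    for (const event of events) {
      for (const line of event.split("\n")) {
        if (line.startsWith("data: ")) onToken(line.slice("data: ".length));
      }
    }
  }
}
```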

Reindexing

If your AI search results seem stale, you can trigger a manual reindex from the dashboard:

  1. Go to AI Features in your project sidebar
  2. Click Reindex Content
  3. Wait for the indexing job to complete (typically 30-60 seconds)

Content indexing uses differential checksums (SHA-256) — only changed chunks are re-embedded, making subsequent indexing fast.
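
The differential check amounts to comparing a per-chunk SHA-256 digest against the previous run. A sketch using the Web Crypto API (available in Workers and browsers), with the stored checksums represented by a plain Map and the Chunk shape from the earlier sketches:

```typescript
async function sha256(text: string): Promise<string> {
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(text),
  );
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function chunksToReembed(
  chunks: Chunk[],
  previous: Map<string, string>, // chunk ID -> checksum from the last run
  projectId: string,
): Promise<Chunk[]> {
  const changed: Chunk[] = [];
  for (const chunk of chunks) {
    const id = `${projectId}:${chunk.pagePath}:${chunk.chunkIndex}`;
    const checksum = await sha256(chunk.text);
    if (previous.get(id) !== checksum) changed.push(chunk); // new or edited
    previous.set(id, checksum);
  }
  return changed; // only these go back through the embedding model
}
```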

Customizing Responses

You can influence the assistant's behavior through several configuration options:

System Prompt Override

Add a custom system prompt to guide the assistant's tone, scope, or behavior:

```json
{
  "assistant": {
    "enabled": true,
    "systemPrompt": "You are a helpful assistant for the Acme API. Always include code examples in your answers. If asked about pricing, direct users to the pricing page."
  }
}
```

Suggested Questions

Provide up to 4 suggested questions that appear when the chat opens. Choose questions that highlight your most important or most-searched content:

```json
{
  "assistant": {
    "suggestedQuestions": [
      "How do I authenticate API requests?",
      "What are the rate limits?",
      "How do I handle webhook events?",
      "Can I use the SDK with TypeScript?"
    ]
  }
}
```

Model Selection

Override the default model (Claude Sonnet) with any model available through OpenRouter:

```json
{
  "assistant": {
    "model": "anthropic/claude-sonnet-4"
  }
}
```

Custom model selection is available on Business and Enterprise plans. Pro plans use the default model.

Debugging

If the assistant is giving poor answers, use these approaches to diagnose and fix the issue:

Go to AI Features in the dashboard and verify that the latest deployment was indexed successfully. If the index is stale, click Reindex Content.

Use the search API to test whether your content is being retrieved correctly. If search_docs returns irrelevant chunks for a query, the issue is in your content structure rather than the LLM.

```bash
curl "https://api.holydocs.com/api/v1/docs/PROJECT_ID/search/semantic?q=your+test+query"
```

If the assistant gives partial or truncated answers, your content may be splitting at awkward chunk boundaries. Adding clear section headings and keeping related content together helps the chunker produce better segments.

If the assistant says "I could not find information about X", the content may genuinely be missing. Check the zero-result queries in your analytics dashboard to identify gaps.

If you are using a custom systemPrompt, ensure it does not overly restrict the assistant. A prompt like "Only answer questions about authentication" will prevent it from answering questions on any other topic.
