# AI Assistant

Enable an AI-powered chat assistant on your documentation site that answers questions using your content as context.
## Overview
The HolyDocs AI Assistant is a RAG (Retrieval-Augmented Generation) chatbot embedded directly in your documentation site. It answers reader questions by searching your documentation content, retrieving relevant passages, and generating accurate responses grounded in your actual docs.
## How It Works

### Content Indexing
After each production deployment, HolyDocs chunks your documentation into semantic segments and generates vector embeddings using `text-embedding-3-small`. These embeddings are stored in Cloudflare Vectorize.
### Query Processing
When a reader asks a question, the assistant converts the query into a vector embedding and performs a hybrid search — combining vector similarity search with keyword matching — to find the most relevant content chunks.
### Agentic RAG
The assistant uses tool calling with three tools: `search_docs` (semantic search), `get_page` (fetch a specific page), and `list_pages` (browse the navigation). It can perform up to 3 rounds of retrieval before generating its final response.
### Streaming Response
The response is streamed to the reader via Server-Sent Events (SSE), so answers appear incrementally as they are generated.
## How RAG Works (Technical Detail)
The Retrieval-Augmented Generation pipeline behind the assistant ensures answers are grounded in your actual documentation rather than the LLM's general training data:
During indexing, each page is split into chunks of approximately 500 tokens with 50-token overlap between adjacent chunks. Chunks preserve their source page path and section heading as metadata. This ensures the assistant can cite specific sections when answering.
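The overlap scheme can be sketched in a few lines. This is a simplified illustration, not the actual chunker: the real pipeline splits on semantic boundaries and counts tokens with a tokenizer, while this sketch treats any sequence as the token stream.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token sequence into fixed-size chunks, with `overlap`
    tokens shared between adjacent chunks."""
    chunks = []
    step = size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

# A 1000-token page produces chunks starting at tokens 0, 450, and 900
sizes = [len(c) for c in chunk_tokens(list(range(1000)))]
print(sizes)  # [500, 500, 100]
```

The 50-token overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which keeps retrieval from missing content split mid-thought.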
Chunks are embedded using OpenAI's `text-embedding-3-small` (1536 dimensions). The same model is used for both indexing and query-time embedding to ensure vector space alignment. Embeddings are stored in Cloudflare Vectorize with deterministic IDs (`projectId:pagePath:chunkIndex`).
At query time, the assistant runs both a vector similarity search (top 10 chunks by cosine similarity) and a keyword search (top 10 results by BM25-style scoring). Results are merged using Reciprocal Rank Fusion (RRF) with k=60, producing a unified ranking that benefits from both semantic understanding and exact term matching.
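Reciprocal Rank Fusion itself is simple to express. A minimal sketch, assuming each search returns an ordered list of chunk IDs (the `rrf_merge` helper name is illustrative, not part of any HolyDocs API):

```python
def rrf_merge(vector_ranked, keyword_ranked, k=60):
    """Merge two ranked lists of chunk IDs with Reciprocal Rank Fusion:
    each ID scores the sum of 1 / (k + rank) over the lists it appears in,
    where rank is 1-based."""
    scores = {}
    for ranking in (vector_ranked, keyword_ranked):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vec = ["a", "b", "c"]  # ranked by cosine similarity
kw = ["c", "d", "b"]   # ranked by keyword score
print(rrf_merge(vec, kw)[:2])  # ['c', 'b']: chunks found by both searches win
```

Because scores depend only on rank positions, RRF needs no tuning to reconcile cosine similarities with keyword scores, which live on incomparable scales; the constant k=60 damps the advantage of a single first-place hit.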
The top 5 merged results are injected into the LLM prompt as context. Each chunk includes its source page title, path, and section heading. The system prompt instructs the model to answer only from the provided context and to cite sources.
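The context injection can be pictured as simple string assembly. The layout and the `build_context` helper below are illustrative; the real prompt template is internal to HolyDocs.

```python
def build_context(chunks):
    """Format retrieved chunks into a context block for the LLM prompt,
    keeping the source metadata the model needs for citations."""
    parts = []
    for c in chunks:
        parts.append(f"## {c['title']} ({c['path']}) - {c['heading']}\n{c['content']}")
    return "\n\n".join(parts)

chunk = {"title": "Custom Domains", "path": "/custom-domains",
         "heading": "DNS Setup", "content": "Add a CNAME record..."}
print(build_context([chunk]))
```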
If the initial context is insufficient, the assistant can call tools to retrieve more information. It supports up to 3 rounds of tool calling before generating its final answer. This agentic loop allows it to navigate from a general query to a specific answer across multiple pages.
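The control flow of that loop might look like the following sketch, where `call_llm`, the reply shape, and the stub tools are hypothetical stand-ins for the real model and retrieval backends:

```python
MAX_TOOL_ROUNDS = 3

def answer(question, call_llm, tools):
    """Agentic RAG loop: let the model request tools for up to
    MAX_TOOL_ROUNDS rounds, then force a final answer from context."""
    messages = [{"role": "user", "content": question}]
    for _ in range(MAX_TOOL_ROUNDS):
        reply = call_llm(messages, tools_enabled=True)
        if reply["type"] != "tool_call":
            return reply["content"]  # model answered without more retrieval
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "name": reply["tool"], "content": result})
    # Tool budget exhausted: generate a final answer from what was gathered
    return call_llm(messages, tools_enabled=False)["content"]

# Stub model: requests one search, then answers from the retrieved chunk
def stub_llm(messages, tools_enabled):
    if tools_enabled and not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "search_docs",
                "args": {"query": "custom domain"}}
    return {"type": "text", "content": "See the Custom Domains page."}

stub_tools = {"search_docs": lambda query: f"chunks for: {query}"}
print(answer("How do I set up a custom domain?", stub_llm, stub_tools))
```

The fixed round limit is what keeps the loop bounded: after three tool calls the model must answer with whatever context it has accumulated.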
## Assistant Tools
The assistant has access to three tools for retrieving information:
| Tool | Parameters | Description |
|---|---|---|
| `search_docs` | `query: string` | Performs hybrid search across your documentation. Returns the top 5 matching chunks with page path, section heading, and content snippet. Use this for broad questions. |
| `get_page` | `path: string` | Fetches the full content of a specific page by its path (e.g., `/quickstart`). Use this when the assistant needs the complete context of a known page. |
| `list_pages` | none | Returns the full navigation tree of your documentation (all page titles, paths, and their group/tab hierarchy). Useful when the assistant needs to orient itself within your docs structure. |
### Example Tool Usage
When a reader asks "How do I set up a custom domain?", the assistant:
- Calls `search_docs` with query "set up custom domain"
- Receives chunks from the `/custom-domains` and `/projects` pages
- Calls `get_page` with path "/custom-domains" to get the full page content
- Generates a response citing the specific steps from the custom domains page
## Enabling the Assistant
Add the assistant configuration to your `docs.json`:

```json
{
  "assistant": {
    "enabled": true,
    "name": "AI Helper",
    "greeting": "Hi! I can help you find information in the docs. What would you like to know?",
    "suggestedQuestions": [
      "How do I set up a custom domain?",
      "What AI features are available?",
      "How do I configure navigation tabs?"
    ],
    "position": "bottom-right",
    "theme": {
      "accentColor": "#FBBF24",
      "icon": "sparkle"
    }
  }
}
```
## Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | `boolean` | `false` | Enable the chat widget |
| `name` | `string` | `"AI Assistant"` | Name shown in the chat header |
| `greeting` | `string` | — | Initial message displayed when the chat opens |
| `model` | `string` | — | Override the default LLM model |
| `suggestedQuestions` | `string[]` | — | Up to 4 suggested questions shown in the chat |
| `position` | `"bottom-right" \| "bottom-left"` | `"bottom-right"` | Widget position on the page |
| `theme.accentColor` | `string` | Primary color | Color for the chat button and header |
| `theme.icon` | `"sparkle" \| "chat" \| "bot" \| "search"` | `"sparkle"` | Icon on the chat trigger button |
## Chat Widget
The assistant appears as a floating button on your docs pages. Readers can:
- Click the button or press Cmd+I (Mac) / Ctrl+I (Windows) to open the chat
- Ask natural language questions about your documentation
- Click suggested questions for quick answers
- View source links to the relevant documentation pages
The chat widget keyboard shortcut is Cmd+I / Ctrl+I. The search modal uses Cmd+K / Ctrl+K. These are separate features.
## Usage Limits
AI assistant messages are limited per organization per month based on your plan:
| Plan | Messages/Month |
|---|---|
| Free | 0 (not available) |
| Starter | 0 (not available) |
| Pro | 250 |
| Business | 500 |
| Enterprise | Unlimited |
Usage is tracked per organization across all projects. You can view current usage in the dashboard under AI Features > Usage.
## Contact Support Fallback
Optionally add a "Contact support" link when the assistant cannot answer:
```json
{
  "assistant": {
    "enabled": true,
    "contactSupport": {
      "enabled": true,
      "email": "support@yourcompany.com",
      "label": "Contact our team"
    }
  }
}
```
## API Endpoints
The assistant is also available via API for custom integrations:
```bash
# Send a message and receive an SSE stream
curl -X POST "https://api.holydocs.com/api/v1/assistant/PROJECT_ID/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "How do I configure custom domains?",
    "visitorId": "visitor-123"
  }'
```
The response is an SSE stream with data events containing the assistant's response tokens.
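Consuming such a stream mostly amounts to extracting `data:` fields per the SSE framing rules. A minimal sketch of that parsing step, fed here with illustrative token events rather than a live HTTP response (the exact payload shape of the endpoint's events is an assumption):

```python
def iter_sse_data(lines):
    """Yield the payload of each `data:` field in an SSE stream,
    stripping the single optional leading space per the SSE spec."""
    for line in lines:
        if line.startswith("data:"):
            payload = line[len("data:"):]
            if payload.startswith(" "):
                payload = payload[1:]
            yield payload

# Illustrative token events, as a client might receive them line by line
stream = ["data: Custom", "data:  domains are configured in docs.json", ""]
print("".join(iter_sse_data(stream)))  # Custom domains are configured in docs.json
```

In a real integration the lines would come from the chat endpoint above via a streaming HTTP client, concatenating payloads as they arrive to render the answer incrementally.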
## Reindexing
If your AI search results seem stale, you can trigger a manual reindex from the dashboard:
- Go to AI Features in your project sidebar
- Click Reindex Content
- Wait for the indexing job to complete (typically 30-60 seconds)
Content indexing uses differential checksums (SHA-256) — only changed chunks are re-embedded, making subsequent indexing fast.
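The differential check can be sketched as follows. This is a simplified illustration: `changed_chunks` is a hypothetical helper, and the real pipeline keys checksums by the deterministic chunk IDs rather than plain list indices.

```python
import hashlib

def changed_chunks(stored_sums, chunks):
    """Return (index, chunk) pairs whose SHA-256 digest differs from the
    stored checksum; only these chunks need re-embedding."""
    changed = []
    for i, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk.encode()).hexdigest()
        if stored_sums.get(i) != digest:
            changed.append((i, chunk))
    return changed

stored = {0: hashlib.sha256(b"intro text").hexdigest(),
          1: hashlib.sha256(b"setup steps").hexdigest()}
print(changed_chunks(stored, ["intro text", "setup steps (edited)"]))
# [(1, 'setup steps (edited)')]
```

Since embedding calls dominate indexing cost, skipping unchanged chunks is what makes a reindex of a mostly-unchanged site fast.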
## Customizing Responses
You can influence the assistant's behavior through several configuration options:
### System Prompt Override
Add a custom system prompt to guide the assistant's tone, scope, or behavior:
```json
{
  "assistant": {
    "enabled": true,
    "systemPrompt": "You are a helpful assistant for the Acme API. Always include code examples in your answers. If asked about pricing, direct users to the pricing page."
  }
}
```
### Suggested Questions
Provide up to 4 suggested questions that appear when the chat opens. Choose questions that highlight your most important or most-searched content:
```json
{
  "assistant": {
    "suggestedQuestions": [
      "How do I authenticate API requests?",
      "What are the rate limits?",
      "How do I handle webhook events?",
      "Can I use the SDK with TypeScript?"
    ]
  }
}
```
### Model Selection
Override the default model (Claude Sonnet) with any model available through OpenRouter:
```json
{
  "assistant": {
    "model": "anthropic/claude-sonnet-4"
  }
}
```
Custom model selection is available on Business and Enterprise plans. Pro plans use the default model.
## Debugging
If the assistant is giving poor answers, use these approaches to diagnose and fix the issue:
Go to AI Features in the dashboard and verify that the latest deployment was indexed successfully. If the index is stale, click Reindex Content.
Use the search API to test whether your content is being retrieved correctly. If `search_docs` returns irrelevant chunks for a query, the issue is in your content structure rather than the LLM.
```bash
curl "https://api.holydocs.com/api/v1/docs/PROJECT_ID/search/semantic?q=your+test+query"
```
If the assistant gives partial or truncated answers, your content may be splitting at awkward chunk boundaries. Adding clear section headings and keeping related content together helps the chunker produce better segments.
If the assistant says "I could not find information about X", the content may genuinely be missing. Check the zero-result queries in your analytics dashboard to identify gaps.
If you are using a custom `systemPrompt`, ensure it does not overly restrict the assistant. A prompt like "Only answer questions about authentication" will prevent it from answering questions on any other topic.