Overview

The AI Assistant API gives you programmatic access to the same chat assistant that powers the search widget on your documentation site. Send natural language questions and receive streaming responses grounded in your documentation content. The assistant uses agentic RAG with three tools to search, retrieve, and synthesize answers.

Base path: https://api.holydocs.com/api/v1/assistant/:projectId

Public Chat

The chat endpoint is public. Embed it in your own interfaces without exposing API keys.

SSE Streaming

Responses stream via Server-Sent Events for low-latency, token-by-token delivery.

Send Message

Send a message to the AI assistant and receive a streaming response. The assistant searches your documentation, retrieves relevant pages, and synthesizes an answer.

```bash
POST /api/v1/assistant/:projectId/chat
```

```bash
curl -N -X POST "https://api.holydocs.com/api/v1/assistant/$PROJECT_ID/chat" \
  -H "Content-Type: application/json" \
  -d '{"message":"How do I set up a custom domain?","visitorId":"v_anon_abc"}'
```

Path Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `projectId` | string | Project ID or project slug |

Request Body

```json
{
  "message": "How do I set up a custom domain?",
  "visitorId": "v_anon_abc123",
  "conversationId": "conv_xyz789"
}
```
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `message` | string | Yes | The user's question (1-2000 characters) |
| `visitorId` | string | Yes | Unique identifier for the visitor (used for rate limiting and conversation tracking) |
| `conversationId` | string | No | Continue an existing conversation. Omit to start a new one. |

Response

The response is an SSE (Server-Sent Events) stream. Each event is a newline-delimited `data:` line carrying a JSON payload.

```text
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

data: {"type":"start","conversationId":"conv_xyz789","messageId":"msg_abc123"}

data: {"type":"tool_call","tool":"search_docs","input":{"query":"custom domain setup"}}

data: {"type":"tool_result","tool":"search_docs","results":3}

data: {"type":"tool_call","tool":"get_page","input":{"path":"/custom-domains"}}

data: {"type":"tool_result","tool":"get_page","title":"Custom Domains"}

data: {"type":"text","content":"To set up a custom domain"}

data: {"type":"text","content":" for your HolyDocs project, follow"}

data: {"type":"text","content":" these steps:\n\n1. Go to **Settings"}

data: {"type":"text","content":" > Domain** in your dashboard\n2."}

data: {"type":"text","content":" Enter your domain (e.g., `docs.yourcompany.com`)\n3."}

data: {"type":"text","content":" Add a CNAME record pointing to `proxy.holydocs.com`\n4."}

data: {"type":"text","content":" Wait for DNS propagation and SSL provisioning\n\n"}

data: {"type":"citations","sources":[{"path":"/custom-domains","title":"Custom Domains"},{"path":"/api/domains","title":"Domains API"}]}

data: {"type":"done","tokensUsed":342}
```

SSE Event Types

| Type | Description | Fields |
| --- | --- | --- |
| `start` | Stream opened, conversation initialized | `conversationId`, `messageId` |
| `tool_call` | Assistant is invoking a tool | `tool`, `input` |
| `tool_result` | Tool returned results | `tool`, summary fields |
| `text` | Content token(s) | `content` (text fragment) |
| `citations` | Source pages the answer is grounded in | `sources` (array of `{path, title}`) |
| `done` | Stream complete | `tokensUsed` |
| `error` | An error occurred | `code`, `message` |
```javascript
async function askAssistant(projectId, message, visitorId) {
  const response = await fetch(
    `https://api.holydocs.com/api/v1/assistant/${projectId}/chat`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message, visitorId })
    }
  );

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let fullText = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // An event may be split across chunks: keep the trailing partial
    // line in the buffer and only process complete lines.
    const lines = buffer.split('\n');
    buffer = lines.pop();

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const event = JSON.parse(line.slice(6));
      switch (event.type) {
        case 'text':
          fullText += event.content;
          process.stdout.write(event.content);
          break;
        case 'citations':
          console.log('\nSources:', event.sources.map(s => s.title).join(', '));
          break;
        case 'error':
          throw new Error(event.message);
      }
    }
  }
  return fullText;
}
```

Use the `-N` flag with cURL to disable output buffering, which is required to see SSE events in real time.

Agentic RAG Flow

The assistant does not simply embed the query and return a vector match. It follows an agentic Retrieval-Augmented Generation (RAG) flow, autonomously deciding which tools to call and how many rounds of retrieval to perform before answering.

1. Query Analysis

The assistant receives the user's message along with any prior conversation history. It analyzes the query to determine what information is needed.

2. Tool Selection

Based on the query, the assistant selects one or more tools to invoke. For a simple factual question, it may call search_docs once. For a complex question spanning multiple topics, it may chain several tool calls.

3. Retrieval

Each tool call retrieves information from the documentation. search_docs uses hybrid search (keyword + semantic via RRF). get_page retrieves full page content. list_pages returns the navigation tree.

4. Synthesis

With retrieved context, the assistant synthesizes a grounded answer. It cites specific pages and avoids hallucinating information not present in the documentation.

5. Citation

The response concludes with a citations event listing all source pages, so readers can verify and explore further.
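The steps above can be sketched as a loop in JavaScript. Note that `llm` and `tools` here are hypothetical stand-ins for the model and tool implementations, not part of the HolyDocs API; this is a minimal illustration of the agentic pattern, not the actual implementation.

```javascript
// Sketch of the agentic RAG loop: the model repeatedly chooses between
// calling a tool and answering, up to a fixed tool-call budget.
// `llm` and `tools` are hypothetical interfaces for illustration.
async function agenticAnswer(question, llm, tools, maxRounds = 4) {
  const context = [];
  for (let round = 0; round < maxRounds; round++) {
    // Steps 1-2: analyze the query plus gathered context, pick an action.
    const decision = await llm.decide(question, context);
    if (decision.type === 'answer') {
      // Steps 4-5: synthesize a grounded answer and cite retrieved pages.
      return { text: decision.text, citations: context.map(c => c.path) };
    }
    // Step 3: run the requested tool and feed the result back as context.
    const result = await tools[decision.tool](decision.input);
    context.push(result);
  }
  return { text: 'Unable to answer within the tool-call budget.', citations: [] };
}
```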

Assistant Tools

The assistant has access to three tools. These are invoked automatically during the agentic loop; you do not call them directly.

search_docs

Performs hybrid search (keyword + semantic) across the documentation. Returns the top matching pages with snippets.

Input:

```json
{
  "query": "custom domain DNS setup",
  "limit": 5
}
```

Behavior: Runs both keyword and semantic search in parallel, merges results via RRF, and returns the top matches. This is the most frequently used tool, invoked in approximately 74% of conversations.
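To illustrate the merge step, here is a minimal Reciprocal Rank Fusion sketch. The function and its inputs are illustrative, and `k = 60` is the conventional RRF damping constant; the actual fusion parameters used by the service are not documented.

```javascript
// Reciprocal Rank Fusion: each result list contributes 1 / (k + rank)
// per document, and documents ranked well in both lists rise to the top.
function rrfMerge(keywordHits, semanticHits, k = 60) {
  const scores = new Map();
  for (const list of [keywordHits, semanticHits]) {
    list.forEach((path, rank) => {
      // rank is 0-based, so the top hit scores 1 / (k + 1).
      scores.set(path, (scores.get(path) || 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([path]) => path);
}
```

A page found by both keyword and semantic search (like `/b` below) outranks a page that tops only one list:

```javascript
rrfMerge(['/a', '/b', '/c'], ['/b', '/d']);
// '/b' ranks first: it appears in both lists
```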

get_page

Retrieves the full content of a specific documentation page by path. Used when the assistant needs detailed information from a page identified by search.

Input:

```json
{
  "path": "/custom-domains"
}
```

Behavior: Fetches the complete page content from KV cache or R2 storage. Returns the title, content body, and metadata. Used in approximately 50% of conversations, typically after search_docs identifies a relevant page.

list_pages

Returns the full navigation tree of the documentation site. Used when the assistant needs to understand the overall structure or find pages by section.

Input:

```json
{}
```

Behavior: Returns the complete navigation configuration including all sections, pages, and their paths. Used in approximately 16% of conversations, typically for broad "what can I do" or "where is" questions.
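If you work with a navigation tree like this on the client side, it can be flattened into path/title pairs for lookups. The `{sections, pages}` shape below is an assumption for illustration; the exact schema of the navigation configuration is not specified here.

```javascript
// Flattens a nested navigation tree (assumed {sections, pages} shape)
// into an array of [path, title] pairs via depth-first traversal.
function flattenNav(node, out = []) {
  for (const page of node.pages || []) out.push([page.path, page.title]);
  for (const section of node.sections || []) flattenNav(section, out);
  return out;
}
```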

List Conversations

Retrieve a list of AI assistant conversations for analytics and review. Requires authentication.

```bash
GET /api/v1/assistant/:projectId/conversations
```

Query Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `page` | number | No | Page number (default: 1) |
| `perPage` | number | No | Results per page (default: 20, max: 100) |
| `sortBy` | `createdAt` \| `messageCount` | No | Sort field (default: `createdAt`) |
| `order` | `asc` \| `desc` | No | Sort order (default: `desc`) |

Response

```json
{
  "data": [
    {
      "id": "conv_abc123",
      "visitorId": "v_anon_xyz",
      "messageCount": 4,
      "tokensUsed": 1240,
      "firstMessage": "How do I set up a custom domain?",
      "satisfied": true,
      "createdAt": "2026-04-10T14:23:00Z",
      "lastMessageAt": "2026-04-10T14:25:30Z"
    },
    {
      "id": "conv_def456",
      "visitorId": "v_anon_abc",
      "messageCount": 7,
      "tokensUsed": 3100,
      "firstMessage": "What authentication methods are supported?",
      "satisfied": null,
      "createdAt": "2026-04-10T12:10:00Z",
      "lastMessageAt": "2026-04-10T12:18:45Z"
    }
  ],
  "meta": {
    "total": 1240,
    "page": 1,
    "perPage": 20,
    "totalPages": 62
  }
}
```
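To export every conversation, walk the pages using the response's `meta.totalPages` field. This sketch takes an injected `fetchPage` callback so it stays transport-agnostic; in practice that callback would issue the authenticated `GET /conversations?page=N` request.

```javascript
// Collects all conversations across pages, driven by meta.totalPages.
// `fetchPage(page)` is a hypothetical callback returning one parsed
// response body of GET /conversations.
async function listAllConversations(fetchPage) {
  const all = [];
  let page = 1;
  let totalPages = 1;
  do {
    const body = await fetchPage(page);
    all.push(...body.data);
    totalPages = body.meta.totalPages;
    page++;
  } while (page <= totalPages);
  return all;
}
```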

Usage Statistics

Retrieve aggregated AI assistant usage for billing and monitoring. Requires authentication.

```bash
GET /api/v1/assistant/:projectId/usage
```

Query Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `period` | `7d` \| `30d` \| `90d` \| `billing` | No | Time period (default: `billing`, which uses the current billing cycle) |

Response

```json
{
  "data": {
    "period": "billing",
    "startDate": "2026-04-01T00:00:00Z",
    "endDate": "2026-04-30T23:59:59Z",
    "totalConversations": 1240,
    "totalMessages": 5680,
    "totalTokens": 2450000,
    "tokensLimit": 5000000,
    "tokensRemaining": 2550000,
    "usageByDay": [
      { "date": "2026-04-10", "conversations": 45, "messages": 198, "tokens": 89000 },
      { "date": "2026-04-11", "conversations": 38, "messages": 165, "tokens": 74000 }
    ],
    "model": "anthropic/claude-sonnet-4",
    "avgResponseTime": 2.3,
    "satisfactionRate": 0.87
  }
}
```
```bash
curl "https://api.holydocs.com/api/v1/assistant/proj_abc123/usage?period=billing" \
  -H "Authorization: Bearer hd_a1b2c3d4e5f67890a1b2c3d4e5f67890"
```
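A usage response like the one above is enough to drive a simple quota alert. Here is a small helper for that; the 80% warning threshold is an arbitrary choice, not something the API defines.

```javascript
// Computes token headroom from the usage response's `data` object and
// flags when consumption crosses a warning threshold (80% by default,
// an arbitrary choice for this sketch).
function tokenHeadroom(usage, warnAt = 0.8) {
  const usedFraction = usage.totalTokens / usage.tokensLimit;
  return {
    usedFraction,
    remaining: usage.tokensLimit - usage.totalTokens,
    warn: usedFraction >= warnAt
  };
}
```

With the example numbers above (2,450,000 of 5,000,000 tokens), this reports 49% used with 2,550,000 tokens remaining and no warning.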

Token Limits by Plan

| Plan | Monthly Token Limit | Model |
| --- | --- | --- |
| Free | 100,000 | `anthropic/claude-sonnet-4` |
| Starter | 500,000 | `anthropic/claude-sonnet-4` |
| Pro | 2,000,000 | `anthropic/claude-sonnet-4` |
| Business | 5,000,000 | `anthropic/claude-sonnet-4` |
| Enterprise | Custom | Configurable |

Token usage is metered on the LLM's input and output tokens combined. Tool calls (search, page retrieval) do not count against your token limit, but the content injected into the LLM context from those tool results does.

Embedding the Assistant

You can embed the AI assistant in your own application using the public chat endpoint:

1. Generate a visitor ID

Create a unique visitor ID for each user session. This can be a UUID, a hash of the user's session, or any unique string. It is used for rate limiting and conversation continuity.

```javascript
const visitorId = `v_${crypto.randomUUID()}`;
```
2. Open a chat stream

POST to the chat endpoint with the visitor's message. Parse the SSE stream to display tokens in real time.

3. Continue the conversation

Pass the conversationId from the start event in subsequent requests to maintain context across multiple turns.

```javascript
let conversationId = null;

async function sendMessage(message) {
  const response = await fetch(
    `https://api.holydocs.com/api/v1/assistant/${projectId}/chat`,
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message, visitorId, conversationId })
    }
  );
  // Parse SSE stream, capture conversationId from 'start' event
}
```
4. Display citations

When you receive the citations event, render links to the source pages so readers can verify the answer.
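One way to render a `citations` event is as a list of links rooted at your docs site. The base URL below is a placeholder for your own documentation domain.

```javascript
// Renders the `sources` array from a citations event as markdown links.
// `baseUrl` is a placeholder; substitute your documentation site's URL.
function renderCitations(sources, baseUrl = 'https://docs.example.com') {
  return sources
    .map(s => `[${s.title}](${baseUrl}${s.path})`)
    .join(' · ');
}
```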

Rate Limits

| Endpoint | Authentication | Rate Limit |
| --- | --- | --- |
| `POST /chat` | Public | 10 messages/minute per `visitorId` |
| `GET /conversations` | Required | Plan-based (see Authentication) |
| `GET /usage` | Required | Plan-based |

Error Codes

| Code | Status | Description |
| --- | --- | --- |
| `NOT_FOUND` | 404 | Project not found or AI assistant is not enabled |
| `VALIDATION_ERROR` | 400 | Invalid request body (missing `message`, `visitorId` too long) |
| `LIMIT_EXCEEDED` | 429 | Rate limit or monthly token limit exceeded |
| `AI_UNAVAILABLE` | 503 | LLM provider is temporarily unavailable |
| `CONTENT_FILTERED` | 400 | Message was flagged by content safety filters |
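Since `LIMIT_EXCEEDED` covers the per-visitor rate limit, a client can retry such failures with exponential backoff. This is a sketch under the assumption that your `send` function rejects with an error object carrying the `code` field from the response; delays and retry counts are illustrative.

```javascript
// Retries `send` on LIMIT_EXCEEDED with exponential backoff
// (baseDelayMs, 2x baseDelayMs, 4x baseDelayMs, ...). Any other error,
// or exhausting the retry budget, is rethrown to the caller.
async function sendWithBackoff(send, maxRetries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      if (err.code !== 'LIMIT_EXCEEDED' || attempt >= maxRetries) throw err;
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```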