Overview

HolyDocs includes a built-in search system that combines traditional keyword matching with AI-powered semantic search. This hybrid approach ensures readers find what they need, whether they search using exact terms or natural language questions.

Search Modal

Readers access search by pressing Cmd+K (Mac) or Ctrl+K (Windows), or by clicking the search icon in the header. The search modal provides:

  • Instant results as you type
  • Highlighted matching text in results
  • Keyboard navigation (arrow keys + Enter)
  • Page title, description, and section headings in results

How Hybrid Search Works

The keyword search engine runs entirely on the edge using a pre-built search index:

  1. During each deployment, page titles, headings, and content are indexed
  2. The index is stored on the edge as JSON for instant lookup
  3. Queries are scored against titles (highest weight), headings, and body content
  4. Results are ranked by relevance score and returned instantly
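
The scoring steps above can be illustrated with a minimal sketch. The field weights (3.0, 2.0, 1.0), the `Page` type, and the simple term-matching rule are assumptions made for illustration; the source only states that titles carry the highest weight, followed by headings and body content.

```python
import re
from dataclasses import dataclass, field

# Hypothetical weights: only the ordering (title > headings > body) comes from the docs.
WEIGHTS = {"title": 3.0, "headings": 2.0, "body": 1.0}


@dataclass
class Page:
    path: str
    title: str
    headings: list[str] = field(default_factory=list)
    body: str = ""


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def score(page: Page, query: str) -> float:
    """Sum the field weight once for every query term found in that field."""
    terms = tokenize(query)
    fields = {
        "title": set(tokenize(page.title)),
        "headings": set(tokenize(" ".join(page.headings))),
        "body": set(tokenize(page.body)),
    }
    total = 0.0
    for name, tokens in fields.items():
        matched = sum(1 for term in terms if term in tokens)
        total += WEIGHTS[name] * matched
    return total


def keyword_search(pages: list[Page], query: str, limit: int = 10) -> list[str]:
    """Return the paths of matching pages, highest score first."""
    scored = [(score(page, query), page) for page in pages]
    hits = [(s, page) for s, page in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [page.path for _, page in hits[:limit]]
```

With these assumed weights, a page whose title contains the query terms outranks a page that only mentions them in its body.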

Semantic search uses vector embeddings to understand the meaning behind a query:

  1. The search query is embedded into a 1536-dimensional vector using text-embedding-3-small
  2. The vector is compared against your documentation embeddings in our managed vector index
  3. The most similar content chunks are returned, even if they do not contain the exact query terms
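
As a rough illustration of the comparison step, the sketch below ranks chunks by cosine similarity. The similarity metric is an assumption (the source does not say which one the vector index uses), and the three-dimensional vectors in the usage note stand in for real 1536-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(query_vector: list[float],
               chunk_vectors: dict[str, list[float]],
               limit: int = 3) -> list[str]:
    """Return chunk IDs ordered from most to least similar to the query."""
    ranked = sorted(
        chunk_vectors,
        key=lambda chunk_id: cosine_similarity(query_vector, chunk_vectors[chunk_id]),
        reverse=True,
    )
    return ranked[:limit]
```

For example, `top_chunks([1.0, 0.0, 0.0], {"a": [0.9, 0.1, 0.0], "b": [0.0, 1.0, 0.0]})` places chunk `a` first because its vector points in nearly the same direction as the query.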

Reciprocal Rank Fusion (RRF)

The keyword and semantic results are merged using Reciprocal Rank Fusion with k=60. This ensures that:

  • Pages ranked highly by both methods appear at the top
  • Semantic results are not drowned out by exact keyword matches
  • Keyword results still surface when a query uses precise technical terminology
  • The final ranking benefits from both approaches without requiring manual weight tuning

The fused results are returned to the search modal in under 100ms for most documentation sites.
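
The fusion step can be sketched with the standard RRF formula, in which each result contributes 1 / (k + rank) for every list it appears in. The function name and input shape are illustrative assumptions; only the k=60 constant comes from the text above.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; ranks start at 1 within each list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)
```

Because only ranks enter the formula, the keyword scores and the vector similarities never need to be put on a common scale, which is why no manual weight tuning is required.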

Configuration

Customize the search experience in docs.json:

json
{
  "search": {
    "prompt": "Search our documentation...",
    "hotkey": "k",
    "placeholder": "Search documentation..."
  }
}

Search Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | string | | Custom prompt text for the search modal |
| hotkey | string | "k" | Keyboard shortcut key (used with Cmd/Ctrl) |
| placeholder | string | "Search documentation..." | Placeholder text in the search input |

Search API

Use the search API for custom search integrations:

Keyword Search

bash
curl "https://api.holydocs.com/api/v1/docs/PROJECT_ID/search?q=custom+domains&limit=10"

Response:

json
{
  "data": {
    "results": [
      {
        "title": "Custom Domains",
        "path": "/custom-domains",
        "description": "Configure a custom domain for your documentation site",
        "score": 95,
        "highlights": ["...configure a <mark>custom domain</mark> with automatic SSL..."]
      }
    ]
  }
}
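
For a custom integration, the request and response shown above could be wrapped as in the following sketch. The helper names are hypothetical, and the sketch assumes the endpoint needs no extra headers; check whether your project requires authentication before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.holydocs.com/api/v1/docs"


def build_search_url(project_id: str, query: str, limit: int = 10) -> str:
    """Build the keyword search URL; urlencode turns spaces into '+'."""
    params = urlencode({"q": query, "limit": limit})
    return f"{BASE_URL}/{project_id}/search?{params}"


def parse_results(payload: dict) -> list[tuple[str, str, int]]:
    """Extract (title, path, score) from the documented response shape."""
    results = payload.get("data", {}).get("results", [])
    return [(r["title"], r["path"], r["score"]) for r in results]


def search(project_id: str, query: str, limit: int = 10) -> list[tuple[str, str, int]]:
    """Call the API over HTTPS and return the parsed results."""
    url = build_search_url(project_id, query, limit)
    with urlopen(url, timeout=10) as response:
        return parse_results(json.load(response))
```

Calling `search("PROJECT_ID", "custom domains")` issues the same request as the curl command above.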

Semantic Search

bash
curl "https://api.holydocs.com/api/v1/docs/PROJECT_ID/search/semantic?q=how+do+I+add+my+own+domain"

Semantic search understands intent — the query "how do I add my own domain" will match the "Custom Domains" page even without those exact words appearing in the content.

Content Indexing

Search indices are built automatically after each production deployment:

  1. Page Processing: Each MDX page is processed to extract the title, description, headings, and body text.
  2. Chunking: Content is split into semantic chunks optimized for embedding. Each chunk preserves context about which page and section it belongs to.
  3. Embedding: Chunks are converted to vector embeddings via OpenAI's text-embedding-3-small model (1536 dimensions).
  4. Indexing: Embeddings are upserted into our managed vector index with deterministic IDs (projectId:pagePath:chunkIndex) for efficient updates.
  5. Keyword Index: A separate keyword search index is built and stored on the edge for instant text-based search.

Content indexing is differential — only chunks whose SHA-256 checksum has changed since the last index are re-embedded. This makes indexing fast even for large documentation sites.
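
A minimal sketch of the differential check follows. The ID format (projectId:pagePath:chunkIndex) and the use of SHA-256 come from the text; the function names and the idea of keeping previous checksums in a dictionary keyed by chunk ID are assumptions for illustration.

```python
import hashlib


def chunk_id(project_id: str, page_path: str, chunk_index: int) -> str:
    """Deterministic ID in the documented projectId:pagePath:chunkIndex format."""
    return f"{project_id}:{page_path}:{chunk_index}"


def checksum(text: str) -> str:
    """SHA-256 hex digest of the chunk text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_chunks(chunks: dict[str, str], previous: dict[str, str]) -> list[str]:
    """Return IDs of chunks that are new or whose checksum differs from the last index.

    `chunks` maps chunk ID to current text; `previous` maps chunk ID to the
    checksum stored during the last indexing run (hypothetical storage shape).
    """
    return [cid for cid, text in chunks.items() if previous.get(cid) != checksum(text)]
```

Only the IDs returned by `changed_chunks` would need to be sent for embedding; every other chunk keeps its existing vector.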

Search with Auth-Protected Content

If your documentation uses content authentication, search results respect access controls:

  • Public pages appear in search for all users
  • Protected pages only appear for authenticated users with the appropriate group membership
  • The search API respects the same JWT-based auth used for page access

Auth-enabled projects skip search result caching entirely to prevent protected content from leaking to unauthorized users.

Tuning Search Quality

If search results are not meeting expectations, use these techniques to improve quality:

The keyword index weights page titles highest. A clear, descriptive title like "Configuring Custom Domains with SSL" will rank better than "Domains" for relevant queries. The description frontmatter field is also indexed and influences rankings.

Headings (h2, h3) are indexed with higher weight than body text. Structure your pages with clear, descriptive headings that match the terms readers are likely to search for.

You can add a keywords field to your page frontmatter to boost discoverability for specific terms:

yaml
---
title: Authentication
description: Set up API key authentication
keywords: ["api key", "bearer token", "auth header", "credentials"]
---

The analytics dashboard shows queries that returned no results. These are direct signals of content gaps or terminology mismatches. Either create content for those queries or add the terms as keywords to existing pages.

Pages that cover too many topics produce noisy embeddings. If a page is a catch-all, consider splitting it into focused pages — each will produce better search results.

Search Analytics

Monitor search performance from the Analytics > Search tab in the dashboard:

| Metric | Description | Action |
| --- | --- | --- |
| Top queries | Most frequently searched terms | Ensure top queries lead to high-quality results |
| Zero-result queries | Searches with no results | Create content or add keywords for these terms |
| Click-through rate | % of searches where a result was clicked | Low CTR suggests results are not relevant; review ranking |
| Time to first click | How quickly readers find what they need | A high time suggests too many results or poor ranking |
| Queries per session | Average searches per reader visit | High values may indicate navigation issues |

Set up a weekly routine to review zero-result queries. Each one represents a reader who could not find what they needed — addressing these systematically improves documentation quality over time.
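
If you export search events for your own review, two of the metrics above can be computed as in this sketch. The event shape (`query`, `result_count`, `clicked`) is a hypothetical export format, not one the dashboard is documented to provide.

```python
from collections import Counter


def search_metrics(events: list[dict]) -> dict:
    """Compute click-through rate and zero-result queries from search events.

    Each event is assumed to look like:
    {"query": "sso", "result_count": 0, "clicked": False}
    """
    total = len(events)
    clicked = sum(1 for event in events if event["clicked"])
    zero_results = Counter(
        event["query"] for event in events if event["result_count"] == 0
    )
    return {
        "click_through_rate": clicked / total if total else 0.0,
        # Most frequent zero-result queries first.
        "zero_result_queries": zero_results.most_common(),
    }
```

Sorting zero-result queries by frequency gives a natural order for the weekly review.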

Indexing Pipeline (Deep Dive)

For large documentation sites, understanding the indexing pipeline helps you optimize build times:

  1. Content Extraction: Each MDX page is parsed to extract plain text. MDX components, code blocks, and frontmatter are processed separately. Code blocks are included in the keyword index but excluded from semantic embeddings (code does not embed well as natural language).
  2. Deduplication Check: A SHA-256 checksum is computed for each chunk. If the checksum matches the previous index, the chunk is skipped and no API call is made. This makes incremental indexing fast: a 500-page site with 5 changed pages re-indexes in seconds.
  3. Batch Embedding: Changed chunks are batched into groups of 100 and sent to the embedding API in parallel. Rate limiting is handled automatically with exponential backoff.
  4. Vector Upsert: Embeddings are upserted into our managed vector index using deterministic IDs. Deleted pages have their vectors removed. The upsert is atomic per batch.
  5. Edge Index Update: The keyword search index is rebuilt as a single JSON object and written to the edge cache. For sites with 500+ pages, the index is sharded by first letter to stay within per-value size limits.
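
The batch embedding step can be sketched as follows. The batch size of 100 and the use of exponential backoff come from the description above; the retry count, the base delay, the `RateLimitError` type, and the `embed` callable are assumptions, and the sketch handles one batch at a time rather than in parallel.

```python
import time
from typing import Callable


class RateLimitError(Exception):
    """Hypothetical error raised by the embedding call when it is rate limited."""


def batched(items: list, size: int = 100) -> list[list]:
    """Split items into consecutive groups of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def embed_with_backoff(batch: list[str],
                       embed: Callable[[list[str]], list[list[float]]],
                       max_retries: int = 5,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> list[list[float]]:
    """Call `embed`, doubling the wait after each rate-limit error."""
    for attempt in range(max_retries + 1):
        try:
            return embed(batch)
        except RateLimitError:
            if attempt == max_retries:
                raise  # Give up after the final retry.
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return []  # Unreachable; keeps type checkers satisfied.
```

Passing `sleep` as a parameter makes the retry schedule easy to verify without actually waiting.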
