Sutta Finder: Find any Sutta with vector search

Authors
  • A Buddhist View

Contextual Search: Finding Suttas based on context and meaning

The contextual search system finds suttas by mapping the meaning1 behind your queries, rather than just matching keywords. This is a hybrid search system that combines semantic vector search with traditional full-text search — it's not an LLM wrapper that generates responses.

The search tool can be found at /suttas/sutta-search.

What Makes This Different

Traditional search looks for exact word matches. If you search for "sila," you'll only find suttas that literally contain that word. The hybrid search maps concepts and relationships through vector embeddings, allowing you to find relevant suttas even if they don't use the exact terms you typed:

  • "teachings about progress along the path" finds suttas about the gradual training, even if they don't use those exact words
  • "suttas where Buddha talks to kings" returns suttas containing conversations with rulers
  • "What's the sutta where 'x'" allows you to find a sutta based on a specific teaching or story, even if you don't remember the exact title or wording

Note: If you just want to quickly jump to a specific sutta by ID (like "MN1" or "DN34"), use the quick search (Cmd/Ctrl+K) instead. The hybrid search is designed for deeper exploration and discovery of teachings based on context and thematic connections.
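The semantic side of this contrast can be sketched with a toy example. The vectors and the `cosine` helper below are illustrative stand-ins, not the real embedding model (which uses hundreds of dimensions): the point is that a query and a sutta can score as close matches without sharing any keywords.

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 means identical direction, 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical 3-d embeddings (real models use far more dimensions).
query = [0.9, 0.1, 0.0]                   # "progress along the path"
sutta_gradual_training = [0.8, 0.2, 0.1]  # a sutta on the gradual training
sutta_unrelated = [0.0, 0.1, 0.9]         # a thematically unrelated sutta

# The gradual-training sutta scores far higher, despite no shared keywords.
assert cosine(query, sutta_gradual_training) > cosine(query, sutta_unrelated)
```

A keyword search would find nothing here, since "progress along the path" never appears verbatim in either text; the vector comparison matches on theme instead.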

Getting Started

Visit the Sutta Search page.

Example Queries That Work Well

Here are some natural language queries that work well:

By Topic or Theme:

  • "suttas about sense restraint and the danger of sensual pleasures"
  • "teachings on compassion and metta"
  • "suttas explaining paticcasamuppada"

By Context or Situation:

  • "the Buddha speaks to King Pasenadi"
  • "conversations with lay disciples about ethics"
  • "suttas where two monks discuss consciousness"

By Specific Teachings:

  • "the Buddha compares right effort to tuning a lute"
  • "the Buddha uses parables about farming to explain the path"
  • "the senses are like 6 wild animals tied together"

Understanding the Results

Search results show several key pieces of information:

  • Score: How well the sutta matches your query (higher = better match)
  • Search Type:
    • Enhanced Hybrid = combining semantic understanding + keyword matching + blurb context
    • Hybrid = combining semantic understanding + keyword matching
    • Full-text only = keyword matching when semantic search unavailable
      • This should only happen for very obscure queries with no semantic matches or if the API is down
  • Performance: How long the search took, including embedding generation time
  • Content Summary: A snippet showing a summary of the sutta (for suttas that have one)

Tips for Better Results

Effective Query Patterns

Be specific about what you're looking for:

  • ✅ "suttas about the four noble truths given to monks"
  • ❌ "Buddha teachings"

Use natural language:

  • ✅ "suttas where Buddha helps a monk attain right view"
  • ❌ "Buddha help suffering stories"

Include context when helpful:

  • ✅ "teachings given to monks about abandoning the world"
  • ✅ "advice for lay practitioners on ethical conduct"

When to Use Different Search Types

  • Contextual Search: When exploring themes, concepts, or situations
  • Quick Search (Cmd/Ctrl+K): When you know the specific sutta name or ID (like "MN1", "DN34" or "Rahulasutta")
  • Collection Browse: When you want to read through a specific collection systematically (/suttas)

Technical Limitations: I Searched for "X" but it didn't return "Y" as expected

The contextual search system works well for most queries, but understanding its strengths helps you get better results.

Example: Specific Action Queries

When searching for "the Buddha waves his hand", you might expect SN16.3 (which contains "Then the Blessed One waved his hand in space"). The search system handles this type of query better when given thematic context, as the blurb-enhanced discovery can surface relevant suttas through their overall themes and situations.

For example, the search: "The Buddha waves his hand and teaches monks how to behave around families" returns /suttas/SN/SN16.3 as expected. However, the more specific query "the Buddha waves his hand" does not return SN16.3 in the top results. This is due to several factors:

Root Causes

  1. Semantic vs. Literal Matching
  • Vector embeddings capture overall meaning, not specific details
  • SN16.3's primary semantic signature is "monastic conduct" and "approaching families"
  • The hand gesture is <5% of the text and serves as a teaching metaphor
  2. Terminology Mismatch
  • Query: "buddha waves his hand"
  • Text: "Blessed One waved his hand"
  • Semantic models struggle with these terminology differences when not given broader context
  3. Context Weighting
  • The gesture is brief contextual detail within a larger teaching
  • 3,800+ characters about monastic conduct dilute the gesture's semantic weight
  • Vector represents the dominant themes, not incidental actions
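The dilution effect can be illustrated with a toy 2-dimensional model. Everything below is a hypothetical sketch, not the real system: a whole-document embedding behaves roughly like a weighted mix of its parts, so a detail that makes up ~5% of the text barely moves the vector.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy axes: dimension 0 = "monastic conduct" theme, dimension 1 = "hand gesture".
theme = [1.0, 0.0]
gesture = [0.0, 1.0]

# Whole-document embedding ~ weighted mix of its parts; the gesture is <5% of SN16.3.
doc = [0.95 * theme[0] + 0.05 * gesture[0],
       0.95 * theme[1] + 0.05 * gesture[1]]

query_gesture = gesture          # "the Buddha waves his hand"
query_with_context = [0.7, 0.3]  # gesture + "how to behave around families"

low = cosine(query_gesture, doc)        # detail is diluted by the dominant theme
high = cosine(query_with_context, doc)  # added context aligns with the theme
assert high > low
```

This is why adding thematic context to the query ("...and teaches monks how to behave around families") recovers SN16.3: the query vector now points in roughly the same direction as the document's dominant theme.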

Technical Implementation

How It Works

The search combines two complementary approaches:

Semantic Vector Search

  • Uses vector embeddings to convert your query into a mathematical representation of its conceptual patterns
  • Finds suttas with similar semantic content using vector similarity
  • Captures broader themes, relationships, and contexts beyond exact words

Full-Text Search

  • PostgreSQL's built-in text search for exact words and phrases
  • Handles specific terms, names, and direct quotes with linguistic variations
  • Fast and precise for targeted searches
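A heavily simplified stand-in for this matching logic is sketched below. PostgreSQL actually uses `tsvector`/`tsquery` with language-aware stemming and ranking; this toy version only normalizes and compares tokens, which also illustrates the terminology-mismatch issue from the earlier example.

```python
import re

def tokens(text):
    # Lowercase and split into word tokens (no stemming, unlike Postgres).
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def fulltext_match(query, document):
    # True if every query token appears in the document: AND semantics,
    # similar in spirit to plainto_tsquery.
    return tokens(query) <= tokens(document)

doc = "Then the Blessed One waved his hand in space."
assert fulltext_match("waved his hand", doc)
assert not fulltext_match("waves his hand", doc)  # no stemming here: "waves" != "waved"
```

Real Postgres stemming would match "waves" to "waved" via the English stemmer, but the broader point stands: literal search needs the right word forms, while vector search does not.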

Blurb-Enhanced Discovery

  • Sutta summaries provide thematic context and improved discovery
  • Over 1,600 suttas include blurbs that highlight key themes and situations
  • Blurb embeddings help surface thematically relevant suttas by capturing the essence of each teaching, especially for long suttas where specific details may be diluted in full-text or semantic search

Reciprocal Rank Fusion

  • All three search types (semantic, full-text, blurb) run simultaneously and results are merged using RRF scoring
  • The system balances semantic relevance, keyword accuracy, and thematic context
  • Final ranking considers all approaches to provide comprehensive, contextually relevant results
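A minimal RRF sketch is shown below. The constant `k=60` is the commonly used default from the RRF literature, and the example IDs are illustrative; the production system's weighting may differ.

```python
def rrf(rankings, k=60):
    # Each document scores 1 / (k + rank) in every list it appears in;
    # scores are summed across lists, so broad agreement wins.
    scores = {}
    for ranking in rankings:  # one ranked list per search type
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["SN16.3", "MN1", "DN34"]
fulltext = ["MN1", "SN16.3"]
blurb    = ["SN16.3", "AN4.45"]

# SN16.3 ranks highly in all three lists, so it fuses to the top.
fused = rrf([semantic, fulltext, blurb])
assert fused[0] == "SN16.3"
```

Because RRF works on ranks rather than raw scores, it can merge the three result lists even though vector similarity, full-text rank, and blurb similarity are on entirely different scales.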

Current Architecture

The search system includes several components working together:

  • Sutta Central Blurbs for over 1,600 suttas provide rich contextual information
  • Separate blurb embeddings help surface thematically relevant suttas even when specific terms don't match
  • Hybrid ranking combines semantic similarity, keyword matching, and thematic context

Future Considerations

For even better literal phrase matching, future improvements could include chunking each sutta into semantic segments instead of embedding entire documents. This would help with very specific queries by preserving details that can get diluted in whole-document embeddings.

The current system handles most practical search scenarios effectively, making such optimizations useful but not essential for typical use cases.

Not an LLM Wrapper

Unlike chatbot-style search that generates responses, this system:

  • Searches actual sutta content: Every result comes directly from the texts
  • Preserves original context: You read the actual suttas, not AI-generated summaries
  • Maintains authenticity: No interpretation or paraphrasing—just the original texts
  • Provides transparent scoring: You can see exactly how relevant each result is
  • Offers consistent performance: Deterministic results are based on mathematical similarity, not variable AI generation

The semantic "understanding" comes from embeddings that capture conceptual relationships, not from generative AI that might hallucinate or misinterpret teachings. Of course, it still isn't perfect and may not always return the exact sutta you had in mind, but it significantly enhances the search experience compared to traditional keyword-only methods, and eliminates the issue of hallucination that can occur with LLMs.


You can access the contextual search at /suttas/sutta-search.

If you have any feedback or suggestions, please let me know at contact@abuddhistview.com.

Footnotes

  1. As far as the word "meaning", it's worth noting that vector embeddings don't actually understand meaning in the way humans do. They capture patterns in language use and statistical relationships between words and concepts. This is fundamentally different from human understanding, but "meaning" remains an intuitive way to describe how they function.