Sutta Finder: Find any Sutta with vector search

Authors
  • A Buddhist View

Contextual Search: Finding Suttas based on context and meaning

The contextual search system finds suttas by mapping the meaning1 behind your queries, rather than just matching keywords. This is a hybrid search system that combines semantic vector search with traditional full-text search — it's not an LLM wrapper that generates responses.

The search tool can be found at /suttas/sutta-search.

What Makes This Different

Traditional search looks for exact word matches. If you search for "sila," you'll only find suttas that literally contain that word. The hybrid search maps concepts and relationships through vector embeddings, allowing you to find relevant suttas even if they don't use the exact terms you typed:

  • "teachings about progress along the path" finds suttas about the gradual training, even if they don't use those exact words
  • "suttas where Buddha talks to kings" returns suttas containing conversations with rulers
  • "What's the sutta where 'x'" allows you to find a sutta based on a specific teaching or story, even if you don't remember the exact title or wording

Note: If you just want to quickly jump to a specific sutta by ID (like "MN1" or "DN34"), use the quick search (Cmd/Ctrl+K) instead. The hybrid search is designed for deeper exploration and discovery of teachings based on context and thematic connections.
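The semantic side of this contrast can be sketched with a toy example. The vectors and the `cosine` helper below are illustrative stand-ins, not the real embedding model (which uses hundreds of dimensions): the point is that a query and a sutta can score as close matches without sharing any keywords.

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 means identical direction, 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical 3-d embeddings (real models use far more dimensions).
query = [0.9, 0.1, 0.0]                   # "progress along the path"
sutta_gradual_training = [0.8, 0.2, 0.1]  # a sutta on the gradual training
sutta_unrelated = [0.0, 0.1, 0.9]         # a thematically unrelated sutta

# The gradual-training sutta scores far higher, despite no shared keywords.
assert cosine(query, sutta_gradual_training) > cosine(query, sutta_unrelated)
```

A keyword search would find nothing here, since "progress along the path" never appears verbatim in either text; the vector comparison matches on theme instead.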

Getting Started

Visit the Sutta Search page.

Example Queries That Work Well

Here are some natural language queries that work well:

By Topic or Theme:

  • "suttas about sense restraint and the danger of sensual pleasures"
  • "teachings on compassion and metta"
  • "suttas explaining paticcasamuppada"

By Context or Situation:

  • "the Buddha speaks to King Pasenadi"
  • "conversations with lay disciples about ethics"
  • "suttas where two monks discuss consciousness"

By Specific Teachings:

  • "the Buddha compares right effort to tuning a lute"
  • "the Buddha uses parables about farming to explain the path"
  • "the senses are like 6 wild animals tied together"

Understanding the Results

Search results show several key pieces of information:

  • Score: How well the sutta matches your query (higher = better match)
  • Search Type:
    • Enhanced Hybrid = combining semantic understanding + keyword matching + blurb context
    • Hybrid = combining semantic understanding + keyword matching
    • Full-text only = keyword matching when semantic search unavailable
      • This should only happen for very obscure queries with no semantic matches or if the API is down
  • Performance: How long the search took, including embedding generation time
  • Content Summary: A snippet showing a summary of the sutta (for suttas that have one)

Tips for Better Results

Effective Query Patterns

Be specific about what you're looking for:

  • ✅ "suttas about the four noble truths given to monks"
  • ❌ "Buddha teachings"

Use natural language:

  • ✅ "suttas where Buddha helps a monk attain right view"
  • ❌ "Buddha help suffering stories"

Include context when helpful:

  • ✅ "teachings given to monks about abandoning the world"
  • ✅ "advice for lay practitioners on ethical conduct"

When to Use Different Search Types

  • Contextual Search: When exploring themes, concepts, or situations
  • Quick Search (Cmd/Ctrl+K): When you know the specific sutta name or ID (like "MN1", "DN34" or "Rahulasutta")
  • Collection Browse: When you want to read through a specific collection systematically (/suttas)

Technical Limitations: I Searched for "X" but it didn't return "Y" as expected

The contextual search system works well for most queries, but understanding its strengths helps you get better results.

Example: Specific Action Queries

When searching for "the Buddha waves his hand", you might expect SN16.3 (which contains "Then the Blessed One waved his hand in space"). The search system handles this type of query better when given thematic context, as the blurb-enhanced discovery can surface relevant suttas through their overall themes and situations.

For example, the search: "The Buddha waves his hand and teaches monks how to behave around families" returns /suttas/SN/SN16.3 as expected. However, the more specific query "the Buddha waves his hand" does not return SN16.3 in the top results. This is due to several factors:

Root Causes

  1. Semantic vs. Literal Matching
  • Vector embeddings capture overall meaning, not specific details
  • SN16.3's primary semantic signature is "monastic conduct" and "approaching families"
  • The hand gesture is <5% of the text and serves as a teaching metaphor
  2. Terminology Mismatch
  • Query: "buddha waves his hand"
  • Text: "Blessed One waved his hand"
  • Semantic models struggle with these terminology differences when not given broader context
  3. Context Weighting
  • The gesture is brief contextual detail within a larger teaching
  • 3,800+ characters about monastic conduct dilute the gesture's semantic weight
  • Vector represents the dominant themes, not incidental actions
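The dilution effect can be illustrated with a toy 2-dimensional model. Everything below is a hypothetical sketch, not the real system: a whole-document embedding behaves roughly like a weighted mix of its parts, so a detail that makes up ~5% of the text barely moves the vector.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy axes: dimension 0 = "monastic conduct" theme, dimension 1 = "hand gesture".
theme = [1.0, 0.0]
gesture = [0.0, 1.0]

# Whole-document embedding ~ weighted mix of its parts; the gesture is <5% of SN16.3.
doc = [0.95 * theme[0] + 0.05 * gesture[0],
       0.95 * theme[1] + 0.05 * gesture[1]]

query_gesture = gesture          # "the Buddha waves his hand"
query_with_context = [0.7, 0.3]  # gesture + "how to behave around families"

low = cosine(query_gesture, doc)        # detail is diluted by the dominant theme
high = cosine(query_with_context, doc)  # added context aligns with the theme
assert high > low
```

This is why adding thematic context to the query ("...and teaches monks how to behave around families") recovers SN16.3: the query vector now points in roughly the same direction as the document's dominant theme.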

Technical Implementation

How It Works

The search combines two complementary approaches:

Semantic Vector Search

  • Uses vector embeddings to convert your query into a mathematical representation of its conceptual patterns
  • Finds suttas with similar semantic content using vector similarity
  • Captures broader themes, relationships, and contexts beyond exact words

Full-Text Search

  • PostgreSQL's built-in text search for exact words and phrases
  • Handles specific terms, names, and direct quotes with linguistic variations
  • Fast and precise for targeted searches
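A heavily simplified stand-in for this matching logic is sketched below. PostgreSQL actually uses `tsvector`/`tsquery` with language-aware stemming and ranking; this toy version only normalizes and compares tokens, which also illustrates the terminology-mismatch issue from the earlier example.

```python
import re

def tokens(text):
    # Lowercase and split into word tokens (no stemming, unlike Postgres).
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def fulltext_match(query, document):
    # True if every query token appears in the document: AND semantics,
    # similar in spirit to plainto_tsquery.
    return tokens(query) <= tokens(document)

doc = "Then the Blessed One waved his hand in space."
assert fulltext_match("waved his hand", doc)
assert not fulltext_match("waves his hand", doc)  # no stemming here: "waves" != "waved"
```

Real Postgres stemming would match "waves" to "waved" via the English stemmer, but the broader point stands: literal search needs the right word forms, while vector search does not.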

Blurb-Enhanced Discovery

  • Sutta summaries provide thematic context and improved discovery
  • Over 1,600 suttas include blurbs that highlight key themes and situations
  • Blurb embeddings help surface thematically relevant suttas by capturing the essence of each teaching, especially for long suttas where specific details may be diluted in full-text or semantic search

Reciprocal Rank Fusion

  • All three search types (semantic, full-text, blurb) run simultaneously and results are merged using RRF scoring
  • The system balances semantic relevance, keyword accuracy, and thematic context
  • Final ranking considers all approaches to provide comprehensive, contextually relevant results
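A minimal RRF sketch is shown below. The constant `k=60` is the commonly used default from the RRF literature, and the example IDs are illustrative; the production system's weighting may differ.

```python
def rrf(rankings, k=60):
    # Each document scores 1 / (k + rank) in every list it appears in;
    # scores are summed across lists, so broad agreement wins.
    scores = {}
    for ranking in rankings:  # one ranked list per search type
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["SN16.3", "MN1", "DN34"]
fulltext = ["MN1", "SN16.3"]
blurb    = ["SN16.3", "AN4.45"]

# SN16.3 ranks highly in all three lists, so it fuses to the top.
fused = rrf([semantic, fulltext, blurb])
assert fused[0] == "SN16.3"
```

Because RRF works on ranks rather than raw scores, it can merge the three result lists even though vector similarity, full-text rank, and blurb similarity are on entirely different scales.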

Current Architecture

The search system includes several components working together:

  • Sutta Central Blurbs for over 1,600 suttas provide rich contextual information
  • Separate blurb embeddings help surface thematically relevant suttas even when specific terms don't match
  • Hybrid ranking combines semantic similarity, keyword matching, and thematic context

Future Considerations

For even better literal phrase matching, future improvements could include chunking each sutta into semantic segments instead of embedding entire documents. This would help with very specific queries by preserving details that can get diluted in whole-document embeddings.

The current system handles most practical search scenarios effectively, making such optimizations useful but not essential for typical use cases.

Not an LLM Wrapper

Unlike chatbot-style search that generates responses, this system:

  • Searches actual sutta content: Every result comes directly from the texts
  • Preserves original context: You read the actual suttas, not AI-generated summaries
  • Maintains authenticity: No interpretation or paraphrasing—just the original texts
  • Provides transparent scoring: You can see exactly how relevant each result is
  • Offers consistent performance: Deterministic results are based on mathematical similarity, not variable AI generation

The semantic "understanding" comes from embeddings that capture conceptual relationships, not from generative AI that might hallucinate or misinterpret teachings. Of course, it still isn't perfect and may not always return the exact sutta you had in mind, but it significantly enhances the search experience compared to traditional keyword-only methods, and eliminates the issue of hallucination that can occur with LLMs.


You can access the contextual search at /suttas/sutta-search.

If you have any feedback or suggestions, please let me know at contact@abuddhistview.com.

Footnotes

  1. As far as the word "meaning", it's worth noting that vector embeddings don't actually understand meaning in the way humans do. They capture patterns in language use and statistical relationships between words and concepts. This is fundamentally different from human understanding, but "meaning" remains an intuitive way to describe how they function.