RAG

create_rag(documents, top_k=, dense=, embed_fn=, reranker=) builds a retrieval pipeline. By default it uses BM25 keyword search (Okapi BM25 plus a conservative suffix stemmer, so "refunds" matches "refund"). This default is instant, works offline, and needs no extra dependencies. Dense vector search and reranking are opt-in.

from largestack.rag import create_rag

docs = [
    "Refunds are available within 30 days of purchase.",
    "Our warranty covers manufacturing defects for 12 months.",
    "Shipping is free for orders over fifty dollars.",
]
rag = create_rag(docs, top_k=2)
hits = rag.retrieve("how long is the refund window?")
for h in hits:
    print(round(h["score"], 3), h["text"])

retrieve() returns a list of {"text", "score", "index"} dicts (when a reranker is set, a "rerank_score" is added too).
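To make the default path concrete, here is a minimal sketch of Okapi BM25 scoring with a naive suffix stemmer. It is an illustration only, not largestack's implementation: the `stem`, `tokenize`, and `bm25_scores` helpers, the suffix list, and the `k1`/`b` values are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def stem(token: str) -> str:
    """Strip a few common suffixes (illustrative; not the library's stemmer)."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words like "is" survive.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text: str) -> list[str]:
    return [stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one Okapi BM25 score per document (assumed k1/b defaults)."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized) / n_docs) or 1.0
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Refunds are available within 30 days of purchase.",
    "Our warranty covers manufacturing defects for 12 months.",
    "Shipping is free for orders over fifty dollars.",
]
scores = bm25_scores("refund window", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])
```

Because "refunds" is stemmed to "refund" on the document side, the query term matches the first document even though the surface forms differ.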

create_rag arguments

| Arg | Default | Notes |
| --- | --- | --- |
| documents | None | corpus to chunk + index (call .ingest(docs) later to add more) |
| chunk_size | 512 | recursive chunker target size (chars) |
| top_k | 5 | results returned by retrieve() / used by build_context() |
| dense | False | True (or "auto") enables hybrid BM25 + dense via a local sentence-transformers model (all-MiniLM-L6-v2). Falls back to BM25 if the package isn't installed |
| embed_fn | None | your own sync str -> list[float] embedder; enables hybrid retrieval without sentence-transformers |
| reranker | None | a Reranker (see below) to re-score candidates after retrieval |
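As an illustration of the embed_fn shape (a synchronous str -> list[float] callable), the sketch below hashes tokens into a fixed-width, L2-normalized vector. The name `hash_embed` and the width `DIM` are hypothetical, and a hashed bag-of-words carries far less meaning than a trained embedding model; it only shows the expected call signature.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical embedding width chosen for this example


def hash_embed(text: str) -> list[float]:
    """Toy embedder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # hashlib is used instead of hash() so results are stable across runs.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


vector = hash_embed("Refunds are available within 30 days of purchase.")
print(len(vector))

# Per the table above, a callable like this would be passed as:
# rag = create_rag(docs, embed_fn=hash_embed)
```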

Hybrid retrieval fuses BM25 and dense rankings with Reciprocal Rank Fusion (RRF). Dense is opt-in: pip install largestack[rag] for sentence-transformers, or pass your own embed_fn.
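Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranking in which it appears. The sketch below shows the technique in isolation; the function name `rrf_fuse` is hypothetical, and k = 60 is a commonly used constant, not a value confirmed for largestack.

```python
def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = [0, 1, 2]   # document ids, best first, from keyword search
dense_ranking = [1, 2, 0]  # document ids, best first, from vector search
fused = rrf_fuse([bm25_ranking, dense_ranking])
print(fused)  # [1, 0, 2]
```

Document 1 wins because it is near the top of both lists, while document 0 is first in one list but last in the other. RRF uses only ranks, so BM25 and cosine scores never need to be put on a common scale.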

Methods

| Method | Returns | Notes |
| --- | --- | --- |
| retrieve(query, top_k=None) | list[dict] | {"text", "score", "index"} (+"rerank_score" if reranking) |
| build_context(query, top_k=None) | str | retrieved chunks joined as [Source 1] ... [Source 2] ... |
| as_tool() | @tool | a search_knowledge(query) tool you can hand to an Agent |
| ingest(documents) | | chunk + (re)index more documents |

print(rag.build_context("refund window"))
# [Source 1] Refunds are available within 30 days of purchase.

tool = rag.as_tool()        # name: "search_knowledge" — pass to Agent(tools=[tool])
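The context format above can be approximated by hand when you need custom prompt assembly. This is a sketch, not the library's code: `format_context` is a hypothetical helper, and the newline separator between sources is an assumption.

```python
def format_context(hits: list[dict]) -> str:
    """Join retrieved chunks as numbered sources (separator is an assumption)."""
    return "\n".join(
        f"[Source {i}] {hit['text']}" for i, hit in enumerate(hits, start=1)
    )


# Dicts shaped like the ones retrieve() is documented to return.
hits = [
    {"text": "Refunds are available within 30 days of purchase.", "score": 1.2, "index": 0},
    {"text": "Shipping is free for orders over fifty dollars.", "score": 0.3, "index": 2},
]
context = format_context(hits)
print(context)
```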

Reranking (opt-in)

Pass a Reranker to re-score retrieved candidates for precision. The default keyword mode (TF-IDF + n-gram overlap) needs no extra deps; other modes are opt-in.

from largestack.rag import create_rag
from largestack._rag.reranker import Reranker

rag = create_rag(docs, top_k=2, reranker=Reranker(mode="keyword"))
hits = rag.retrieve("refund window")   # each hit now has a "rerank_score"

| mode | Backend | Deps / auth |
| --- | --- | --- |
| "keyword" | TF-IDF + n-gram overlap | none (default) |
| "cross_encoder" | local BAAI/bge-reranker-v2-m3 | sentence-transformers |
| "cohere" | Cohere Rerank API | COHERE_API_KEY (or LARGESTACK_COHERE_API_KEY) |
| "voyage" | Voyage AI Rerank API | VOYAGE_API_KEY (or LARGESTACK_VOYAGE_API_KEY) |
| "custom" | your custom_fn(query, docs) | none |

API/model modes fall back to keyword if the key/package is missing.
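For the "custom" mode, a scoring function with the custom_fn(query, docs) signature might look like the sketch below. The return contract is not documented here, so returning one float per document is an assumption; check what Reranker expects before relying on it. The `overlap_rerank` name and the equal unigram/bigram weighting are also illustrative choices.

```python
import re


def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def _bigrams(tokens: list[str]) -> set[tuple[str, str]]:
    return set(zip(tokens, tokens[1:]))


def overlap_rerank(query: str, docs: list[str]) -> list[float]:
    """Score each doc by the share of query unigrams and bigrams it contains.

    Assumption: the reranker wants one float per document, higher is better.
    """
    q_tokens = _tokens(query)
    q_uni, q_bi = set(q_tokens), _bigrams(q_tokens)
    scores = []
    for doc in docs:
        d_tokens = _tokens(doc)
        uni = len(q_uni & set(d_tokens)) / len(q_uni) if q_uni else 0.0
        bi = len(q_bi & _bigrams(d_tokens)) / len(q_bi) if q_bi else 0.0
        scores.append(0.5 * uni + 0.5 * bi)
    return scores


scores = overlap_rerank(
    "refund window",
    ["The refund window is 30 days.", "Shipping is free."],
)
print(scores)  # [1.0, 0.0]
```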

Citations

For per-sentence citation mapping against trusted sources, use the secure pipeline, which returns grounded answers with citations/sources. See Secure RAG agent.

Vector-store backends

largestack._vectorstores ships 18 adapters that all implement the same async VectorStore interface (upsert / query(vector, top_k, filter) / delete / close), so they're interchangeable. They are not auto-wired into create_rag: create_rag retrieves over in-memory chunks (BM25, plus dense embeddings when enabled). To use a store, wire it in your own code (for example, embed chunks, upsert them, then query per request).

| Store | Backend | Install / requires |
| --- | --- | --- |
| PineconeStore | Pinecone (asyncio) | pip install pinecone[asyncio]; PINECONE_API_KEY |
| WeaviateStore | Weaviate v4 | pip install weaviate-client>=4.7 |
| PgVectorStore | Postgres + pgvector | pip install asyncpg; pgvector extension |
| MilvusStore | Milvus | pip install pymilvus>=2.4 |
| RedisVectorStore | Redis Stack / RediSearch | pip install redis>=5.0 |
| ElasticsearchStore | Elasticsearch | pip install 'elasticsearch[async]>=8.0' |
| ElasticsearchDenseVectorStore | ES native dense_vector + kNN | pip install 'elasticsearch[async]>=8.0' |
| OpenSearchStore | OpenSearch | pip install 'opensearch-py>=2.4' |
| MongoDBAtlasStore | MongoDB Atlas | pip install motor>=3.5 |
| MongoAtlasVectorStore | MongoDB Atlas Vector Search | pip install motor>=3.5 |
| ChromaStore | Chroma | pip install chromadb>=0.5 |
| LanceDBStore | LanceDB | pip install lancedb>=0.13 |
| AzureCognitiveSearchStore | Azure AI Search | pip install azure-search-documents>=11.4 |
| QdrantStore | Qdrant (asyncio) | pip install qdrant-client; QDRANT_API_KEY (cloud) |
| FaissPersistentStore | FAISS with disk persistence | pip install faiss-cpu |
| DuckDBVectorStore | DuckDB + vss | pip install duckdb>=0.10 |
| SupabaseVectorStore | Supabase (pgvector wrapper) | pip install asyncpg |
| AuroraPgVectorStore | AWS Aurora Postgres + pgvector | pip install asyncpg |

from largestack._vectorstores import QdrantStore

async def example():
    store = QdrantStore(collection="docs", url="http://localhost:6333")
    await store.upsert([{"id": "1", "vector": [0.1, 0.2], "metadata": {"src": "faq"}}])
    results = await store.query(vector=[0.1, 0.2], top_k=5)
    await store.close()

Each adapter reports cleanly if its underlying SDK isn't installed, so importing largestack._vectorstores never fails at startup even with no DB clients present.
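The manual wiring (embed chunks, upsert, then query per request) can be sketched without any database by using an in-memory stand-in that mirrors the four interface methods. Everything here is hypothetical: `InMemoryStore`, `toy_embed`, and the shape of the query results are assumptions for illustration, so check the real adapter's return type before depending on it.

```python
import asyncio
import math


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedder: vowel counts. Replace with a real model."""
    lowered = text.lower()
    return [float(lowered.count(c)) for c in "aeiou"]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class InMemoryStore:
    """Stand-in with the same method names as the async VectorStore interface."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    async def upsert(self, records: list[dict]) -> None:
        for rec in records:
            self._rows[rec["id"]] = rec

    async def query(self, vector, top_k=5, filter=None) -> list[dict]:
        # Keep rows whose metadata matches every key/value in the filter.
        rows = [
            r for r in self._rows.values()
            if not filter
            or all(r.get("metadata", {}).get(k) == v for k, v in filter.items())
        ]
        scored = [(cosine(vector, r["vector"]), r) for r in rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Result shape is an assumption made for this sketch.
        return [
            {"id": r["id"], "score": s, "metadata": r.get("metadata", {})}
            for s, r in scored[:top_k]
        ]

    async def delete(self, ids: list[str]) -> None:
        for doc_id in ids:
            self._rows.pop(doc_id, None)

    async def close(self) -> None:
        self._rows.clear()


async def main() -> list[dict]:
    chunks = [
        "Refunds are available within 30 days of purchase.",
        "Shipping is free for orders over fifty dollars.",
    ]
    store = InMemoryStore()
    # 1) Embed chunks and upsert them once, at indexing time.
    await store.upsert([
        {"id": str(i), "vector": toy_embed(c), "metadata": {"text": c}}
        for i, c in enumerate(chunks)
    ])
    # 2) Per request: embed the query and ask the store for nearest chunks.
    hits = await store.query(vector=toy_embed(chunks[0]), top_k=1)
    await store.close()
    return hits


results = asyncio.run(main())
print(results[0]["metadata"]["text"])
```

Because the adapters share one interface, swapping the stand-in for a real store should mostly mean changing the constructor call, provided the embedder's vector width matches the collection.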

Maturity boundaries

Do not market RAG as fully enterprise-hardened until these gates have fresh release evidence:

| Area | Current public claim | Required hardening proof |
| --- | --- | --- |
| Retrieval | Local retrieval works with evaluation coverage | Production-scale corpus benchmark with latency and recall targets |
| Reranking | Rerank path exists | Non-regression benchmark across representative corpora |
| Citation confidence | Citation presence is tested | Confidence calibration against labeled answer/citation pairs |
| Tenant filtering | Tenant-aware paths exist | Cross-tenant leakage tests for every persistent vector backend |
| Metadata indices | Metadata filters are supported in selected paths | Backend-specific index/filter validation at scale |
| GraphRAG | Experimental/conceptual | Real graph construction, query tests, and failure-mode docs |
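As one example of what a recall target in the Retrieval row could measure, the sketch below computes recall@k for a single labeled query. The helper name and the sample ids are hypothetical; a real benchmark would average this over a labeled query set and record latency alongside it.

```python
def recall_at_k(retrieved: list[int], relevant: list[int], k: int) -> float:
    """Fraction of the relevant chunk ids found in the top-k retrieved ids."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


# Hypothetical labeled example: chunk ids 1 and 2 are relevant to the query.
print(recall_at_k(retrieved=[3, 1, 7], relevant=[1, 2], k=2))  # 0.5
```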