RAG
create_rag(documents, top_k=, dense=, embed_fn=, reranker=) builds a retrieval
pipeline. By default it uses BM25 keyword search (Okapi BM25 + a conservative
suffix stemmer so refunds/refund match) — instant, offline, no extra deps.
Dense vector search and reranking are opt-in.
from largestack.rag import create_rag
docs = [
"Refunds are available within 30 days of purchase.",
"Our warranty covers manufacturing defects for 12 months.",
"Shipping is free for orders over fifty dollars.",
]
rag = create_rag(docs, top_k=2)
hits = rag.retrieve("how long is the refund window?")
for h in hits:
print(round(h["score"], 3), h["text"])
retrieve() returns a list of {"text", "score", "index"} dicts (when a reranker
is set, a "rerank_score" is added too).
create_rag arguments
| Arg | Default | Notes |
|---|---|---|
documents |
None |
corpus to chunk + index (call .ingest(docs) later to add) |
chunk_size |
512 |
recursive chunker target size (chars) |
top_k |
5 |
results returned by retrieve() / used by build_context() |
dense |
False |
True (or "auto") enables hybrid BM25 + dense via a local sentence-transformers model (all-MiniLM-L6-v2). No-op fallback to BM25 if the package isn't installed |
embed_fn |
None |
your own sync str -> list[float] embedder; enables hybrid retrieval without sentence-transformers |
reranker |
None |
a Reranker (see below) to re-score candidates after retrieval |
Hybrid retrieval fuses BM25 and dense rankings with Reciprocal Rank Fusion (RRF).
Dense is opt-in: pip install largestack[rag] for sentence-transformers, or pass
your own embed_fn.
Methods
| Method | Returns | Notes |
|---|---|---|
retrieve(query, top_k=None) |
list[dict] |
{"text", "score", "index"} (+"rerank_score" if reranking) |
build_context(query, top_k=None) |
str |
retrieved chunks joined as [Source 1] ... [Source 2] ... |
as_tool() |
@tool |
a search_knowledge(query) tool you can hand to an Agent |
ingest(documents) |
— | chunk + (re)index more documents |
print(rag.build_context("refund window"))
# [Source 1] Refunds are available within 30 days of purchase.
tool = rag.as_tool() # name: "search_knowledge" — pass to Agent(tools=[tool])
Reranking (opt-in)
Pass a Reranker to re-score retrieved candidates for precision. The default
keyword mode (TF-IDF + n-gram overlap) needs no extra deps; other modes are opt-in.
from largestack.rag import create_rag
from largestack._rag.reranker import Reranker
rag = create_rag(docs, top_k=2, reranker=Reranker(mode="keyword"))
hits = rag.retrieve("refund window") # each hit now has a "rerank_score"
mode |
Backend | Deps / auth |
|---|---|---|
"keyword" |
TF-IDF + n-gram overlap | none (default) |
"cross_encoder" |
local BAAI/bge-reranker-v2-m3 |
sentence-transformers |
"cohere" |
Cohere Rerank API | COHERE_API_KEY (or LARGESTACK_COHERE_API_KEY) |
"voyage" |
Voyage AI Rerank API | VOYAGE_API_KEY (or LARGESTACK_VOYAGE_API_KEY) |
"custom" |
your custom_fn(query, docs) |
none |
API/model modes fall back to keyword if the key/package is missing.
Citations
For per-sentence citation mapping against trusted sources, use the secure pipeline,
which returns grounded answers with citations/sources. See
Secure RAG agent.
Vector-store backends
largestack._vectorstores ships 18 adapters that all implement the same async
VectorStore interface (upsert / query(vector, top_k, filter) / delete /
close), so they're interchangeable. They are not auto-wired into create_rag —
create_rag retrieves over in-memory chunks (BM25, + dense embeddings when enabled).
Wire a store in your own code (e.g. embed chunks, upsert, then query per request).
| Store | Backend | Install / requires |
|---|---|---|
PineconeStore |
Pinecone (asyncio) | pip install pinecone[asyncio]; PINECONE_API_KEY |
WeaviateStore |
Weaviate v4 | pip install weaviate-client>=4.7 |
PgVectorStore |
Postgres + pgvector | pip install asyncpg; pgvector extension |
MilvusStore |
Milvus | pip install pymilvus>=2.4 |
RedisVectorStore |
Redis Stack / RediSearch | pip install redis>=5.0 |
ElasticsearchStore |
Elasticsearch | pip install 'elasticsearch[async]>=8.0' |
ElasticsearchDenseVectorStore |
ES native dense_vector + kNN |
pip install 'elasticsearch[async]>=8.0' |
OpenSearchStore |
OpenSearch | pip install 'opensearch-py>=2.4' |
MongoDBAtlasStore |
MongoDB Atlas | pip install motor>=3.5 |
MongoAtlasVectorStore |
MongoDB Atlas Vector Search | pip install motor>=3.5 |
ChromaStore |
Chroma | pip install chromadb>=0.5 |
LanceDBStore |
LanceDB | pip install lancedb>=0.13 |
AzureCognitiveSearchStore |
Azure AI Search | pip install azure-search-documents>=11.4 |
QdrantStore |
Qdrant (asyncio) | pip install qdrant-client; QDRANT_API_KEY (cloud) |
FaissPersistentStore |
FAISS with disk persistence | pip install faiss-cpu |
DuckDBVectorStore |
DuckDB + vss |
pip install duckdb>=0.10 |
SupabaseVectorStore |
Supabase (pgvector wrapper) | pip install asyncpg |
AuroraPgVectorStore |
AWS Aurora Postgres + pgvector | pip install asyncpg |
from largestack._vectorstores import QdrantStore
async def example():
store = QdrantStore(collection="docs", url="http://localhost:6333")
await store.upsert([{"id": "1", "vector": [0.1, 0.2], "metadata": {"src": "faq"}}])
results = await store.query(vector=[0.1, 0.2], top_k=5)
await store.close()
Each adapter reports cleanly if its underlying SDK isn't installed, so importing
largestack._vectorstores never fails at startup even with no DB clients present.
Maturity Boundaries
Do not market RAG as fully enterprise-hardened until these gates have fresh release evidence:
| Area | Current public claim | Required hardening proof |
|---|---|---|
| Retrieval | Local retrieval works with evaluation coverage | Production-scale corpus benchmark with latency and recall targets |
| Reranking | Rerank path exists | Non-regression benchmark across representative corpora |
| Citation confidence | Citation presence is tested | Confidence calibration against labeled answer/citation pairs |
| Tenant filtering | Tenant-aware paths exist | Cross-tenant leakage tests for every persistent vector backend |
| Metadata indices | Metadata filters are supported in selected paths | Backend-specific index/filter validation at scale |
| GraphRAG | Experimental/conceptual | Real graph construction, query tests, and failure-mode docs |