Complete architecture guide for building SaaS AI products where each customer's data stays isolated, searchable, and secure.
Multi-tenant RAG is the backbone of every SaaS AI product. Each customer uploads their own documents, and the system must retrieve only from that customer's data, never mixing tenants. Pinecone's namespace and metadata filtering features make this straightforward to implement, but the architecture decisions around isolation models, ingestion pipelines, and access control determine whether your system scales to 10 tenants or 10,000.
Before writing any code, you need to choose an isolation model. Each has different trade-offs for cost, security, and performance.
| Model | How It Works | Security | Cost | Best For |
|---|---|---|---|---|
| Index per Tenant | Separate Pinecone index per customer | Strongest | Highest | Enterprise / regulated |
| Namespace per Tenant | One index, separate namespace per customer | Strong | Medium | Most SaaS products |
| Metadata Filtering | Shared namespace, tenant_id in metadata | Adequate | Lowest | Small-scale / prototypes |
Recommendation:
Use the namespace-per-tenant model for most SaaS products. Pinecone namespaces provide strong query isolation (a query to namespace A will never return results from namespace B), cost efficiency (one index), and simple management. Reserve index-per-tenant for regulated industries like legal AI and healthcare where compliance requires physical data separation.
```python
# Create a single serverless index shared by all tenants
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

pc.create_index(
    name="multi-tenant-rag",
    dimension=1536,  # OpenAI text-embedding-3-small
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```
```python
# Upsert document chunks into a tenant-specific namespace
def ingest_document(tenant_id: str, doc_id: str, doc_chunks: list):
    index = pc.Index("multi-tenant-rag")
    vectors = []
    for i, chunk in enumerate(doc_chunks):
        embedding = get_embedding(chunk.text)
        vectors.append({
            "id": f"{tenant_id}_{doc_id}_{i}",
            "values": embedding,
            "metadata": {
                "text": chunk.text,
                "source": chunk.source_file,
                "uploaded_at": chunk.timestamp,
            },
        })
    # Namespace = tenant_id ensures isolation
    index.upsert(vectors=vectors, namespace=tenant_id)
```
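The ingestion function above assumes documents arrive pre-chunked. A minimal fixed-size chunker with overlap might look like the sketch below; the size and overlap values are illustrative defaults, not tuned recommendations.

```python
# Split raw text into overlapping fixed-size chunks.
# chunk_size and overlap are illustrative defaults; tune per corpus.
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping an overlap window
    return chunks
```

The overlap preserves context that would otherwise be cut at chunk boundaries, at the cost of some duplicated storage.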
```python
# Query only within the authenticated tenant's namespace
def retrieve(tenant_id: str, query: str, top_k: int = 5):
    index = pc.Index("multi-tenant-rag")
    query_embedding = get_embedding(query)
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        namespace=tenant_id,  # isolation happens here
        include_metadata=True,
    )
    return [match.metadata["text"] for match in results.matches]
```
The namespace parameter is your data boundary, but you must enforce it at the application layer. Never let the tenant_id come from the client request body. Extract it from the authenticated session or JWT.
```python
# FastAPI endpoint with tenant extraction from auth
@app.post("/api/query")
async def query_endpoint(
    request: QueryRequest,
    tenant_id: str = Depends(get_tenant_from_token),
):
    # tenant_id comes from the JWT, not the request body
    chunks = retrieve(tenant_id, request.question)
    answer = generate_answer(request.question, chunks)
    return {"answer": answer, "sources": chunks}
```
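The core of a dependency like get_tenant_from_token is verifying the token signature and reading the tenant claim. A stdlib-only sketch of that verification is below, assuming HS256-signed JWTs that carry a tenant_id claim; in production you would use a maintained library such as PyJWT and also validate expiry, issuer, and audience.

```python
import base64
import hashlib
import hmac
import json

def get_tenant_id(token: str, secret: bytes) -> str:
    """Verify an HS256 JWT signature and return its tenant_id claim."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Re-pad the base64url segments before decoding
    sig = base64.urlsafe_b64decode(sig_b64 + "=" * (-len(sig_b64) % 4))
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid token signature")
    payload = json.loads(
        base64.urlsafe_b64decode(payload_b64 + "=" * (-len(payload_b64) % 4))
    )
    return payload["tenant_id"]
```

The constant-time comparison (hmac.compare_digest) matters: a naive == comparison can leak signature bytes through timing.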
For a deeper comparison of vector database options, see our Pinecone vs Weaviate vs PGVector analysis. For the API layer, our guide on scaling FastAPI covers the patterns for high-throughput AI endpoints.
If the tenant_id is passed in the request body, a malicious user can query another tenant's data. Always derive tenant_id from the authentication layer.
If you fine-tune an embedding model on one tenant's data and use it for all tenants, the model may encode proprietary concepts from the training tenant's documents. Use general-purpose embedding models for multi-tenant deployments.
Without per-tenant metrics, you can't identify which tenant is causing performance issues or excessive costs. Log tenant_id with every query and build dashboards showing query volume, latency, and error rates per tenant.
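As a sketch of what per-tenant instrumentation looks like, here is a minimal in-process tracker; in a real deployment you would emit these numbers to your metrics backend (Prometheus, Datadog, etc.) rather than hold them in memory.

```python
from collections import defaultdict

class TenantMetrics:
    """Track query count, cumulative latency, and errors per tenant."""

    def __init__(self):
        self.stats = defaultdict(
            lambda: {"queries": 0, "latency_s": 0.0, "errors": 0}
        )

    def record(self, tenant_id: str, latency_s: float, error: bool = False):
        s = self.stats[tenant_id]
        s["queries"] += 1
        s["latency_s"] += latency_s
        if error:
            s["errors"] += 1

    def avg_latency(self, tenant_id: str) -> float:
        s = self.stats[tenant_id]
        return s["latency_s"] / s["queries"] if s["queries"] else 0.0
```

Calling record(tenant_id, latency) around every retrieval gives you the per-tenant query volume, latency, and error breakdown the dashboards need.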
Pinecone serverless indexes support up to 10,000 namespaces per index. For most SaaS products, this is sufficient. If you exceed this, shard across multiple indexes with a routing layer.
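The routing layer can start as a stable hash of the tenant_id mapped onto a fixed list of index names, sketched below. Note the caveat: resizing the list remaps existing tenants, so a production router would persist the tenant-to-index assignment rather than recompute it.

```python
import hashlib

# Illustrative shard names; create these as separate Pinecone indexes
INDEXES = ["multi-tenant-rag-0", "multi-tenant-rag-1", "multi-tenant-rag-2"]

def index_for_tenant(tenant_id: str) -> str:
    """Deterministically map a tenant to one of the shard indexes."""
    digest = hashlib.sha256(tenant_id.encode()).hexdigest()
    return INDEXES[int(digest, 16) % len(INDEXES)]
```

Every ingest and query call then resolves the index name through this function before using the tenant's namespace within it.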
Use namespaces for tenant isolation and metadata filtering for sub-tenant filtering within a namespace (e.g., by document type, date, or department). They serve different purposes and are often used together.
As you add more tenants with diverse document types, your chunking and retrieval strategy needs tuning. Read our guide on fixing RAG failures with agentic AI for advanced retrieval patterns.
We architect and deploy multi-tenant RAG systems for SaaS companies. From prototype to production at scale.
Start Building