Stable-RAG Benchmark: Your RAG System Lies Depending on Document Order

(Antonio V. Franco) You built a solid RAG pipeline. Semantic retrieval with embeddings, lexical search via BM25, cross-encoder reranking, everything dialed in. The right documents reach the model. The answer comes out. You trust it. But what if I told you that simply shuffling the order of the documents in the context (without removing or adding a single one) makes the model give a completely different answer? And worse: a wrong answer. ...

April 2026 · 9 min · Antonio V. Franco