What the NOA-Star Simulation Reveals About Transient Classification

(Antonio V. Franco) A conventional Retrieval-Augmented Generation (RAG) system takes a set of retrieved documents, assembles them into a context window, and asks a large language model (LLM) to produce a classification. This approach works reasonably well until you reorder the same documents and watch the model change its mind. The same astronomical alert and the same collection of scientific papers, light curve templates, and stellar catalogs (with the correct document present in every permutation) can yield radically different classifications depending solely on the order in which the documents are presented. A Type Ia supernova might emerge in one configuration, a kilonova in another, and an RR Lyrae variable star in a third. This phenomenon, which the Stable-RAG paper (Zhang et al., arXiv:2601.02993) calls permutation-induced hallucination, is more than a curiosity. In the field of transient astronomy, it means allocating precious telescope time to the wrong object, missing the electromagnetic counterpart of a gravitational wave signal, or polluting the Transient Name Server with misclassifications that confuse the entire community. The NOA-Star system was built precisely to solve this problem. ...

May 2026 · 11 min · Antonio V. Franco

Why Specialized SLMs Are the Standard From 2026 Onward

(Antonio V. Franco) If there is one thing I have learned over the last three years, it is that technology does not forgive naivety. And I was naive (quite expensively, in fact). In 2023, when ChatGPT exploded and the entire world woke up to the existence of large language models, I did exactly what everyone else did: I threw myself headfirst into the giants’ race. It was as if there were a silent, collective competition to see who could accumulate more parameters, more context tokens, more raw computational power. If your company was not paying a fortune in monthly API calls or building an entire GPU cluster to run the model with the most zeros in the spec sheet, you were “falling behind.” That was the mantra repeated in every keynote, every corporate blog post, every networking conversation about AI. ...

May 2026 · 20 min · Antonio V. Franco

Stable-RAG Benchmark: Your RAG System Lies Depending on Document Order

(Antonio V. Franco) You built a solid RAG pipeline. Semantic retrieval with embeddings, lexical search via BM25, cross-encoder reranking, everything dialed in. The right documents reach the model. The answer comes out. You trust it. But what if I told you that simply shuffling the order of the documents in the context (without removing or adding a single one) makes the model give a completely different answer? And worse: a wrong answer. ...

April 2026 · 9 min · Antonio V. Franco
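The shuffle test described in that teaser is easy to run against your own pipeline. A minimal sketch follows; `permutation_stability` and `toy_classifier` are hypothetical names invented here (not from the Stable-RAG paper), and the toy classifier simply trusts whichever document appears first, standing in for the positional bias a real LLM can exhibit:

```python
import itertools

def permutation_stability(documents, answer_fn):
    """Return the set of distinct answers produced across all
    orderings of the same retrieved documents. A stable system
    yields exactly one answer regardless of order."""
    answers = set()
    for perm in itertools.permutations(documents):
        answers.add(answer_fn(list(perm)))
    return answers

# Toy order-sensitive "model": it trusts whichever document
# appears first, mimicking positional bias.
def toy_classifier(docs):
    return docs[0]["label"]

docs = [
    {"id": "paper",    "label": "SN Ia"},
    {"id": "template", "label": "kilonova"},
    {"id": "catalog",  "label": "RR Lyrae"},
]

print(sorted(permutation_stability(docs, toy_classifier)))
```

With a real pipeline, `answer_fn` would wrap the full retrieve-and-generate call; more than one element in the returned set is exactly the permutation-induced instability the benchmark measures. For long contexts, sampling a few random shuffles instead of all n! permutations keeps the test cheap.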

RAG with Open-Source LLMs: Why You're Paying Too Much to Hand Your Data to Someone Else

(Antonio V. Franco) Every time someone asks me whether it’s worth using GPT‑5.5 or Opus 4.7 in a RAG pipeline, I ask them the same question back: would you trust your most sensitive documents to a foreign company that’s subject to the CLOUD Act? The answer, almost always, is an uncomfortable silence. Commercial models are good. Really good. But RAG is, by definition, an operation that involves sensitive data (internal documents, contracts, customer histories, regulatory information) flowing through infrastructure you don’t control. Every query your system makes to the OpenAI or Anthropic API carries the retrieved context and the generated response along with it. And that context, almost all the time, contains exactly the kind of information that data protection regulations are trying to shield. ...

April 2026 · 12 min · Antonio V. Franco