What the NOA-Star Simulation Reveals About Transient Classification

(Antonio V. Franco) A conventional Retrieval-Augmented Generation (RAG) system takes a set of retrieved documents, assembles them into a context window, and asks a large language model (LLM) to produce a classification. This approach works reasonably well until you reorder the same documents and watch the model change its mind. The same astronomical alert, the same collection of scientific papers, light curve templates, and stellar catalogs (with the correct document present in every permutation) can yield radically different classifications depending solely on the order in which the documents are presented. A type Ia supernova might emerge in one configuration, a kilonova in another, and an RR Lyrae variable star in a third. This phenomenon, which the Stable-RAG paper (Zhang et al., arXiv:2601.02993) calls permutation-induced hallucination, is more than a curiosity. In transient astronomy, it means allocating precious telescope time to the wrong object, missing the electromagnetic counterpart of a gravitational wave signal, or polluting the Transient Name Server with misclassifications that confuse the entire community. The NOA-Star system was built precisely to solve this problem. ...

May 2026 · 11 min · Antonio V. Franco
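The permutation sensitivity described in this teaser can be made concrete with a toy sketch. This is not the NOA-Star or Stable-RAG implementation: the document names, labels, and the position-weighting rule are all invented, and the mock "classifier" simply weights earlier context positions more heavily, mimicking an LLM's primacy bias.

```python
import itertools

# Hypothetical retrieval set: each document argues for one label.
DOCS = {
    "sn_ia_light_curve_template": "SN Ia",
    "kilonova_counterpart_paper": "kilonova",
    "rr_lyrae_catalog_entry": "RR Lyrae",
}

def classify(context_order):
    """Pick the label whose documents receive the most positional weight."""
    scores = {}
    for rank, doc in enumerate(context_order):
        label = DOCS[doc]
        # Weight 1/(rank+1): the first document in the context dominates.
        scores[label] = scores.get(label, 0.0) + 1.0 / (rank + 1)
    return max(scores, key=scores.get)

# Same three documents, every ordering: the "answer" flips with the shuffle.
answers = {classify(perm) for perm in itertools.permutations(DOCS)}
print(sorted(answers))  # three different classifications from one retrieval set
```

Because the correct document is present in every permutation, any disagreement across orderings is pure permutation-induced instability, which is exactly what a stability-aware system has to detect and suppress.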

Why Specialized SLMs Are the Standard From 2026 Onward

(Antonio V. Franco) If there is one thing I have learned over the last three years, it is that technology does not forgive naivety. And I was naive (quite expensively, in fact). In 2023, when ChatGPT exploded and the entire world woke up to the existence of large language models, I did exactly what everyone else did: I threw myself headfirst into the giants’ race. It was as if there were a silent, collective competition to see who could accumulate more parameters, more context tokens, more raw computational power. If your company was not paying a fortune in monthly API calls or building an entire GPU cluster to run the model with the most zeros in the spec sheet, you were “falling behind.” That was the mantra repeated in every keynote, every corporate blog post, every networking conversation about AI. ...

May 2026 · 20 min · Antonio V. Franco

Spectral Classification Stability Engine: From Idea to Results

(Antonio V. Franco) It all started with a pun. Not just any pun, but one of those that make more sense the longer you think about them. The Stable-RAG method, proposed by Zhang et al. (2026), uses spectral clustering on the hidden states of language models to identify distinct reasoning modes. Astronomy, in turn, studies the spectral features of celestial objects (emission lines, absorption lines, the continuum) as its primary diagnostic tool for classification. Two meanings of the same word, “spectral,” belonging to seemingly distant domains. ...

April 2026 · 30 min · Antonio V. Franco
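To see what "spectral clustering on hidden states" means mechanically, here is a minimal NumPy-only sketch, not Zhang et al.'s actual pipeline: the "hidden states" are synthetic Gaussian blobs standing in for two reasoning modes, the dimensions and bandwidth are invented, and the split uses the Fiedler vector of an unnormalized graph Laplacian rather than a full k-way spectral clustering.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "hidden states": two reasoning modes as two Gaussian blobs in 8-D.
mode_a = rng.normal(0.0, 0.3, size=(20, 8))
mode_b = rng.normal(3.0, 0.3, size=(20, 8))
H = np.vstack([mode_a, mode_b])

# RBF affinity matrix over pairwise squared distances (bandwidth sigma = 1).
d2 = ((H[:, None, :] - H[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 2.0)

# Unnormalized graph Laplacian L = D - W.
L = np.diag(W.sum(axis=1)) - W

# The eigenvector of the second-smallest eigenvalue (the Fiedler vector)
# splits the similarity graph into its two most weakly connected halves.
vals, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
print(labels)  # one constant label per blob: two recovered "reasoning modes"
```

The appeal for a stability engine is that each cluster corresponds to one internally consistent mode of reasoning, so a classification can be trusted more when most permutations land in the same cluster.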

Stable-RAG Benchmark: Your RAG System Lies Depending on Document Order

(Antonio V. Franco) You built a solid RAG pipeline. Semantic retrieval with embeddings, lexical search via BM25, cross-encoder reranking, everything dialed in. The right documents reach the model. The answer comes out. You trust it. But what if I told you that simply shuffling the order of the documents in the context (without removing or adding a single one) makes the model give a completely different answer? And worse: a wrong answer. ...

April 2026 · 9 min · Antonio V. Franco
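The teaser above lists a pipeline that combines semantic retrieval with lexical BM25 search. One common, generic way to merge two such ranked lists is reciprocal rank fusion (RRF); the sketch below is illustrative only: the document names, the rankings, and the conventional k = 60 constant are not taken from the article's pipeline.

```python
def rrf(rankings, k=60):
    """Fuse ranked lists: each doc scores sum of 1 / (k + rank) across lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_c", "doc_b"]   # embedding-similarity order
lexical  = ["doc_b", "doc_a", "doc_d"]   # BM25 order
fused = rrf([semantic, lexical])
print(fused)  # doc_a first: it ranks high in both lists
```

Note what this fusion does and does not fix: it stabilizes *which* documents reach the context, but the final ordering still matters to the model, which is precisely the failure mode the benchmark measures.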