What the NOA-Star Simulation Reveals About Transient Classification

(Antonio V. Franco) A conventional Retrieval-Augmented Generation (RAG) system takes a set of retrieved documents, assembles them into a context window, and asks a large language model (LLM) to produce a classification. This approach works reasonably well until you reorder the same documents and watch the model change its mind. The same astronomical alert, the same collection of scientific papers, light curve templates, and stellar catalogs (with the correct document present in every permutation) can yield radically different classifications depending solely on the order in which the documents are presented. A Type Ia supernova might emerge in one configuration, a kilonova in another, and an RR Lyrae variable star in a third. This phenomenon, which the Stable-RAG paper (Zhang et al., arXiv:2601.02993) calls permutation-induced hallucination, is more than a curiosity. In the field of transient astronomy, it means allocating precious telescope time to the wrong object, missing the electromagnetic counterpart of a gravitational wave signal, or polluting the Transient Name Server with misclassifications that confuse the entire community. The NOA-Star system was built precisely to solve this problem. ...

May 2026 · 11 min · Antonio V. Franco
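The permutation sensitivity described above is easy to measure yourself: classify the same document set under every ordering and check how often the answers agree. A minimal sketch, where `fragile_classify` is a toy stand-in for an LLM call (it just trusts whichever document comes first, mimicking positional bias; it is not part of NOA-Star):

```python
import itertools
from collections import Counter

def permutation_agreement(docs, classify, max_perms=24):
    """Classify the same document set under every ordering (capped at
    max_perms) and report the modal label and how often it wins.

    `classify` is any callable taking an ordered list of documents and
    returning a label; a real pipeline would wrap an LLM call here.
    """
    labels = []
    for i, perm in enumerate(itertools.permutations(docs)):
        if i >= max_perms:
            break
        labels.append(classify(list(perm)))
    counts = Counter(labels)
    modal_label, modal_count = counts.most_common(1)[0]
    return modal_label, modal_count / len(labels), counts

# Toy order-sensitive classifier: trusts whichever document comes first.
def fragile_classify(ordered_docs):
    return ordered_docs[0]["label"]

docs = [
    {"id": "paper_sn_ia", "label": "SN Ia"},
    {"id": "template_kn", "label": "kilonova"},
    {"id": "catalog_rrl", "label": "RR Lyrae"},
]
label, stability, counts = permutation_agreement(docs, fragile_classify)
# A perfectly order-stable pipeline gives stability == 1.0.
```

With three documents there are six orderings, and this toy classifier splits its votes evenly across the three labels, so `stability` comes out at one third — exactly the kind of disagreement the excerpt warns about.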

Why Specialized SLMs Are the Standard From 2026 Onward

(Antonio V. Franco) If there is one thing I have learned over the last three years, it is that technology does not forgive naivety. And I was naive (quite expensively, in fact). In 2023, when ChatGPT exploded and the entire world woke up to the existence of large language models, I did exactly what everyone else did: I threw myself headfirst into the giants’ race. It was as if there were a silent, collective competition to see who could accumulate more parameters, more context tokens, more raw computational power. If your company was not paying a fortune in monthly API calls or building an entire GPU cluster to run the model with the most zeros in the spec sheet, you were “falling behind.” That was the mantra repeated in every keynote, every corporate blog post, every networking conversation about AI. ...

May 2026 · 20 min · Antonio V. Franco

Can Memory Make an AI Worse? My Benchmark with Qwen3.5-9B and Stellar Classification

(Antonio V. Franco) I ran 135 celestial object classification tasks using three memory approaches. The result was counterintuitive. It seems like common sense: if an artificial intelligence agent learns from its past experiences, it should become better over time. Each solved problem becomes a reference, a reusable pattern that accelerates and sharpens future decisions. This is precisely the intuition behind the ReasoningBank paper (Ouyang et al., ICLR 2026), a system that stores reasoning strategies in a memory bank and retrieves them when facing similar tasks. The promise is seductive and aligns with how human experts build expertise. ...

May 2026 · 18 min · Antonio V. Franco
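The memory-bank idea the excerpt describes — store reasoning strategies, retrieve them for similar tasks — can be sketched in a few lines. This is my own illustrative reduction, not the ReasoningBank implementation: it keys strategies by bag-of-words cosine similarity instead of learned embeddings, and all task/strategy strings are made up:

```python
import math
from collections import Counter

def _bow(text):
    """Bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())

def _cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

class StrategyBank:
    """Minimal memory bank: stores (task, strategy) pairs and retrieves
    the top-k strategies whose stored task looks most similar."""

    def __init__(self):
        self.entries = []  # list of (bow_vector, strategy)

    def add(self, task, strategy):
        self.entries.append((_bow(task), strategy))

    def retrieve(self, task, k=1):
        q = _bow(task)
        ranked = sorted(self.entries, key=lambda e: _cosine(q, e[0]),
                        reverse=True)
        return [strategy for _, strategy in ranked[:k]]

bank = StrategyBank()
bank.add("classify periodic variable star from light curve",
         "check period-amplitude relation before template fit")
bank.add("identify supernova type from spectrum",
         "look for Si II 6355 absorption first")
hits = bank.retrieve("variable star light curve classification", k=1)
```

The benchmark's counterintuitive result lives precisely in this retrieval step: if the "most similar" past strategy is actually a poor fit for the new task, the memory hurts instead of helps.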

Spectral Classification Stability Engine: From Idea to Results

(Antonio V. Franco) It all started with a pun. Not just any pun, but one of those that makes more sense the longer you think about it. The Stable-RAG method, proposed by Zhang et al. (2026), uses spectral clustering on the hidden states of language models to identify distinct reasoning modes. Astronomy, in turn, studies spectral features of celestial objects (emission lines, absorption lines, the continuum) as the primary diagnostic tool for classification. Two meanings of the same word, “spectral,” belonging to seemingly distant domains. ...

April 2026 · 30 min · Antonio V. Franco
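To make the first meaning of "spectral" concrete: spectral clustering partitions vectors by the eigenvectors of a graph Laplacian built from pairwise similarities. A self-contained sketch of the idea (not Zhang et al.'s exact method), using random blobs as stand-ins for hidden-state vectors from two reasoning modes:

```python
import numpy as np

def spectral_bipartition(states, sigma=1.0):
    """Split vectors into two clusters via the sign of the Fiedler
    vector (second eigenvector) of the normalized graph Laplacian."""
    X = np.asarray(states, dtype=float)
    # Pairwise squared distances -> RBF affinity matrix.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    L = np.eye(len(X)) - W / np.sqrt(d)[:, None] / np.sqrt(d)[None, :]
    # eigh returns eigenvalues in ascending order; column 1 is the
    # Fiedler vector, whose sign pattern encodes the bipartition.
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)

# Two well-separated blobs standing in for two reasoning modes.
rng = np.random.default_rng(0)
states = np.vstack([rng.normal(0, 0.1, (5, 4)),
                    rng.normal(3, 0.1, (5, 4))])
labels = spectral_bipartition(states)
```

On real hidden states the clusters are far less cleanly separated, which is exactly where the interesting engineering in the full article begins.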

Stable-RAG Benchmark: Your RAG System Lies Depending on Document Order

(Antonio V. Franco) You built a solid RAG pipeline. Semantic retrieval with embeddings, lexical search via BM25, cross-encoder reranking, everything dialed in. The right documents reach the model. The answer comes out. You trust it. But what if I told you that simply shuffling the order of the documents in the context (without removing or adding a single one) makes the model give a completely different answer? And worse: a wrong answer. ...

April 2026 · 9 min · Antonio V. Franco

RAG with Open-Source LLMs: Why You're Paying Too Much to Hand Your Data to Someone Else

(Antonio V. Franco) Every time someone asks me whether it’s worth using GPT‑5.5 or Opus 4.7 in a RAG pipeline, I ask them the same question back: would you trust your most sensitive documents to a foreign company that’s subject to the CLOUD Act? The answer, almost always, is an uncomfortable silence. Commercial models are good. Really good. But RAG is, by definition, an operation that involves sensitive data (internal documents, contracts, customer histories, regulatory information) flowing through infrastructure you don’t control. Every query your system makes to the OpenAI or Anthropic API carries the retrieved context and the generated response along with it. And that context, almost all the time, contains exactly the kind of information that data protection regulations are trying to shield. ...

April 2026 · 12 min · Antonio V. Franco