Mechanistic Interpretability (2026): Reverse-Engineering LLMs Into Features, Circuits, and Causal Traces

mechanistic interpretability cover showing features circuits causal traces

Mechanistic Interpretability (2026): Reverse-Engineering LLMs Into Features, Circuits, and Causal Traces Introduction Mechanistic interpretability is the “take it apart and see how it works” branch of AI interpretability: instead of treating a model as a black box and correlating inputs with outputs, you try to recover the internal computations that produce behavior, down at the level of activations, learned features, … Read more

Anthropic Assistant Axis: What It Is, What It Prevents, And What It Might Break

Anthropic Assistant Axis cover showing assistantness slider and risks

Watch or Listen on YouTube Anthropic Assistant Axis, Persona Drift, Jailbreak Defense 1. Introduction: The Mask Everyone Has Felt Spend a few evenings with chat models and you start noticing the costume changes. Most of the time the model sits in a familiar groove: helpful, tidy, a little polite. Then the conversation veers, and the … Read more

Meta Dr. Zero Explained: The Self-Evolving Search Agent That Trains Without Human SFT Data

Meta Dr. Zero cover showing proposer-solver loop

Watch or Listen on YouTube Meta Dr. Zero Explained: Self-Evolving Search Agents Without Human Data Introduction Everyone wants “agents” that can look things up, chain multiple steps, and feel like a junior researcher who never sleeps. The annoying part is what it takes to get there: piles of hand-curated instruction data, constant refresh cycles, and … Read more

The AI Brain Anatomy: How The Synergistic Core Killed The Stochastic Parrot

AI brain cover showing synergistic core versus periphery

Watch or Listen on YouTube The AI Brain Anatomy: How The Synergistic Core Killed The Stochastic Parrot Hajra, a clinical psychologist and research scholar, reads the paper, then squints at our favorite arguments. 1. Introduction: The Ghost In The Machine “Stochastic parrot” used to be the healthiest two-word reply on the internet. It was a … Read more

Conditional Memory And DeepSeek Engram: When Lookup Beats More Compute

Conditional Memory hero image with DeepSeek Engram dashboard

Watch or Listen on YouTube Conditional Memory And DeepSeek Engram: When Lookup Beats More Compute Introduction Bigger models keep winning, but the reason is not always “more intelligence.” Sometimes it is just less wasted work. The Engram paper makes an almost irritatingly sensible point. Transformers do two jobs at once: they remember stable patterns, and … Read more

Epiplexity vs. Entropy: The New Metric That Explains What AI Actually Learns

Epiplexity vs entropy cover with loss-curve mini infographic

Watch or Listen on YouTube Epiplexity vs. Entropy: The New Metric That Explains What AI Actually Learns Introduction If you’ve ever stared at a loss curve and thought, “Cool, it’s going down, but what did the model actually learn?”, you’re not alone. We’ve trained ourselves to treat lower loss like truth. But a model can … Read more