Self-Distillation Fine-Tuning (SDFT): The On-Policy Trick That Makes Continual Learning Finally Work

Fine-tuning an LLM feels like doing surgery with oven mitts. You make one clean cut, the patient learns a shiny new skill, then you check the vitals and realize it forgot its own name. That is the default behavior of supervised …

Kimi K2.5 Review: Swarm Mode Reality Check, Benchmarks That Matter, And Pricing You’ll Actually Pay

AI model releases used to be simple: bigger context, higher scores, new logo. Now the real competition is usability. Can the model code without turning your repo into spaghetti? Can it look at a screenshot and stay honest about what it sees? Can …

Qwen3 Max Thinking Review: Heavy Mode, Test-Time Scaling, And Benchmarks Vs GPT-5.2 And Gemini 3 Pro

Every few months the “reasoning model” race gets a new lap: a flagship shows up, claims it thinks deeper, posts a fresh set of charts, and the internet immediately argues about whether the charts are real. Qwen3 Max Thinking is worth your time …

TTT-Discover Explained: Why Test-Time RL Outruns Best-of-N Sampling

You have seen this movie. A model tackles a hard problem, fails, tries again, fails differently, then repeats the same mistake with fresh confidence. You can sample more. You can crank temperature. You can run best-of-N sampling until the GPU fans sound like …