Residual Connections Rethought: How Kimi’s ‘Attention Residuals’ Fixed a 10-Year-Old Transformer Flaw

[Figure: Standard residuals (fixed, uniform weights): each layer only sees the accumulated sum of the embedding (h₁) and the earlier layers. Attention Residuals (learned, input-dependent): Layer 3 can selectively attend to any earlier layer.]

Residual connections are one of those rare ideas in deep learning that became so successful, …
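A toy Python sketch of the contrast in the figure, built on our own assumptions rather than Kimi's published formulation. The standard path adds every earlier representation with a fixed weight of 1. The attention path scores each earlier representation with a dot product against a per-layer query vector (`query`, a hypothetical learned parameter) and mixes them with softmax weights. All function names are illustrative.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def standard_residual_input(history):
    """Standard residuals: the next layer sees the plain sum of the embedding
    and every earlier layer's output, each with a fixed weight of 1."""
    dim = len(history[0])
    return [sum(h[i] for h in history) for i in range(dim)]


def attention_residual_input(history, query):
    """Hypothetical attention-style residual (an assumption, not Kimi's code):
    score each earlier representation against a per-layer query vector,
    turn the scores into softmax weights, and return the weighted mix plus
    the weights. Because the scores depend on the representations, the
    weights are input-dependent, unlike the fixed sum above."""
    weights = softmax([dot(query, h) for h in history])
    dim = len(history[0])
    mixed = [sum(w * h[i] for w, h in zip(weights, history)) for i in range(dim)]
    return mixed, weights


if __name__ == "__main__":
    # Toy 2-d representations: embedding, layer 1 output, layer 2 output.
    history = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    print("standard:", standard_residual_input(history))
    mixed, weights = attention_residual_input(history, query=[1.0, 0.0])
    print("attention weights:", [round(w, 3) for w in weights])
    print("attention mix:", [round(x, 3) for x in mixed])
```

With this query, the layer 2 output (the representation most aligned with `query`) receives the largest weight, while the standard path simply returns the running sum. In this sketch the softmax makes the mix a convex combination, so its magnitude cannot exceed that of the largest earlier representation, whereas the plain sum keeps growing as layers are added.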

Is X Biased? A Nature Study Just Ran the Definitive Test

[Mini chart: effect sizes when people switched from a chronological to an algorithmic feed, in standard deviations]
Engagement: 0.14
Policy agenda (right): 0.11
Ukraine (pro-Kremlin): 0.12
Trump probes (unfair): 0.08
All policy/news index: 0.12
Partisanship: 0.00
Affective polarization: 0.00

People ask “Is X biased?” the way they ask whether the weather is “angry.” It’s a vibe question, born from …