TurboQuant Explained: How Google’s “Random Rotation” Trick Shrinks AI Memory by 6x

[Feature image: rotated vectors compressed into KV cache memory blocks]

[Chart: KV Cache Compression, Recall vs. Memory. Needle-in-Haystack benchmark on Llama-3.1-8B-Instruct with context up to 104k tokens. Best recall 0.997, with TurboQuant matching full precision; KV cache 6x smaller at 3.5 bits; 8x GPU speedup on an H100 at 4 bits. Axes: Needle-in-Haystack recall score vs. KV cache size (bits).] … Read more
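The headline trick is easy to demo. An orthogonal rotation leaves query-key dot products unchanged, but it spreads outlier channels (which are common in KV caches) across all coordinates, so a crude low-bit quantizer wastes far less of its range. Here is a minimal NumPy sketch of that effect; it illustrates the general rotate-then-quantize idea under simple assumptions (a dense random rotation, one shared quantization scale per tensor), not TurboQuant's actual algorithm, and every function name is invented for the demo.

```python
import numpy as np

def random_rotation(d, seed=0):
    # Random orthogonal matrix via QR; production systems use fast
    # Hadamard-style rotations, but the outlier-spreading effect is the same.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def quantize(x, bits=4):
    # Per-tensor uniform quantization: one shared scale, the simplest scheme.
    lo, hi = x.min(), x.max()
    scale = max((hi - lo) / (2 ** bits - 1), 1e-8)
    return np.round((x - lo) / scale), lo, scale

def dequantize(codes, lo, scale):
    return codes * scale + lo

d, n = 128, 1000
rng = np.random.default_rng(1)
keys = rng.standard_normal((n, d))
keys[:, 0] *= 20.0                         # one outlier channel
query = rng.standard_normal(d)

R = random_rotation(d)
for name, k, q in [("plain", keys, query), ("rotated", keys @ R, query @ R)]:
    k_hat = dequantize(*quantize(k, bits=4))
    # Rotation preserves dot products, so the two scores are comparable.
    err = np.abs(k @ q - k_hat @ q).mean()
    print(f"{name:8s} mean |attention-score error| = {err:.3f}")
```

On this toy data, the rotated 4-bit scores land several times closer to the exact ones than the unrotated scores do, which is the whole economics of the method: same memory budget, much less damage to attention.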

Residual Connections Rethought: How Kimi’s ‘Attention Residuals’ Fixed a 10-Year-Old Transformer Flaw

[Feature image: attention-based depth routing in transformer layers]

[Diagram: standard residuals vs. Attention Residuals. Standard residuals apply fixed, uniform weights, so as the embedding h₁ passes through Layers 1-3 each layer only sees the accumulated sum; Attention Residuals are learned and input-dependent, so Layer 3 can selectively attend to any earlier layer.] Residual connections are one of those rare ideas in deep learning that became so successful, … Read more
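The diagram's contrast is easy to state in code. Below is a minimal NumPy sketch of the two update rules, written under my own assumptions: the learned mixing is modeled as a small query-key softmax over earlier hidden states, and the names (attention_residual, w_q, w_k) are invented for illustration rather than taken from Kimi's implementation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def standard_residual(hiddens, layer_out):
    # Classic residual stream: the new state is just the running sum,
    # with every earlier layer weighted identically.
    return hiddens[-1] + layer_out

def attention_residual(hiddens, layer_out, w_q, w_k):
    # Learned, input-dependent routing: the current layer's output forms a
    # query, every earlier hidden state forms a key, and the residual is a
    # softmax-weighted mixture of the earlier states.
    stack = np.stack(hiddens)                   # (n_earlier, d)
    q = layer_out @ w_q                         # (d,)
    k = stack @ w_k                             # (n_earlier, d)
    weights = softmax(k @ q / np.sqrt(q.size))  # one weight per earlier layer
    return weights @ stack + layer_out

d = 16
rng = np.random.default_rng(0)
hiddens = [rng.standard_normal(d) for _ in range(3)]   # embedding + two layers
layer_out = rng.standard_normal(d)
w_q, w_k = rng.standard_normal((d, d)), rng.standard_normal((d, d))

print(standard_residual(hiddens, layer_out)[:4])
print(attention_residual(hiddens, layer_out, w_q, w_k)[:4])
```

The difference in information flow is the point: standard_residual gives a deep layer only the accumulated sum, while attention_residual makes the weights depend on the current input, so Layer 3 can reach back to whichever earlier representation it needs.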

Is X Biased? A Nature Study Just Ran the Definitive Test

[Feature image: split chronological vs. algorithmic feed with a ranking dial]

[Chart: effect sizes when people switched from chronological to algorithmic feeds, in standard deviations. Engagement 0.14; policy agenda (right) 0.11; Ukraine (pro-Kremlin) 0.12; Trump probes (unfair) 0.08; all policy/news index 0.12; partisanship 0.00; affective polarization 0.00.] People ask “Is X biased?” the way they ask if the weather is “angry.” It’s a vibe question, born from … Read more