The article describes a turning point in AI, where large language models are moving from the cloud into personal devices like phones. This shift challenges the assumption that AI must reside in data centers.
Liquid AI, an MIT spin-out, is leading this movement with an audacious thesis: the future of AI is edge-native, not cloud-domiciled. Their goal is to shrink state-of-the-art language models by at least one order of magnitude and make them run anywhere. This involves a fundamental redesign of network architectures, not just compression or optimization.
The inspiration for Liquid AI’s approach came from the humble nematode C. elegans, specifically its 302-neuron nervous system. Researchers observed that neurons in this system don’t fire in lockstep but have adaptive time constants, allowing them to handle information across multiple time scales. This led to the development of “Liquid Time Constant Networks”.
Liquid AI’s primary weapon is a family of “liquid” neural networks that borrow from physics and dynamical systems theory, differing from the common Transformer architecture. These networks embody two technical attributes: adaptive time constants (neurons adjust their receptive window) and a continuous-time formulation (hidden states modelled as solutions to differential equations). These properties are claimed to give liquid nets a natural advantage in handling non-i.i.d. data like control systems or conversations.
The article explains why the Transformer architecture, despite its brilliance, has become too heavy for many tasks, particularly on edge devices. Its global self-attention mechanism incurs a quadratic computational cost with respect to sequence length, which is manageable in the cloud but causes performance issues and battery drain on phones.
While others tackle this problem through exotic hardware or incremental mathematical optimizations, Liquid AI proposes a third path: abandon the attention primitive and use more tractable operations like convolutions and state-space kernels. Their flagship architecture is Hyena.
The Hyena layer involves two key techniques: data-controlled gating to manage channels and implicit long convolutions that operate efficiently in the frequency domain. This results in a worst-case compute cost of O(n log n), often linear in practice. Although convolutions were pre-Transformer technology, Hyena’s use of millions of taps enabled by implicit parametrization and its gating mechanism injecting data-dependent non-stationarity make it fundamentally different from older convolutional networks.
Hyena models show significant efficiency gains compared to equal-sized Transformers:
- At 2K tokens, Hyena matches Transformer perplexity with about 80% of the FLOPs.
- At 64K tokens, Hyena is 80–100× faster in wall-clock inference.
- On a Samsung Galaxy S24 Ultra, Hyena Edge showed 30% faster throughput on long chunks and measurable battery savings compared to an optimized MobileBERT baseline.
The edge-first philosophy is driven by several key benefits:
- Latency feels like intelligence: Local inference drastically reduces the delay, making interactions feel conversational.
- Privacy is a feature, not an afterthought: Data remains on the device, avoiding compliance issues and data residency nightmares. Liquid AI envisions deploying models like a 3 billion-parameter Liquid Foundation Model (LFM 3B) directly on premise for use cases like healthcare.
- Energy equals opportunity: Models that consume low power can be embedded in devices like wearables, cars, and drones where power-hungry GPUs are not feasible.
Venture capital has taken notice, with Liquid AI receiving significant funding, including a $250 million Series A round led by AMD Ventures, valuing the company at over $2 billion. This investment surge is attributed to factors like chip synergy with AMD’s edge focus, enterprise demand for on-prem privacy solutions, and the compelling contrarian narrative.
The competitive landscape includes hardware maximalists, Transformer refiners, and architecture rebels like Liquid AI. While cross-pollination occurs, Liquid AI represents a shift towards different network families.
The article raises some unanswered questions and challenges for Liquid AI, including:
- The cost of pretraining Liquid AI models.
- Whether reinforcement learning with human feedback (RLHF) works effectively on their architecture.
- The tooling gap, as the PyTorch ecosystem is less optimized for their specific operations.
- Building community traction against the established Transformer ecosystem.
The author describes a personal project porting a document search assistant to a local Hyena Edge model, finding it offered good latency and memory usage on a laptop and excelled at handling long contexts offline, making it preferable for that specific use case.
Overall, the article frames Liquid AI as a key player in a second wave of AI that prioritizes frugality and efficiency over sheer scale. The mantra shifts from scaling laws to “Sustainable intelligence demands thermodynamic humility”. The future vision is one where AI is ambient, integrated into everyday devices, perhaps starting with a bifurcation where cloud models handle creativity and edge-native models dominate latency-critical, privacy-sensitive applications.
Introduction
I was on the metro heading into town the other week, scrolling through a flurry of GitHub issues, when my phone buzzed with a freshly compiled demo from a friend.
“Try this,” the message read, “and turn off Wi-Fi.”
I did. The little chat window loaded instantly, no cloud icon spinning in the corner. I typed a paragraph-long prompt, half expecting the whir of remote GPUs to kick in.
Instead, the reply landed before the carriage lurched out of Kendall Square. Zero latency, airplane mode engaged, battery barely dented.
That moment—quiet, almost mundane—felt like a turning point. The large language model had finally moved into the handset, and with it a decade of assumptions about where intelligence ought to live started to wobble.
The team behind that demo is Liquid AI, a two-year-old MIT spin-out with an audacious thesis: the future of AI is edge-native, not cloud-domiciled. Their weapon of choice is a family of “liquid” neural networks, architectures that borrow more from physics and dynamical systems theory than from the by-now familiar Transformer blueprint.
1. From Worm Brain to Startup Term Sheet
A quick origin story sets the stage.
In 2020, three MIT researchers—Ramin Hasani, Mathias Lechner, and Alexander Amini—published a modestly titled paper on “Liquid Time Constant Networks.” The inspiration was the humble nematode C. elegans, whose 302-neuron nervous system somehow pulls off all the survival behaviors a worm requires. The key idea: neurons don’t have to fire in lockstep; they can carry their own adaptive time constants, a trick that lets a tiny circuit juggle information across multiple scales of time.
The paper fizzled in the hype cycle (2020 belonged to GPT-3), but it seeded a conviction: there was room for qualitatively different network families, not merely scaled-down Transformers. Fast forward two years, and that conviction coalesced into Liquid AI (Crunchbase has the fundraising details), fortified by a cross-disciplinary brain trust (robotics legend Daniela Rus chairs the board) and an opening seed round north of $40 million.
The mission statement, scrawled on a whiteboard at Liquid AI headquarters, was blunt:
“Shrink state-of-the-art language models by at least one order of magnitude and make them run anywhere.”
Notice the verb shrink, not compress or quantize. This is about redesign, not merely optimization.
2. Why the Transformer Became Too Heavy

Let’s be clear: Transformers are brilliant. They gave us chatbots that can cite Feynman and write passable haiku. Yet the math that makes them sing, global self-attention, comes with a quadratic price tag in sequence length. Double your input tokens and you quadruple compute. Fine for short emails, insane for 64 K-token documents or multi-hour audio.
In the cloud we can hide that cost behind a datacenter invoice. On a battery-powered phone, the physics reasserts itself. The GPU core throttles, memory stalls, and the user wonders why the fancy AI app drains 10 percent per query.
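To make the quadratic penalty concrete, here is a back-of-the-envelope sketch in Python. The operation counts ignore constants and hardware details; they only show how n² pulls away from the roughly n log n cost of the FFT-based alternatives discussed below.

```python
# Rough scaling comparison: quadratic attention vs. an O(n log n) operator.
# A sketch, not a profile of any real model; constant factors are ignored.
import math

for n in (2_000, 16_000, 64_000):
    attention_ops = n * n              # one interaction score per token pair
    fft_conv_ops = n * math.log2(n)    # FFT-style long convolution, up to constants
    print(f"{n:>7} tokens: n^2 = {attention_ops:.2e}, "
          f"n log n = {fft_conv_ops:.2e}, ratio ~ {attention_ops / fft_conv_ops:,.0f}x")
```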
Engineers have attacked the problem from two flanks:
- Hardware — build exotic accelerators (Groq’s LPU, Cerebras’ wafer-scale slabs) to brute-force attention faster.
- Incremental math — prune attention heads, sparsify blocks, fuse QKV kernels FlashAttention-style.
An edge-LLM startup like Liquid AI offers a third path: abandon the attention primitive outright and resuscitate older but more tractable operations, namely convolutions and state-space kernels. That choice undergirds their flagship architecture, Hyena.
3. Hyena 101 — How to Listen to 100K Tokens Without Flinching
Imagine you’re handed a novel and asked to answer a question about the last paragraph and chapter one simultaneously. A Transformer builds a full matrix of interactions: n² links for n tokens. Hyena takes a humbler view: local interactions can be captured by convolutions (efficient Transformer alternatives that sidestep attention), whereas truly long-range dependencies can be encoded in implicit kernels that operate in the frequency domain, à la digital signal processing.
The Hyena layer, first formalized by Poli et al. in 2023, stacks two tricks:
- Data-controlled gating: a lightweight mechanism that decides which channels to amplify or attenuate before the heavy math kicks in.
- Implicit long convolutions: parameterize a filter in Fourier space, then apply it in the time domain via the convolution theorem (a minimal sketch follows below). Compute cost becomes O(n log n) at worst, and in practice often linear courtesy of kernel caching.
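Here is that sketch: a minimal NumPy implementation of the convolution-theorem step, applying one long causal filter in O(n log n) by multiplying spectra instead of sliding a window. The toy exponential kernel is a stand-in for a learned Hyena filter; this illustrates the general technique, not Liquid AI’s implementation.

```python
import numpy as np

def fft_long_conv(u, k):
    """Causal convolution of signal u with filter k via the convolution theorem.

    u: (seq_len,) input channel, k: (seq_len,) long filter.
    Cost is O(n log n) instead of the O(n^2) of a naive sliding dot product.
    """
    n = len(u)
    fft_len = 2 * n                       # zero-pad so circular conv equals linear conv
    u_f = np.fft.rfft(u, n=fft_len)
    k_f = np.fft.rfft(k, n=fft_len)
    y = np.fft.irfft(u_f * k_f, n=fft_len)
    return y[:n]                          # keep the causal part

rng = np.random.default_rng(0)
u = rng.standard_normal(4096)             # a 4K-token channel
k = np.exp(-np.arange(4096) / 512.0)      # toy exponentially decaying kernel
y = fft_long_conv(u, k)

# Sanity check against a direct (quadratic-cost) convolution.
ref = np.convolve(u, k)[:4096]
assert np.allclose(y, ref, atol=1e-6)
```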
Liquid AI’s engineers like to show a small chart: at a 2 K-token context, a Hyena language model delivers the same perplexity as a Transformer while burning 20 percent fewer FLOPs. Stretch the sequence to 64 K tokens and the gap balloons to an 80–100× speedup. Importantly, these are architectural savings, not post hoc quantization tricks.
Sidebar — Are Convolutions Really SOTA Again?
I confess skepticism the first time I read the paper. Convolutions were the NLP workhorse until 2017, when attention swept the board. Was Hyena just nostalgia with nicer math? Two things convinced me otherwise:
- The convolution here isn’t hand-sized; it’s millions of taps long, made feasible only by implicit parametrization.
- The gating mechanism injects data-dependent non-stationarity, recapturing some of the dynamic routing that attention offers.
In effect, Hyena revives convolutions but upgrades them with lessons from state-space models, yielding a strangely elegant hybrid.
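For readers who want to see both ingredients in code, the NumPy sketch below pairs an implicitly parameterized filter (a tiny MLP maps positions to tap values, so kernel length no longer dictates parameter count) with an input-dependent sigmoid gate. The layer sizes, the decay envelope, and the exact gating form are illustrative assumptions, not the actual Hyena recipe.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, width = 8192, 16

# Implicit filter: positions -> taps through a small MLP, so an arbitrarily long
# kernel costs only the MLP's parameters, not one weight per tap.
w1 = rng.standard_normal((1, width)) * 0.5
w2 = rng.standard_normal((width, 1)) * 0.5

def implicit_kernel(n):
    t = np.linspace(0.0, 1.0, n)[:, None]            # normalized positions
    h = np.tanh(t @ w1)                              # (n, width) hidden features
    return (h @ w2)[:, 0] * np.exp(-4.0 * t[:, 0])   # decay keeps distant taps small

def gated_long_conv(u, w_gate):
    k = implicit_kernel(len(u))
    fft_len = 2 * len(u)
    y = np.fft.irfft(np.fft.rfft(u, fft_len) * np.fft.rfft(k, fft_len), fft_len)[:len(u)]
    gate = 1.0 / (1.0 + np.exp(-w_gate * u))         # data-controlled gate from the input
    return gate * y

u = rng.standard_normal(seq_len)
out = gated_long_conv(u, w_gate=0.5)
print(out.shape)  # (8192,)
```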
4. The Edge-First Philosophy — Privacy, Latency, and the Missing Modem Hop
Why insist on running the model on device? In a landscape crowded with AI startups in 2025, much of the work at Liquid AI revolves around exactly these constraints. Three reasons surface whenever I quiz the experts:
- Latency feels like intelligence. A 500 ms pause breaks the cognitive illusion; 50 ms feels conversational. Local inference chops the speed-of-light penalty.
- Privacy is a feature, not an afterthought. When raw text never leaves silicon under your thumb, compliance headaches melt away.
- Energy equals opportunity. A model that sips 1 W can be embedded in wearables, cars, drones—domains where a 50 W GPU is a non-starter.
5. Money Flows to Physics-First AI
Venture capital notices pattern breaks before the general public. In late 2023, OSS Capital and PagsGroup wired $37.5 million to Liquid AI while the ink on their corporate charter was still wet. A year later, AMD Ventures led a $250 million Series A, vaulting the valuation north of $2 billion. Meanwhile, early Hyena benchmarks out of Stanford lent the architecture, and Liquid AI with it, credibility.
Why the feeding frenzy? My read:
- Chip synergy — AMD wants workloads that fit its edge-oriented FPGA/SoC roadmap.
- Market timing — Enterprises are uneasy about sending proprietary data to OpenAI servers; an on-prem model that fits on a single H100 is music to the legal department.
- Narrative divergence — Investors love a contrarian story: worm brains versus trillion-parameter megafauna.
Add marquee angels (Tobi Lütke, Naval Ravikant, Tom Preston-Werner) and you have the makings of a hype halo. The danger, of course, is overshoot. But even skeptics admit the cap table now reads like a who’s-who of hardware and SaaS royalty, which rarely converge by accident.
6. The Competitive Map — Three Camps, One Goal
Zoom out and the “efficient AI” arena splits roughly into:
- Hardware maximalists — Build exotic silicon to crunch existing models faster (Groq, Cerebras).
- Transformer refiners — Keep the architecture, prune and sparsify, optimize kernels FlashAttention-style (Google, Meta, Mistral).
- Architecture rebels — Invent new network families that sidestep attention (Liquid AI, certain SSM labs).
In my conversations with AI engineers, Liquid AI’s thinking repeatedly challenges cloud-first assumptions about where models should live.
Cross-pollination happens (FlashAttention, born at Stanford, is now a staple everywhere), but strategic identities are clarifying. Liquid AI happily runs on Nvidia GPUs today; they just need fewer of them. Hardware outfits, in turn, flirt with non-attention models because they map cleanly onto systolic arrays. The picture is less a horse race, more a triangulation exercise: pick any two of size, speed, interpretability; the third becomes negotiable.
7. Under the Microscope — What Makes a Model “Liquid”?
The term liquid conjures fluid dynamics more than backprop. Yet it captures two technical attributes that Liquid AI highlights:
- Adaptive time constants: Each neuron can stretch or contract its effective receptive window on the fly, akin to how biological cells integrate stimuli over multiple millisecond scales.
- Continuous-time formulation: Instead of discrete layers stacked like pancakes, liquid nets model hidden states as solutions to differential equations. Training discretizes them, but the underlying view is continuous.
Why care? Because the real world is rarely i.i.d. Shocks, drifts, and feedback loops dominate control systems, financial tick streams, and yes, conversational context that meanders. Liquid nets claim a natural footing in such domains. This perspective drives how Liquid AI evaluates model efficiency.
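To ground that continuous-time view, here is a deliberately tiny, Euler-discretized liquid time-constant cell in the spirit of the Hasani et al. formulation; the scalar weights and the toy input are illustrative assumptions, not Liquid AI’s production cell. The nonlinearity f both drives the state and shortens the effective time constant, so the neuron idles through a quiet stretch and snaps to a sudden step.

```python
import numpy as np

def ltc_step(x, u, dt, tau, w_in, bias, a):
    """One Euler step of a simplified liquid time-constant cell.

    dx/dt = -(1/tau + f) * x + f * a, with f = sigmoid(w_in * u + bias).
    f acts both as a drive and as an input-dependent shortening of the
    effective time constant, so the cell reacts faster when the input is busy.
    """
    f = 1.0 / (1.0 + np.exp(-(w_in * u + bias)))
    dx = -(1.0 / tau + f) * x + f * a
    return x + dt * dx

# Quiet stretch followed by a sudden step input.
signal = np.concatenate([np.zeros(50), np.ones(50)])
x, trace = 0.0, []
for u in signal:
    x = ltc_step(x, u, dt=0.1, tau=2.0, w_in=6.0, bias=-3.0, a=1.0)
    trace.append(x)

print(f"state after the quiet stretch: {trace[49]:.3f}, after the step: {trace[-1]:.3f}")
```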
Anecdote — Debugging in Fourier Space
One Liquid engineer debugged a perplexity spike by visualizing the learned frequency response of a Hyena kernel. The bump correlated with Germanic compound nouns, long tokens that had slipped past the convolution window of an earlier prototype. Adjusting a single hyperparameter, the kernel length, shaved 1.2 perplexity points. Try doing that with opaque attention heads.
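The trick is reproducible on any long filter. A hedged sketch: materialize the kernel (here a synthetic stand-in, since I don’t have a trained Hyena checkpoint), take its FFT, and read off which frequencies it passes; a bump in the wrong band points you at the tokens the filter is mishandling.

```python
import numpy as np

# Stand-in for a learned long-convolution kernel; in practice you would
# materialize the filter from a trained layer rather than build one by hand.
n = 4096
t = np.arange(n)
kernel = np.exp(-t / 256.0) * np.cos(2 * np.pi * t / 64.0)

freq_response = np.abs(np.fft.rfft(kernel))
freqs = np.fft.rfftfreq(n, d=1.0)          # frequency axis in cycles per token

peak = freqs[np.argmax(freq_response)]
print(f"dominant frequency: {peak:.4f} cycles/token (~period {1 / peak:.0f} tokens)")
```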
8. Benchmarks, But Read the Fine Print
A few headline figures circulate in press releases:
- 2 K tokens: Hyena matches Transformer perplexity with ~80 % of the FLOPs.
- 64 K tokens: Hyena is 80–100× faster in wall-clock inference on a desktop 4090.
These are credible—Liquid AI provided logs to an independent lab—but remember they compare equal-sized models by parameter count, not by training compute. A crafty Transformer with sparse routing might close part of the gap. Still, the trend line is unmistakable: longer sequences amplify Hyena’s advantage, so workloads like legal discovery or code-base reasoning may tilt in its favor.
9. Privacy Is the New UI
Developers often treat privacy as compliance paperwork, until it bites. Liquid AI’s architects, by contrast, built privacy in from the first tensor. Consider a healthcare startup that trains a GPT-style scribe for doctors. If the text ever leaves the clinic’s LAN, HIPAA sirens go off. Liquid AI pitches a neat solution: deploy a 3 billion-parameter Liquid Foundation Model (LFM 3B) directly on premise, reflecting the company’s conviction that no external server call is necessary. No call-outs, no data residency nightmares.
Moreover, smaller deterministic nets foster explainability. Gating activations in Hyena resemble FIR filters; you can literally plot them and see which frequencies dominate. Clinicians, regulators, and auditors crave such windows into the black box. Whether that comforts them fully is unresolved, but the direction feels constructive.
10. Open Sourcing — Risk or Accelerant?
Liquid AI hints that an open source release is imminent: weights, code, the works. Conventional wisdom says guard your moat. But history suggests the opposite for platform plays. TensorFlow’s launch in 2015 catalyzed Keras; PyTorch’s liberal license spurred Lightning and Hugging Face. If Liquid AI wants Hyena kernels adopted at scale, “release early, instrument often” is a proven path.
The challenge? Edge models are more finicky to benchmark. Unlike GPU clusters, mobile SoCs vary wildly. One option is a “matrix multiplication taximeter” akin to MLPerf Mobile, but tuned for long convolutional kernels. I hope Liquid AI seeds that ecosystem; otherwise we’ll drown in apples-to-oranges speed claims.
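Until such a suite exists, here is a sketch of the measurement discipline it would standardize, in plain Python: warm up, repeat, and report median wall-clock time at fixed sequence lengths. It times the toy FFT convolution from earlier as a placeholder workload; it is not MLPerf Mobile and makes no claims about any particular SoC.

```python
import time
import numpy as np

def median_ms(fn, repeats=20, warmup=3):
    """Median wall-clock time of fn() in milliseconds, after a short warm-up."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return float(np.median(samples))

rng = np.random.default_rng(2)
for n in (2_048, 16_384, 65_536):
    u, k = rng.standard_normal(n), rng.standard_normal(n)
    fft_conv = lambda u=u, k=k, n=n: np.fft.irfft(
        np.fft.rfft(u, 2 * n) * np.fft.rfft(k, 2 * n), 2 * n
    )
    print(f"{n:>6} tokens: long conv ~ {median_ms(fft_conv):.2f} ms (median of 20 runs)")
```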
11. Skeptic’s Corner — Four Unanswered Questions
1. Training cost: If Hyena is so efficient at inference, how expensive is pretraining for Liquid AI models?
2. Instruction following: Raw perplexity is nice; aligning a model to human intent is another beast. Does RLHF work as-is on a liquid backbone?
3. Tooling gap: The PyTorch ecosystem is optimized for matrix multiplies, not gigantic implicit convolutions. Custom CUDA kernels may lag behind.
4. Community traction: Transformers enjoy billions of cumulative engineering hours; a new stack must earn loyalty. Will weekend hackers bother?
Liquid’s leadership acknowledges the hurdles.
12. The Broader Narrative — Efficiency Is the New Frontier
History repeats in spirals. The first wave of deep learning celebrated scaling laws—bigger data, wider layers, deeper stacks. The second wave, now unfolding, prizes frugality: do more with fewer parameters, less energy, tighter latency.
Liquid AI embodies that turn, but it’s hardly alone. TinyML conferences brim with silicon vendors; Apple’s Neural Engine quietly ships transformer variants distilled into 256 KB L1 caches. The field senses a plateau in raw scale and hunts for architectural leaps.
If I had to coin a slogan, it might be: Sustainable intelligence demands thermodynamic humility. The room-sized mainframe begat the personal computer; cloud behemoths may yet be followed by AI that stows itself inside earbuds.
Conclusion: A Call to Curiosity
In the mid-80s, Sun Microsystems coined the phrase “the network is the computer.” Perhaps the next decade in AI flips that aphorism: “The computer is the model.” With liquid architectures, the model folds into the device, seeps into the operating system, and eventually dissolves into the background noise of everyday tools.
Whether Liquid AI leads that migration or simply accelerates it is secondary. What matters is the reminder that progress in machine learning still rewards out-of-left-field ideas—like studying a worm’s nervous system—and that elegance, not just scale, can move markets.
If you’re a student, tinker. Download the code, profile the kernels, break something. If you’re an executive, demand a latency demo with the modem disabled. And if you’re a researcher, keep one eye on biology; neurons found tricks billions of years ago that we’ve barely begun to translate into silicon.
The edge is calling. Time to pack a lighter network.
Azmat — Founder of BinaryVerse AI | Tech Explorer and Observer of the Machine Mind Revolution