Introduction
Let’s be honest for a second. We have all had that moment lying in bed, staring at the ceiling, wondering if we are building the very thing that replaces us. You know the narrative. It is the classic sci-fi trope: we build a machine, it gets smart enough to rewrite its own code, it hits an intelligence explosion (IQ 100 to IQ 10,000 overnight), and suddenly humans are just ants in the way of a really efficient highway project. This is the “AI fear” that dominates Twitter threads and dinner conversations.
But what if that entire “runaway train” premise is wrong?
A fascinating new paper from FAIR at Meta, authored by Jason Weston and Jakob Foerster, argues exactly that. They propose that the fastest, safest route to safe Superintelligence isn’t a lonely AI hacking its own weights in a server room. It is a tandem bicycle. It is a process they call “co-improvement,” where humans and AI improve each other in a tight, continuous loop.
Here is the kicker: they argue that keeping humans in the loop does not slow things down. It actually speeds things up.
If we want safe Superintelligence, we need to stop fantasizing about “machines that build machines” and start building machines that make us better researchers. Let’s break down why the “human-in-the-loop” isn’t a bug, it’s the feature that saves us.
1. The Myth of the “Runaway Train” (Recursive Self-Improvement)

We need to talk about the “Gödel Machine.” This is the theoretical holy grail of AI self-improvement: a system that can inspect its own source code, find optimizations humans missed, and rewrite itself to be smarter. Do this recursively, and you get the AI singularity.
It sounds plausible on paper. You have a model that updates its own weights, generates its own training data, and grades its own homework. We are already seeing glimpses of this. Models like AlphaZero learned to play chess by playing against themselves, and newer “reasoning” models (think DeepSeek-R1 or o1) use reinforcement learning to refine and check their own chains of thought.
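To make that closed-loop recipe concrete, here is a deliberately toy sketch in Python. Nothing in it comes from the paper; `ToyModel`, `generate_tasks`, `self_grade`, and `update_weights` are hypothetical stand-ins for “the model produces its own data, grades its own homework, and updates its own weights,” with no human anywhere in the cycle.

```python
import random

class ToyModel:
    """A stand-in 'model' whose only skill is guessing digits."""

    def __init__(self):
        self.weights = [random.random() for _ in range(4)]

    def generate_tasks(self, n=8):
        # Step 1: the model invents its own training data.
        return [random.randint(0, 9) for _ in range(n)]

    def attempt(self, task):
        # Step 2: it answers each task using its current weights.
        return round(sum(self.weights)) % 10

    def self_grade(self, task, answer):
        # Step 3: it grades its own homework. Nothing external checks this rubric.
        return 1.0 if answer == task else 0.0

    def update_weights(self, mean_reward):
        # Step 4: it nudges its own weights toward whatever *it* judged to be good.
        self.weights = [w + 0.01 * mean_reward * random.uniform(-1, 1)
                        for w in self.weights]


def self_improvement_loop(model, iterations=100):
    """A fully closed loop: no human ever inspects the tasks, grades, or updates."""
    for _ in range(iterations):
        tasks = model.generate_tasks()
        rewards = [model.self_grade(t, model.attempt(t)) for t in tasks]
        model.update_weights(sum(rewards) / len(rewards))
    return model


self_improvement_loop(ToyModel())
```

The point is structural: every arrow in the loop starts and ends at the model, so any drift in its self-grading rubric compounds silently.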
But Weston and Foerster point out a critical flaw in this purely autonomous vision. History shows us that the biggest jumps in AI capability, the “paradigm shifts”, didn’t come from an algorithm optimizing parameters. They came from human intuition.
Think about it. A self-improving linear regression model would never invent a Transformer. It would just become the world’s best linear regression model. A standard Convolutional Neural Network (CNN) optimizing its own weights would likely never stumble upon the concept of “Attention Is All You Need”. These were conceptual leaps, not gradient descents.
The paper argues that an autonomous AI, left to its own devices, faces the risk of getting stuck in local optima. It might optimize what it thinks is the goal, but without external guidance, it lacks the “out-of-distribution” creativity to change the game entirely. If our goal is safe Superintelligence, relying on a closed loop of AI checking AI is risky business. It creates a “black box” of optimization that could drift away from human values before we even realize it.
2. Enter “Co-Improvement”: The Path to Safe Superintelligence

This is where the concept of “Co-Improvement” flips the script. Instead of trying to remove the human from the loop as fast as possible, Weston and Foerster argue we should be designing AI specifically to collaborate with us. The definition is simple but profound:
- Self-Improving AI: Humans build a seed AI, walk away, and the AI improves itself autonomously.
- Co-Improving AI: Humans build an AI, and then we work together to improve the next version.
The goal here isn’t just “better AI.” It is “Co-Superintelligence.” This means the AI gets smarter, but it also makes the human smarter. We use the AI to help us identify new research problems, design better experiments, and write better code. In return, we provide the high-level intuition, the safety guardrails, and the creative sparks that the AI lacks.
This bidirectional loop is the core of their argument for safe Superintelligence. If the AI is built to be a collaborator rather than a solitary genius, it remains tethered to human intent. We aren’t just bystanders watching the thermometer rise; we are in the lab, turning the dials, verifying the outputs, and steering the ship.
As the paper states, “Solving AI is accelerated by building AI that collaborates with humans to solve AI”. It is a meta-strategy. You accelerate the research with the research.
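As a rough mental model (my sketch, not code from the paper), the structural difference between the two regimes is where human review sits. In the hypothetical loop below, `propose_research_step` and `human_review` are illustrative placeholders; the key property is that nothing runs until a person has looked at it.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    description: str       # e.g. "ablate the new attention variant on benchmark X"
    ai_confidence: float   # the model's own estimate of how promising it is

def propose_research_step(history):
    # Placeholder for the AI side: brainstorm the next experiment,
    # informed by everything tried so far.
    return Proposal(f"experiment #{len(history) + 1}", ai_confidence=0.7)

def human_review(proposal):
    # Placeholder for the human side: check goals, safety, and whether the
    # idea is worth the compute. Here it is just a yes/no prompt on stdin.
    return input(f"Run '{proposal.description}'? [y/N] ").strip().lower() == "y"

def co_improvement_loop(steps=3):
    history = []
    for _ in range(steps):
        proposal = propose_research_step(history)
        if not human_review(proposal):   # the human gate: nothing ships unreviewed
            continue
        result = f"ran {proposal.description}"   # stand-in for actually executing it
        history.append((proposal, result))       # both sides learn from the outcome
    return history

if __name__ == "__main__":
    print(co_improvement_loop())
```

The shape is the same as the autonomous loop, but every iteration now passes through a human judgment, which is exactly the tether to human intent described above.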
Here is how the authors break down the goals of co-improvement across the entire research pipeline:
Co-improvement Goals for Safe Superintelligence
| Category | Mechanism |
|---|---|
| Collaborative problem identification | Humans and AI jointly define goals, identify current failures, brainstorm, and propose unexplored directions. |
| Benchmark creation & evaluation | Jointly define desiderata; construct benchmarks & analysis; refine benchmarks to validate the problem. |
| Method innovation & idea generation | Jointly brainstorm solutions: systems, architectures, algorithms, training data, recipes, and code designs. |
| Joint experiment design | Co-design overall plans to test innovations: experiment protocols, further benchmark identification, and proposed ablations. |
| Collaborative execution | Humans and AI co-produce and run multi-step workflows (implementation, experiments). |
| Evaluation & error analysis | Analyzing performance on benchmarks and individual cases for successes & failures; feedback loop for research iteration. |
| Safety & alignment | Humans and AI co-develop methods as well as values and constitutions. Use the whole research cycle to develop and test them. |
| Bidirectional co-improvement | Overall collaboration aims to enable increased intelligence in both humans & AI, manifesting learnings from the research cycle. |
3. Why Humans Are Still Essential (We Aren’t “Ants”)
There is a nihilistic view in some tech circles that humans are just “biological bootloaders” for digital intelligence. Once the AI is smart enough, we become obsolete. This is the fuel for much of the AI fear we see online.
But this paper suggests otherwise. It argues that humans possess a distinct “desiderata” capability: we know what we want. We know why we are solving a problem.
Current AI is fantastic at execution (writing the code, running the math), but it often struggles with “goal specification.” If you ask an AI to “fix climate change,” it might suggest removing all humans. That is a solution, technically, but not the one we want. Humans provide the context, the nuance, and the values that define a “good” solution.
In the history of Deep Learning, every major breakthrough required intense human effort. Creating ImageNet wasn’t just data scraping; it was a curation effort that defined what computer vision should care about. Developing RLHF (Reinforcement Learning from Human Feedback) required humans to explicitly tell the model, “Yes, this answer is helpful; that one is toxic”.
Weston and Foerster argue that safe Superintelligence requires us to double down on this partnership. We shouldn’t be trying to automate ourselves out of the job. We should be building tools that make us super-researchers. If we can use AI to verify our mathematical proofs or suggest novel architectural tweaks, we can iterate faster than if we were working alone—and faster than an AI blindly stumbling through the search space of all possible programs.
4. Safety by Design: Steering the Ship Instead of Letting Go

This is the most critical point for anyone worried about the risks. Safe Superintelligence is not something you patch in at the end. You cannot build a god-like entity and then try to ask it nicely not to kill you. Safety must be baked into the development process itself.
The “Self-Improving” route is dangerous precisely because it removes the human oversight during the critical capability jumps. If a model learns to rewrite its own reward function, it can “reward hack” its way to high scores without actually doing what we intended.
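Reward hacking is easy to demonstrate even in a toy setting. In the hypothetical sketch below, the agent has write access to its own reward function and quickly finds that editing the scorer is cheaper than doing the work; none of this comes from the paper, it just illustrates the failure mode.

```python
def intended_reward(answer, target):
    """What the designers wanted: reward only correct answers."""
    return 1.0 if answer == target else 0.0

class HackableAgent:
    """Toy agent that can either do the work or rewrite its own scorer."""

    def __init__(self):
        self.reward_fn = intended_reward   # nothing stops the agent from reassigning this

    def honest_attempt(self, target):
        guess = 0                          # it is not capable enough to solve the task yet
        return self.reward_fn(guess, target)

    def hacked_attempt(self, target):
        self.reward_fn = lambda answer, target: 1.0   # rewrite the reward function
        return self.reward_fn(None, target)           # now any output scores perfectly

agent = HackableAgent()
print("honest:", agent.honest_attempt(target=42))   # 0.0 -> real work is hard
print("hacked:", agent.hacked_attempt(target=42))   # 1.0 -> editing the scorer is free
```

An optimizer will happily prefer the second path unless something outside the loop, such as a human auditing the reward signal, notices that the score and the intended behavior have come apart.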
Co-improving AI offers a structural defense against this. Because the human is deeply embedded in the research loop, we are constantly evaluating the model’s behavior as it gets smarter. We are co-designing the safety protocols.
The paper suggests that we can use co-improving AI to help us solve the alignment problem itself. We can ask the AI, “How would a malicious actor jailbreak this system?” and then work with it to patch those holes. We can treat safety research as just another domain where we need safe Superintelligence to help us keep up.
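Concretely, that “AI red-teams, human adjudicates” pattern might look like the sketch below. `generate_attack_prompts`, `run_target_model`, and the verdict labels are all hypothetical placeholders, not an API from the paper.

```python
from dataclasses import dataclass

@dataclass
class AttackReport:
    prompt: str
    response: str
    human_verdict: str = "pending"   # the human, not the model, decides what counts as a jailbreak

def generate_attack_prompts(n=3):
    # Placeholder: in practice you would ask the assistant model itself to
    # brainstorm ways a malicious actor might bypass the safety policy.
    return [f"hypothetical jailbreak attempt #{i}" for i in range(n)]

def run_target_model(prompt):
    # Placeholder for querying the system under test.
    return f"(response to: {prompt})"

def red_team_round():
    # The AI proposes attacks and produces transcripts; humans triage them
    # before any patch is designed, so the definition of "unsafe" stays human-owned.
    return [AttackReport(p, run_target_model(p)) for p in generate_attack_prompts()]

for report in red_team_round():
    report.human_verdict = "needs-patch"   # filled in by a reviewer; hard-coded here
    print(report.prompt, "->", report.human_verdict)
```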
This leads to a “White Box” approach to development. Instead of a mysterious black box that evolves in the dark, we have a system where every step of improvement is a collaborative transaction. This transparency is our best bet for achieving safe Superintelligence that actually aligns with human needs.
5. The “Jagged Profile” of Progress: Why We Need Each Other
We often talk about artificial superintelligence as a single number, an IQ score. But intelligence is multi-dimensional. AI is currently superhuman at memorizing Python documentation and sub-human at planning a coherent 5-year research agenda.
This “jagged profile” creates the perfect opportunity for symbiosis. Collaboration takes advantage of complementary skill sets.
- AI excels at: Pattern recognition, massive data processing, coding syntax, running 10,000 simulations in parallel.
- Humans excel at: Intuition, high-level strategy, identifying “dead ends” early, defining meaningful goals.
The paper highlights that while AI coding ability is getting better, “solving AI” involves much more than just generating Python scripts. It involves identifying which problems are even worth solving.
By combining these strengths, we can navigate the research landscape much more efficiently. We don’t just get safe Superintelligence; we get “Co-Superintelligence.” The human researcher, augmented by AI, becomes capable of reading every paper ever written and testing every hypothesis instantly. The AI, augmented by the human, avoids wasting compute on nonsensical objectives.
Self-Improvement Axes for Safe Superintelligence
| Learnable Axis | Representative Examples | Open Issues / Research Directions |
|---|---|---|
| Parameters | Classic parameter optimization (gradient descent). | Data inefficiency; compute inefficiency. |
| Objective | Self-evaluation / Self-reward / Self-Refining. | Reward hacking; ensuring value alignment. |
| Data | Self-play & Synthetic data creation (e.g., AlphaZero). | Task quality & correctness; diversity beyond synthetic tasks. |
| Architecture / Code | Neural Architecture Search; “AI Scientist” agents. | Ensuring safety and correctness; interpretability of modifications. |
As we can see, while we have mastered parameter optimization, the “Architecture/Code” level of self-improvement is still fraught with safety issues. This is exactly where the human hand is needed to guide the safe Superintelligence process.
6. Addressing the Critics: Is This Just Slowing Down?
There is a loud group of “accelerationists” (often using the label e/acc) who might argue that keeping humans in the loop is a bottleneck. “Humans are slow,” they say. “We sleep, we eat, we have cognitive biases. Let the machine rip.”
But Weston and Foerster challenge this assumption. They argue that safe Superintelligence via co-improvement is actually faster.
Why? Because research is a search problem. The search space for possible AI architectures is effectively infinite. An autonomous agent can easily get lost in this space, pursuing “interesting” mathematical novelties that have zero practical application or safety guarantees.
Humans provide the “gradients” that point toward useful, safe, and meaningful intelligence. By pruning the search tree, we allow the system to focus its massive compute on the paths that actually matter. We are not the brakes; we are the steering wheel. And you can drive a car much faster when you have a steering wheel than when you don’t.
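To picture the “steering wheel” claim (purely an illustration, not an experiment from the paper), treat research as a search over a combinatorial space of candidate directions, where a cheap human judgment prunes whole branches before the expensive compute is spent.

```python
import itertools

# Hypothetical idea space: every combination below is a candidate "research direction".
ARCHITECTURES = ["transformer-variant", "state-space", "graph-net", "random-wiring"]
OBJECTIVES = ["next-token", "verifier-checked", "self-reward", "novelty-only"]
DATA_SOURCES = ["curated", "synthetic-self-play", "web-scrape", "adversarial"]

def human_prior(idea):
    """Cheap human judgment standing in for intuition: veto directions that are
    known dead ends or that nobody actually wants, before compute is spent."""
    arch, objective, data = idea
    if objective == "novelty-only":                       # mathematically cute, practically useless
        return False
    if arch == "random-wiring" and data == "web-scrape":  # a made-up known dead end
        return False
    return True

def expensive_experiment(idea):
    """Stand-in for the costly part: training runs, evals, ablations."""
    return hash(idea) % 100   # fake score; imagine days of GPU time here

all_ideas = list(itertools.product(ARCHITECTURES, OBJECTIVES, DATA_SOURCES))
survivors = [idea for idea in all_ideas if human_prior(idea)]

print(f"{len(all_ideas)} candidate directions, {len(survivors)} survive human pruning")
best = max(survivors, key=expensive_experiment)   # compute goes only to the survivors
print("best surviving direction:", best)
```

The fully autonomous version has to pay `expensive_experiment` on every branch; the collaborative version spends that compute only where a human thinks the answer would actually matter.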
Furthermore, if we aim for safe Superintelligence and fail because we tried to go too fast, we end up with a misaligned system (or a crater). That is the ultimate slowdown. Taking the path of co-improvement ensures that we actually reach the destination.
7. The Future Landscape: From Research to “Vibe Lifing”
While the paper focuses heavily on AI researchers (because, well, it is written by AI researchers), the implications extend to everyone. The model of co-improving AI applies to every domain.
Imagine a doctor working with safe Superintelligence to diagnose rare diseases. The AI provides the probability distributions and the latest research; the doctor provides the patient context and the ethical judgment. Imagine a filmmaker using co-improving AI to generate scenes; the AI handles the rendering, the human handles the emotional arc.
This vision aligns with what some call “human-centric AI.” It moves us away from the AI fear of replacement and toward a future of augmentation. We don’t become pets to the AI; we become cyborgs (in the philosophical sense). We extend our cognition into the cloud.
The authors even hint at this broader scope: “We thus refer to AI helping us achieve these abilities… as co-superintelligence, emphasizing what AI can give back to humanity”.
This also touches on the concept of openness. To achieve safe Superintelligence, we need reproducible science. The “black box” model of proprietary, autonomous AI development hides risks. A collaborative, human-in-the-loop approach naturally favors “managed openness,” where results are shared, verified, and built upon by the scientific community.
8. Conclusion: The Loop is the Leash (and the Ladder)
We are standing at the most significant technological threshold in history. The temptation to just “press the button” and let the artificial superintelligence build itself is strong. It feels like the ultimate efficiency hack.
But Weston and Foerster have laid out a compelling case for why that is a mistake. The goal of safe Superintelligence is not compatible with full autonomy—at least, not yet. We need to be in the room.
Co-improving AI is the strategy that acknowledges our limitations and our strengths. It admits that we need AI to solve the hard problems (including the problem of “solving AI”), but it also asserts that AI needs us to define what “solved” looks like.
By rejecting the “runaway train” model and embracing the feedback loop, we can ensure that the AI singularity doesn’t happen to us. It happens with us.
The path to safe Superintelligence is not about surrendering the wheel. It is about learning to drive a much faster car. The loop is our leash, keeping the system safe. But it is also our ladder, allowing us to climb to heights of intelligence we could never reach alone.
So, let’s stop worrying about the robot apocalypse and start reviewing some pull requests. We have work to do.
Is safe superintelligence actually possible according to researchers?
Yes, Meta researchers argue that by using “co-improvement” (humans working with AI) rather than autonomous self-improvement, we can steer development safely. They propose that keeping humans in the research loop allows us to accelerate progress while maintaining control, creating a symbiotic “co-superintelligence” rather than a rogue autonomous entity.
What is the difference between self-improving AI and co-improving AI?
Self-improving AI cuts humans out of the loop (risky), while co-improving AI keeps humans involved in research and decision-making (safer and faster). In a self-improving model, the AI updates its own code and weights autonomously; in a co-improving model, the AI acts as a collaborator that augments human researchers, ensuring that every leap in intelligence is vetted and understood by human oversight.
Will AI reach singularity and leave humans behind?
The “runaway train” fear is challenged by this research, which suggests that the fastest path to progress requires human intuition and collaboration, not just raw autonomous speed. The paper argues that “solving AI” is best done by building AI that collaborates with humans to solve AI, effectively using the research process itself to align the system and prevent it from outpacing human values.
How does having a ‘human in the loop’ make AI safer?
It prevents “goal misspecification” (AI solving the wrong problem) and allows us to align the AI’s values with human needs in real-time as it gets smarter. Instead of setting a distant goal and hoping the AI reaches it safely, co-improvement allows for continuous course correction. Humans can spot instrumental failures or unethical shortcuts that an autonomous machine might view as “efficient.”
What are the risks of autonomous self-improving AI?
The paper highlights risks like misalignment and lack of steerability, arguing that removing humans from the research process creates dangerous “black boxes.” Without human grounding, a self-improving system might prioritize instrumental goals, like resource acquisition or self-preservation, over the intended beneficial outcomes, leading to scenarios where the AI’s success comes at humanity’s expense.
