Introduction
We are witnessing a phase shift in how science gets done. For years, artificial intelligence in research was essentially a really fast calculator or a glorified search engine. It was a tool. You held the hammer and you swung it. But the dynamic is changing. We are moving from AI as a tool to AI as a partner.
This shift was cemented recently when OpenAI released a technical report detailing early experiments with GPT-5. The paper is titled “Early science acceleration experiments with GPT-5” but that title buries the lead. The reality is far more interesting. We are looking at the birth of the AI Co-Scientist.
This isn’t just about writing cleaner code or summarizing PDFs. In collaboration with researchers from institutions like Oxford, UC Berkeley, and Lawrence Livermore National Lab, this new model didn’t just organize knowledge. It created new knowledge. It settled open mathematical problems. It proposed biological mechanisms that human experts had missed. It accelerated fusion simulations by a factor of a thousand.
GPT-5 is still imperfect and it still hallucinates. But when you wrap it in the right expert workflow (what we call “scaffolding”), it becomes something else entirely. It becomes a reasoning engine capable of accelerating science in ways we are just beginning to map out.
1. The Headline Breakthrough: Solving Erdős Problem #848

To understand the leap in GPT-5’s capabilities, you have to look at the edge cases. You have to look at the problems that have stumped humans for decades.
Enter the Erdős problems. Paul Erdős was a legendary mathematician who posed hundreds of conjectures that define the field of combinatorial number theory. Problem #848 asked a specific question about the density of a set of numbers in which the product of any two elements, plus one, is never square-free. It is the kind of problem that requires deep intuition about “negative space”: understanding not just where numbers fit, but where they cannot.
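To make that condition concrete, here is a toy Python sketch of the property as stated above, checked over pairs of distinct elements. It only tests a finite candidate set and bears no resemblance to the actual proof:

```python
from itertools import combinations

def is_squarefree(n: int) -> bool:
    """True if no prime divides n more than once."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:  # d divides n twice: square factor found
                return False
        else:
            d += 1
    return True

def satisfies_condition(candidates: list[int]) -> bool:
    """Check the condition as stated above: for every pair of distinct
    elements, the product plus one must NOT be square-free."""
    return all(not is_squarefree(a * b + 1) for a, b in combinations(candidates, 2))

print(satisfies_condition([2, 12]))  # True: 2*12 + 1 = 25 = 5**2
```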
Researchers Mehtaab Sawhney and Mark Sellke had the framework for a solution. They knew the answer likely involved “residue classes” (remainders when divided by a number). But they were stuck on the optimization. They needed a sharper bound to close the proof. They treated the model as an AI Co-Scientist. They didn’t ask it to “solve the problem.” They fed it their partial progress and the context of previous online discussions.
The model didn’t just regurgitate training data. It proposed a novel logical step. It suggested that if the set contained even a single “unexpected” number (one that didn’t fit the pattern modulo 25), that single outlier would constrain all other numbers in the set so severely that the set couldn’t exist.
This was the key. The insight about the “outlier” allowed the humans to complete the proof. It was a genuine contribution to abstract mathematics. The problem is now formally solved, with the AI Co-Scientist acting as the catalyst that crystallized the final argument.
2. Beyond Math: Accelerating Science in Biology and Physics
Mathematics is a closed system with clear rules. Biology is messy. Physics is unforgiving. Yet the AI Co-Scientist showed it could handle the noise of the real world just as well as the purity of number theory.
2.1 The T-Cell Mystery

In a standout example of AI in scientific research, immunologist Derya Unutmaz fed the model unpublished data from a flow cytometry experiment. The experiment involved treating human T-cells (immune warriors) with a glucose inhibitor called 2-DG. The data showed a massive spike in a specific subset of inflammatory cells. The lab had spent months puzzling over why.
The model looked at the charts and proposed a mechanism: N-linked glycosylation interference. It reasoned that the inhibitor wasn’t just starving the cells of energy; it was physically messing with the sugar coatings on their surface receptors.
It didn’t stop there. It suggested a follow-up experiment to prove it: “rescue” the cells with mannose (a sugar) to bypass the block. The lab had actually done this experiment but hadn’t published it. The model predicted the result perfectly. It then suggested a new application for cancer therapy, using this mechanism to “prime” CAR-T cells to be better serial killers of tumors.
2.2 Black Hole Symmetries
In theoretical physics, Alex Lupsasca used the AI Co-Scientist to find hidden symmetries in the equations governing black holes (specifically the Kerr metric). Initially, the model failed. It looked at the complex curved-space equations and gave up, saying there were no symmetries.
This is where the concept of “scaffolding” comes in. Lupsasca didn’t fire the assistant. He gave it a warm-up exercise. He asked it to solve the same problem in “flat space” (a simpler version). The model nailed it. Once it had “warmed up” its reasoning circuits on the simpler case, Lupsasca fed it the complex curved-space problem again. This time, it transferred the intuition and successfully derived the hidden symmetries.
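This warm-up pattern is easy to reproduce with any chat-style API. Below is a minimal sketch using the OpenAI Python client; the model identifier and both prompts are illustrative stand-ins, not the exact prompts from the paper:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Stage 1: warm up on the simpler flat-space version of the problem.
history = [{
    "role": "user",
    "content": "Warm-up: derive the conserved quantities for a geodesic "
               "in flat Minkowski space. Show every step.",
}]
warmup = client.chat.completions.create(model="gpt-5", messages=history)
history.append({"role": "assistant", "content": warmup.choices[0].message.content})

# Stage 2: only with the easy case in context do we pose the hard one.
history.append({
    "role": "user",
    "content": "Now repeat the same analysis for the Kerr metric, reusing "
               "the structure of your flat-space derivation.",
})
answer = client.chat.completions.create(model="gpt-5", messages=history)
print(answer.choices[0].message.content)
```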
3. The “Secret Sauce”: Scaffolding and Human-AI Collaboration

There is a criticism often lobbed at these demos: “The human did all the work.” Critics argue that the AI is just a stochastic parrot and the expert researcher is doing so much hand-holding that they might as well do the math themselves. This misses the point of an AI Co-Scientist.
The power isn’t in the AI acting alone. It is in the “scaffolding.” Just as a principal investigator doesn’t expect a grad student to write a Nature paper on day one without guidance, you cannot expect GPT-5 to solve physics from a zero-shot prompt.
Scaffolding is the art of breaking a massive cognitive task into iterative steps. You verify the intermediate work. You provide context. You treat the AI as an infinite resource of intelligence that lacks judgment. The human provides the judgment. The AI provides the compute.
In the Inertial Confinement Fusion (ICF) experiments conducted by Brian Spears, this dynamic compressed a six-month workload into six hours. Spears had the physics intuition. He used the AI Co-Scientist to write the code, solve the differential equations, and run the optimization loops. The AI made mistakes (it tried numerical shortcuts that obscured the physics), but Spears caught them. The result was a “ridge map” of optimal fuel density that would have taken a team of postdocs weeks to generate manually.
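For a flavor of the kind of code the model was generating, here is a deliberately toy version of that loop: integrate a made-up burn model with SciPy, then optimize a single design parameter. The “physics” is invented for illustration; only the workflow shape matches:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar

def yield_proxy(density: float) -> float:
    """Integrate a toy burn model and return a figure of merit."""
    def burn(t, y):
        fuel, energy = y
        rate = density * fuel * np.exp(-t)  # invented reaction rate
        return [-rate, rate]
    sol = solve_ivp(burn, (0.0, 5.0), [1.0, 0.0], rtol=1e-8)
    return sol.y[1, -1] - 0.05 * density  # reward burn, penalize density cost

# The optimization loop the AI writes and the human sanity-checks.
best = minimize_scalar(lambda d: -yield_proxy(d), bounds=(0.1, 10.0), method="bounded")
print(f"toy optimal density: {best.x:.2f}")
```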
AI Co-Scientist Performance Impact
| Task | Traditional Timeline | AI Co-Scientist Timeline | Acceleration Factor |
|---|---|---|---|
| Fusion Simulation Setup | Days to Weeks | Minutes | ~100x |
| Complex Integral Solution | Hours (or never) | 40 Minutes | ~10x |
| Literature Deep Search | Weeks | Seconds | ~1000x |
| Mechanism Hypothesis | Months | Minutes | ~10,000x |
4. AI Research Assistant: The Power of “Deep Literature Search”
Every scientist knows the pain of literature review. You search for keywords. You hope the authors used the same terminology you use. You miss the seminal paper from 1958 because it was written in German and used different notation. The AI Co-Scientist changes the unit of search from “keywords” to “concepts.”
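You can approximate concept-level search today with off-the-shelf text embeddings. A minimal sketch using the sentence-transformers library (the embedding model and the toy corpus are illustrative):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

abstracts = [
    "Minimax rates for density estimation under Hellinger loss.",
    "Trade-offs between multiple objectives in combinatorial optimization.",
    "A survey of keyword extraction techniques for web search engines.",
]
query = "geometric characterization of achievable estimator trade-offs"

# Rank by cosine similarity in embedding space, not by shared keywords.
scores = util.cos_sim(model.encode(query), model.encode(abstracts))[0]
for score, abstract in sorted(zip(scores.tolist(), abstracts), reverse=True):
    print(f"{score:.3f}  {abstract}")
```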
In a project involving “density estimation,” researcher Nikita Zhivotovskiy had a new geometric result. He wanted to know if anyone had seen it before. A standard Google Scholar search came up empty. He asked the AI.
The model connected his work to a paper from 2000 by Papadimitriou and Yannakakis. The two papers used completely different languages: one spoke of statistics, the other of multi-objective optimization in computer science. But the math was the same. The AI research assistant saw through the jargon to the underlying structure.
It went further in the Erdős project. The AI located a solution to a problem buried in a 1961 paper by Pommerenke. It wasn’t the main theorem; it was a side comment between two theorems. It also found that the real meat of the proof was in a 1959 paper written in German. The AI read the German paper, translated the math, and explained how it solved the modern problem.
This is automated scientific discovery at the level of information retrieval. It synthesizes the collective knowledge of humanity in a way no single human brain can.
5. Addressing the Skeptics: Hallucinations vs. Novelty
We must address the elephant in the room. Large Language Models hallucinate. They make things up. In science, this sounds like a disqualifying feature. But when you are using an AI Co-Scientist for hypothesis generation, hallucination is sometimes a feature, not a bug. It represents lateral thinking. The key is verification.
In the case of “Clique-avoiding codes,” researchers asked the model for a lower bound on a specific problem. The model confidently produced a proof. It was correct. It was elegant. The researchers were thrilled, until they realized the model had essentially plagiarized a 2024 paper by Noga Alon.
The model didn’t cite Alon initially. It presented the proof as its own derivation. This is a dangerous failure mode. It creates a risk where a researcher might think they have discovered something new, only to be embarrassed later.
Yet in the physics example, the model confidently stated there were “no symmetries” in the black hole equation. It was wrong. It defended its error. It took an expert physicist to push back and say, “No, try this warm-up first.” This reinforces the necessity of the expert in the loop. The AI Co-Scientist is not an oracle. It is a generator. You must check its work.
6. GPT-5 Capabilities vs. GPT-4: A Leap in Reasoning
What makes GPT-5 different? We have had chatbots for years. The difference lies in the depth of the reasoning chain.
Previous models were great at pattern matching. They struggled with “negative space”: understanding why a certain mathematical approach won’t work. In the experiments with mathematician Timothy Gowers, the AI Co-Scientist was able to look at a proposed proof strategy and explain why it would fail before the mathematician wasted time on it.
This ability to prune the tree of possibilities is massive. Accelerating science isn’t just about finding the right answer faster; it is about abandoning dead ends sooner.
The model also showed improved persistence. In the cosmic string radiation problem, the model spent 40 minutes working through a single integral. Previous versions would have timed out or given a lazy approximation. Here, it churned. It tried a Legendre polynomial expansion. Then it switched to Bessel functions. It eventually produced the correct analytical result, one that Mathematica, a dedicated symbolic algebra system, could not solve.
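For a feel of the Legendre-expansion tactic mentioned above, here is a sketch with a stand-in integrand (nothing like the paper’s actual integral): fit a Legendre series on [-1, 1], integrate the series exactly, and cross-check numerically.

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.integrate import quad

f = lambda x: np.exp(-3 * x**2) * np.cos(4 * x)  # stand-in integrand

# Fit a degree-20 Legendre series on [-1, 1], then integrate it exactly.
x = np.linspace(-1.0, 1.0, 2001)
coeffs = legendre.legfit(x, f(x), deg=20)
antideriv = legendre.legint(coeffs)
series_value = legendre.legval(1.0, antideriv) - legendre.legval(-1.0, antideriv)

numeric_value, _ = quad(f, -1.0, 1.0)  # independent quadrature cross-check
print(f"Legendre series: {series_value:.10f}")
print(f"quad check:      {numeric_value:.10f}")
```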
7. Practical Application: How to Use an AI Co-Scientist Today
If you are a researcher or an engineer, you don’t need to wait for the future. You can start using these workflows now (even with current top-tier models). The trick is to change how you prompt. Do not treat the AI as a search engine. Treat it as a colleague who has read every book in the library but has had too much coffee.
7.1 Simulation over Speculation
One of the most potent uses of the AI Co-Scientist is simulation. In the CAR-T cell example, the researchers asked the AI to “simulate” what would happen if they engineered the cells in a specific way. The AI ran a mental model grounded in the cell biology it absorbed during training and predicted the outcome.
This allowed the lab to prioritize which wet-lab experiments to run. Experiments cost money. They take time. Running a “digital twin” simulation in the model costs almost nothing.
7.2 The “What Am I Missing?” Prompt
Instead of asking the model to solve the problem, ask it to critique your setup. “I am setting up a reaction-diffusion model for fusion. Here are my constants. What physical effects am I neglecting that could dominate the regime I am in?”
This leverages the broad knowledge base of the AI Co-Scientist without relying on it for precise calculation.
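That prompt generalizes into a reusable template. A plain-Python sketch, with illustrative field names and wording:

```python
CRITIQUE_TEMPLATE = """\
I am setting up a {model_kind} for {domain}.
Assumptions and constants:
{assumptions}

Do NOT solve the problem. Instead, list the physical effects or failure
modes I may be neglecting that could dominate in this regime, ranked by
likely impact, each with a one-line test I could run to check it."""

def critique_prompt(model_kind: str, domain: str, assumptions: list[str]) -> str:
    """Fill the template with a bulleted list of stated assumptions."""
    return CRITIQUE_TEMPLATE.format(
        model_kind=model_kind,
        domain=domain,
        assumptions="\n".join(f"- {a}" for a in assumptions),
    )

print(critique_prompt(
    "reaction-diffusion model", "inertial confinement fusion",
    ["constant thermal conductivity", "no radiative losses", "1D spherical symmetry"],
))
```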
8. The Limitations: Why You Still Need a PhD (For Now)
The paper makes one thing abundantly clear: The AI Co-Scientist did not solve these problems alone. In every single success story, the human in the loop was a world-class expert. Sébastien Bubeck is a master of convex optimization. Alex Lupsasca is a theoretical physicist. They knew when the AI was lying. They knew how to steer it.
If a novice had used GPT-5 to try and solve the Kerr metric symmetries, they would have accepted the initial “it’s impossible” answer and moved on. The AI amplifies expertise; it does not yet replace it. It raises the ceiling of what an expert can do in a day, but it raises the floor only slightly.
The error modes are subtle. In the online-learning problem involving “Follow-the-Leader” algorithms, the AI generated several incorrect proofs before landing on the counterintuitive example that worked. A non-expert would have been unable to separate the signal from the noise.
AI Co-Scientist Limitations & Risks
| Limitation | Consequence | Mitigation |
|---|---|---|
| Hallucination | False confidence in wrong results | Mandatory independent verification |
| Attribution Failure | Plagiarism of existing literature | Deep literature search using specific tools |
| Context Window | Loss of thread in long proofs | Modularizing the problem into small steps |
| Stochasticity | Different answers on different runs | Running the prompt multiple times (Ensembling) |
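The last row of the table generalizes into a few lines of code: re-run the same prompt and trust only a modal answer. A minimal sketch, with `ask` standing in for whatever model call you use:

```python
from collections import Counter
from typing import Callable

def ensemble(ask: Callable[[str], str], prompt: str, runs: int = 5) -> str:
    """Run the same prompt several times and keep the majority answer."""
    votes = Counter(ask(prompt) for _ in range(runs))
    answer, count = votes.most_common(1)[0]
    if count <= runs // 2:
        raise RuntimeError("no majority; treat the question as unresolved")
    return answer
```

Exact-match voting is naive for free-form text; in practice you would normalize answers or compare them semantically before counting votes.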
9. Conclusion: The Future of Automated Scientific Discovery
We are standing at the precipice of a new era. The bottleneck in science has always been human bandwidth. There are only so many papers you can read. There are only so many integrals you can solve in a day. There are only so many dead ends you can explore before you run out of funding.
The AI Co-Scientist removes these limits. It allows a single researcher to act as a principal investigator, a grad student, a coder, and a literature reviewer simultaneously. It compresses the timeline of discovery from months to hours.
Accelerating science is no longer just a funding problem; it is a workflow problem. The researchers who learn to scaffold these models, who learn to dance with the AI rather than just command it, will define the next decade of discovery.
The tools are here. The math is checking out. The biology is working. The only question left is: What problem will you ask it to solve?
Frequently Asked Questions
What is an AI Co-Scientist and how does it differ from a standard chatbot?
Answer: An AI Co-Scientist is not merely a retrieval tool but a reasoning partner capable of generating novel knowledge. Unlike a standard chatbot that summarizes existing data, models like GPT-5 act as “reasoning engines” that can propose original biological mechanisms, settle open mathematical conjectures, and accelerate simulations by factors of 1,000 when wrapped in expert workflows called “scaffolding.”
How did GPT-5 solve the unsolved Erdős Problem #848?
Answer: GPT-5 solved Erdős Problem #848 by collaborating with mathematicians Mehtaab Sawhney and Mark Sellke. Instead of solving it “zero-shot,” the AI analyzed partial progress and proposed a novel logical step: that a single “unexpected” number (an outlier modulo 25) would constrain the entire set so severely it couldn’t exist. This insight on “negative space” allowed the humans to close the proof.
Can GPT-5 perform “automated scientific discovery” without human help?
Answer: GPT-5 cannot perform fully automated discovery. The technical report emphasizes “scaffolding”: a process where human experts break massive tasks into iterative steps. For example, physicist Alex Lupsasca had to “warm up” the model on simpler “flat space” problems before it could successfully find hidden symmetries in complex black hole equations.
How does GPT-5 differ from GPT-4 in scientific reasoning capabilities?
Answer: GPT-5 exhibits a massive leap in reasoning depth, particularly in understanding “negative space” (knowing why a strategy won’t work). Unlike GPT-4, which often timed out or gave approximations, GPT-5 demonstrated persistence, spending 40 minutes churning through Legendre polynomials and Bessel functions to solve an integral that symbolic engines like Mathematica failed to compute.
What are the limitations and risks of using AI in scientific research?
Answer: The primary risks are hallucinations and attribution failures. In one case, the AI correctly proved a bound for “Clique-avoiding codes” but initially failed to cite the 2024 paper by Noga Alon it had derived the logic from. This necessitates a “human-in-the-loop” workflow to verify results and prevent “scientific plagiarism” or false confidence in incorrect physics models.
