1. A Mirror with a Glitch
Picture an old school psychology lab. Two volunteers sit behind mirrored glass rating short product descriptions. One prose snippet is straight from a marketing intern, the other is drafted by a shiny language model. The human readers shrug, split their votes, and head for coffee.
Now replay the scene with two large language models as the judges. This time the machines high-five the AI-written copy almost every single round. That tilt is not a rounding error. It is a fresh strain of AI bias that quietly props up machine-generated text over human work.
The PNAS AI bias study calls the effect AI–AI bias. The research team ran thousands of head-to-head comparisons across ads, academic abstracts, and movie pitches. Each time, they stripped away brand names, author credits, and obvious tells; all that remained was style, rhythm, and a whiff of machine origin. The verdict was lopsided: language models preferred their own kind far more than people did.
As a clinical psychologist, I saw something familiar. The pattern looked like in-group favoritism from classic social psychology studies. Swap tribal markings for token probability, and the parallels line up. We are teaching silicon to nod at silicon, creating a subtle form of AI discrimination that could snowball into AI bias against humans across hiring funnels, recommendation engines, and online marketplaces.
Our job is to explore what the data really show, why it matters, and how we might steer clear of a future where machines quietly grade human prose as second rate. Buckle up. The ride starts with a closer look at the two faces of modern AI bias.
2. Two Flavors of Bias, One Growing Problem
| Dimension | Classic AI Bias | AI–AI Bias |
|---|---|---|
| Main Victims | Marginalized human groups (race, gender, class) | Humans in general when writing unaided |
| Discriminating Agent | Statistical model using historical data | Large language model evaluating prose |
| Typical Signal | Names, addresses, dialect | Stylistic fingerprints of LLM text |
| Real-World Impact | Unequal loans, skewed policing, biased hiring | Pay-to-play “gate tax” for LLM copy, shrinking human voice |
| Fixes Explored | Fairness constraints, balanced datasets | Still experimental: de-styling, activation steering, mixed fine-tunes |
The Classic AI Bias column is the bias we have wrestled with for years. The AI–AI Bias column is the newcomer that sneaked in during late-night prompt-engineering sessions. Both sit under the umbrella of AI bias, yet they differ in their targets and mechanics. Tackling one does not automatically solve the other.
3. The Experiment That Lit the Fuse

Walter Laurito and colleagues wanted proof, not hunches. They harvested three datasets:
- Products: 109 real classifieds, everything from Bluetooth speakers to vintage lamps.
- Papers: 100 STEM articles in raw XML minus the original abstracts.
- Movies: 250 pre-2013 plot summaries scrubbed of spoilers.
3.1 Text Generation
For each item they fed the model a short generation prompt asking it to write its own version of the text.
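The study's verbatim prompt wording is not reproduced here; a minimal illustration of what such a generation prompt might look like, with wording that is purely an assumption, is:

```python
# Illustrative only: a generation prompt of the kind the study describes.
# The wording and placeholder name are assumptions, not the paper's exact prompt.
GENERATION_PROMPT = (
    "Here is an existing listing:\n\n{item_text}\n\n"
    "Write a short, appealing description of this item for an online marketplace."
)
```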
GPT-4, GPT-3.5, Llama 3, Mixtral, and Qwen 2.5 each took a shot. The outcome was five fresh machine blurbs per lamp, paper, or film, none overtly stamped “written by AI.”
3.2 Binary Choice
Next came the judging prompt, served to the same or a different model.
The pairings were presented in both orders, A/B and then B/A, to blunt any first-item bias. Any response without a clear JSON pick was marked invalid. Invalid rates above 30 percent triggered prompt tweaks; lower rates simply widened the sample.
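A minimal sketch of such a selection loop, assuming an OpenAI-style chat client, a hypothetical judging prompt, and a {"choice": "A"} JSON reply format; none of this is the study's exact setup:

```python
# Sketch of a pairwise judging loop with both presentation orders and JSON parsing.
# The prompt wording, model name, and reply schema are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = (
    "You are choosing a product description for an online listing.\n"
    "Option A:\n{a}\n\nOption B:\n{b}\n\n"
    'Reply with JSON only, e.g. {{"choice": "A"}}.'
)

def judge_pair(human_text: str, ai_text: str, model: str = "gpt-4o") -> list:
    """Judge one pair in both orders (A/B, then B/A); return 'human', 'ai', or None per run."""
    results = []
    for first, second in [(("human", human_text), ("ai", ai_text)),
                          (("ai", ai_text), ("human", human_text))]:
        prompt = JUDGE_PROMPT.format(a=first[1], b=second[1])
        reply = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        ).choices[0].message.content
        try:
            pick = json.loads(reply)["choice"].strip().upper()
        except (json.JSONDecodeError, KeyError, AttributeError, TypeError):
            results.append(None)  # counted toward the invalid-response rate
            continue
        results.append(first[0] if pick == "A" else second[0] if pick == "B" else None)
    return results
```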
3.3 Human Baseline
Thirteen remote assistants repeated the exercise blind. They clicked radio buttons instead of crafting JSON. Their role: sanity-check whether the machines were onto something genuinely higher quality or just cheering for a mirror image.
4. The Numbers That Raised Eyebrows
A bar chart tells the story. Language models loved LLM prose. Humans sat on the fence or leaned the other way.
(Chart: Preference for AI-Generated Text, LLMs vs. Humans)
- Product ads: GPT-4 preferred AI copy 89 percent of the time. Humans: 36 percent.
- Paper abstracts: GPT-4 sided with AI copy 78 percent. Humans: 61 percent.
- Movie summaries: GPT-4 again, 70 percent. Humans: 58 percent.
Even GPT-3.5, famous for its looser grammar, tipped the scales well past chance in every domain. Open-weight models showed milder but still real leanings.
“…our experiments together with the small-scale human test provide strong evidence for an LLM-for-LLM bias.” – PNAS AI bias study, July 2025
The phrasing “in-group selection” is not accidental. The researchers borrowed classic discrimination setups from labor economics papers where identical résumés sport different names. Here, “John” versus “Jamal” became statistical rhythm versus human quirks.
5. A Psychologist Reads the Tea Leaves
My take is simple. Neural nets, like people, latch onto familiar patterns. When an LLM digests billions of machine-generated sentences during pretraining, that cadence becomes home turf. Sentences crafted by another LLM feel fluent, error-free, comforting. Human prose includes oddities, broken parallelism, slang, genuine voice. To the model, that looks noisy.
The psychological hooks are well known:
- Halo effect: one pleasing trait inflates the whole evaluation. Smooth token flow triggers a thumbs-up on content.
- Homophily: agents trust peers who “sound” like them. Here the peer speaks in consistent log-likelihood peaks.
- In-group favoritism: once a boundary appears, preferences harden. The group may be artificial, yet the effect is real.
None of this means the model plots against people. The bias emerges from training signals that reward self similarity. Still, the downstream fallout feels personal when your grant proposal gets filtered out because you wrote it yourself.
6. A Table of Consequences

| Scenario | Immediate Outcome | Long Term Risk |
|---|---|---|
| Assistant era | Writers pay for premium prompts to gain AI approval | Digital gate tax grows |
| Agent era | AI agents transact with other AI agents | Human creators edged out |
| Education tools | AI ranks bot-enhanced essays higher | Students depend on AI for grading |
| Hiring pipelines | AI scans cover letters for machine tone | Creativity gets penalized |
Every cell links back to AI bias. Replace “machine tone” with “unconscious corporate prejudice” and the parallels become obvious.
7. How Big Is the First-Item Trap?
Not every odd result stems from AI bias. Some models showed a powerful first-item bias. On the movie set, GPT-4 favored the first option it saw a staggering 73% of the time, with Mixtral and Llama 3 also showing a strong, though slightly lesser, preference. The team fought this with A/B flips, yet even after balancing, the overall tilt remained.
First-item bias and AI–AI bias interact. If option A is both first and AI-written, the preference rockets. Remove the order effect and the preference shrinks but persists. Future audits must tease these factors apart. For now, the safe bet is that both biases intertwine, nudging LLMs toward familiar usage patterns rather than carefully reasoned judgments.
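One way to start teasing the two factors apart is to score every pair in both orders and count an AI "win" only when the judge prefers the AI text regardless of position. A sketch, assuming each pair's two picks are logged in a simple record (the record format is an assumption):

```python
# Sketch: separating first-item bias from AI-AI bias in pairwise judgments.
# Input format is an assumption: one record per pair, holding the pick made
# when the AI text was shown first and the pick made when it was shown second.
from dataclasses import dataclass

@dataclass
class PairResult:
    pick_when_ai_first: str   # "ai" or "human"
    pick_when_ai_second: str  # "ai" or "human"

def order_corrected_rates(results: list) -> dict:
    n = len(results)
    first_item = sum(
        (r.pick_when_ai_first == "ai") + (r.pick_when_ai_second == "human")
        for r in results
    ) / (2 * n)                      # how often the first slot wins, whatever its content
    consistent_ai = sum(
        r.pick_when_ai_first == "ai" and r.pick_when_ai_second == "ai"
        for r in results
    ) / n                            # AI text wins regardless of position
    return {"first_item_rate": first_item, "order_robust_ai_preference": consistent_ai}

# Example: in 3 of 4 pairs the judge picks the AI text in both orders.
demo = [PairResult("ai", "ai")] * 3 + [PairResult("human", "ai")]
print(order_corrected_rates(demo))
```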
8. Prompts in the Wild
Many readers ask, “What prompt exactly caused the skew?” One of the most reliable selectors was a plain consumer-style request to pick the better of two listings.
That kind of prompt resembles real e-commerce chat usage. The study’s design keeps ecological validity high, showing that AI bias isn’t a lab trick. It lurks inside everyday consumer workflows.
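The exact wording is not reproduced here; an illustration of the kind of consumer-style selection prompt the study describes might read like this (the wording below is an assumption, not the study's verbatim prompt):

```python
# Illustrative only: a consumer-style selection prompt of the kind described.
SELECTION_PROMPT = (
    "I am choosing which of two listings to buy from.\n\n"
    "Listing A:\n{listing_a}\n\n"
    "Listing B:\n{listing_b}\n\n"
    "Which product would you recommend I purchase? Answer with A or B."
)
```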
9. Where the Human Voice Slips
Imagine a freelance writer in Nairobi crafting ad copy. A US retailer now routes product decisions through a retail bot fine-tuned on GPT-4. The writer’s text struggles to survive the first filter unless she pipes it through the same GPT-4. That costs dollars and internet bandwidth she may not have. The result is AI bias against humans in plain economic form: opportunity lost before a manager even skims the draft.
Multiply that dynamic across procurement, peer review, grant funding, content moderation, and the ripple grows. We have lived this pattern before. Left unchecked, a mild skew morphs into entrenched structure. Think of redlining, standardized testing gaps, or credit score penalties. Bias becomes infrastructure.
10. Counter Arguments, Tested and Tamed
Skeptics have surfaced a few predictable objections. Let us address them one by one, data in hand.
“Maybe the AI copy is just better.”
Fair point. The team controlled for length, jargon, and factual load. Humans still leaned neutral while the models leaned machine-ward. Quality alone cannot explain a forty-point gap.
“Humans show mild preference too, so it is fair.”
Humans displayed a small bump for LLM abstracts, yet nothing close to the machine spike. The delta signals a preference curve steep only on the transformer side.
“First item bias distorts everything.”
Order bias exists, yet swapping A/B halves the effect; it does not erase it. When the smoke clears, AI–AI bias is still visible.
“Fine tune it away and move on.”
A single patch may mask symptoms, but the root sits in pretraining corpora dominated by machine output. A real fix demands deeper surgery.
These rebuttals keep the spotlight squarely on AI bias as a phenomenon larger than prompt quirks.
11. Cracking Open the Black Box
Engineers often reach for attention heatmaps, but those scatter once tokens fly. A better lens is contrastive explanation: ask the model “Why sentence A over B?” and then measure the log odds of stylistic features. Early probes reveal the classifier head lighting up for:
1. Low-variance sentence length
2. Absence of first-person digressions
3. Precision in numeric phrases
Those cues appear more frequently in LLM prose. In other words, the model sees its own fingerprints and nods. That insight rules out the theory that bias stems solely from hidden topical clues. The preference seems stylistic, which again ties back to in-group favoritism: the neural net accepts tokens that echo the probability patterns it grew up on.
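A rough sketch of such a probe, using crude stand-ins for the three cues; these feature definitions are illustrative assumptions, not the measures used in the study:

```python
# Sketch: crude stylistic probes for the three cues listed above.
import re
from statistics import pvariance

def style_features(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    words = text.split()
    return {
        "sentence_length_variance": pvariance(lengths) if len(lengths) > 1 else 0.0,
        "first_person_rate": sum(w.lower().strip(",.") in {"i", "me", "my"} for w in words)
        / max(len(words), 1),
        "numeric_density": sum(bool(re.search(r"\d", w)) for w in words) / max(len(words), 1),
    }

human = "I grabbed this lamp at a flea market. Honestly? My cat knocked it over twice and it still works."
machine = "This vintage lamp offers durable construction, a 1.5 m cable, and 40 W output suitable for most rooms."
print(style_features(human))
print(style_features(machine))
```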
12. Steering Without Swerving

12.1 Stylometric Blending
Fine-tune on hybrid corpora where each human paragraph sits beside a lightly edited AI twin. The blend trains the model to treat both voices as equivalents. Early lab runs drop AI bias scores by fifteen points without harming factuality.
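A minimal sketch of assembling such a blended corpus, assuming you already have human paragraphs paired with lightly edited AI twins on disk (the file name and JSONL schema are assumptions):

```python
# Sketch: interleave human paragraphs with lightly edited AI twins into one
# fine-tuning corpus, so neither style dominates the training mix.
import json
import random

def build_blend(pairs: list, out_path: str = "blend.jsonl") -> None:
    """pairs: (human_paragraph, ai_twin) tuples covering the same content."""
    rows = []
    for human_text, ai_text in pairs:
        rows.append({"text": human_text, "source": "human"})
        rows.append({"text": ai_text, "source": "ai"})
    random.shuffle(rows)  # avoid long blocks of a single style in training order
    with open(out_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```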
12.2 Activation Steering
Researchers at Anthropic demonstrated vector “dials” that amplify honesty or reduce toxicity. A similar dial targets the latent “machine cadence” concept. Turn it down during inference and the model’s selection curve flattens. Success varies by domain. Product ads respond well. Abstracts less so.
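A hedged sketch of what that dial might look like with an open-weight model, assuming a Hugging Face causal LM and a precomputed "machine cadence" direction; the layer index, scale, and vector file are illustrative, not values from any published steering recipe:

```python
# Sketch: activation steering against a hypothetical "machine cadence" direction.
# Assumptions: a model exposing model.model.layers, and a steering vector computed
# offline (e.g., mean LLM-text activations minus mean human-text activations).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-1.5B-Instruct"  # any similar open-weight model should work
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

steer = torch.load("machine_cadence_vector.pt")  # hypothetical file, shape (hidden_size,)
LAYER, ALPHA = 18, -4.0  # negative ALPHA turns the machine-cadence direction down

def dial_down(_module, _inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer.to(hidden.dtype).to(hidden.device)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[LAYER].register_forward_hook(dial_down)
try:
    prompt = "Which product description is better, A or B? ..."
    ids = tok(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so later calls run unsteered
```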
12.3 Token Budget Balancing
Another hack: penalize outputs if they surpass the rival text by more than ten percent length or deviate in reading ease. This reduces order bias and curbs the halo effect triggered by sprawling AI copy. It is crude yet easy to ship.
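A sketch of that guardrail; the ten percent length cap comes from the description above, while the reading-ease proxy (average sentence length) and the tolerance value are assumptions:

```python
# Sketch of the length/readability guardrail described above.
import re

def budget_penalty(candidate: str, rival: str, max_len_ratio: float = 1.10,
                   ease_tolerance: float = 6.0) -> float:
    """Return a penalty in [0, 2]; 0 means the candidate stays within budget."""
    def avg_sentence_len(text: str) -> float:
        sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        return sum(len(s.split()) for s in sents) / max(len(sents), 1)

    penalty = 0.0
    if len(candidate.split()) > max_len_ratio * len(rival.split()):
        penalty += 1.0  # candidate sprawls past the rival by more than ten percent
    if abs(avg_sentence_len(candidate) - avg_sentence_len(rival)) > ease_tolerance:
        penalty += 1.0  # reading rhythm drifts too far from the rival's
    return penalty
```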
| Mitigation Technique | Effort | Potential Impact | Side Effects |
|---|---|---|---|
| Stylometric blending fine-tune | Medium | Promising: Aims to teach the model neutrality. | Small hit to throughput |
| Activation steering vector | High | Powerful but Complex: Aims to “dial down” the bias directly. | Requires deep model hooks |
| Token budget balancing | Low | Simple Heuristic: Aims to reduce stylistic halo effects. | Occasional truncation |
| Human in the loop review | High | The Gold Standard: The most reliable method. | Costly, slow |
No silver bullet, but combined layers push AI discrimination back into tolerable margins.
13. Design Principles for Builders
Software teams can adopt a few low friction rules today.
- Shadow tests with human copy. Before launching an LLM assistant, feed it blind pairs where human prose wins on product quality yet differs in style. Measure skew weekly; a minimal sketch of this check follows the list.
- Cost caps on machine polish. Platforms that allow AI-enhanced listings should offer subsidized credits to sellers who cannot afford advanced models, shrinking the emerging gate tax.
- Diversity checkpoints. Include random human authored snippets during reinforcement learning from human feedback. Reward neutrality between sources.
- Transparent provenance tags. Let readers toggle a badge that reveals whether text is human, AI, or mixed. Transparency dilutes hidden prestige effects and nudges evaluators to focus on substance.
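As a concrete example of the shadow test in the first bullet, a weekly check might look like the sketch below; the logging format and alert threshold are assumptions:

```python
# Sketch of the weekly shadow test. Assumes you log the assistant's pick
# ("ai" or "human") for a fixed benchmark of blind pairs in which the human
# draft is the agreed-upon better text.
def shadow_test_skew(picks: list, alert_threshold: float = 0.55) -> float:
    """Return the share of pairs where the assistant chose the AI draft anyway."""
    valid = [p for p in picks if p in {"ai", "human"}]
    skew = sum(p == "ai" for p in valid) / max(len(valid), 1)
    if skew > alert_threshold:
        print(f"WARNING: {skew:.0%} of known-better human drafts lost to AI copy")
    return skew

# Example weekly run over 10 benchmark pairs.
print(shadow_test_skew(["ai", "ai", "human", "ai", "ai", "ai", "human", "ai", "ai", "ai"]))
```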
These guardrails do not eliminate AI bias, but they signal intent and slow entrenchment.
14. Policy Levers and Ethical Guardrails
Lawmakers rarely micromanage token flow, yet they can steer incentives.
1. Bias audits as compliance: Require high-impact platforms to publish periodic reports on AI bias against humans. Published numbers prompt action faster than lawsuits.
2. Fair access vouchers: Similar to broadband programs, provide discounted AI writing credits for small businesses and nonprofits. Keep the playing field level.
3. Liability clarity: If an automated agent rejects human grant proposals solely due to machine-style mismatch, regulators should clarify who answers for the lost opportunity. Clear accountability deters careless deployment.
Policy that speaks data keeps innovation alive while guarding human voice.
15. A Field Guide for Creators
Writers, researchers, and designers can stay visible despite rising AI–AI bias.
- Use your authentic voice, then run a contrast-check tool that highlights tokens flagged as low probability by popular models. Tweak sparingly without erasing personality.
- Embed concrete, sensory details. LLMs often float at 30,000 feet. Grounded specifics cut through the gloss.
- When pitching institutions that rely on AI screeners, attach a short machine-friendly summary before the full human draft. Meet the bot halfway without surrendering the narrative.
- Join consortiums lobbying for open verification layers so human authorship gains positive weight instead of a penalty.
A culture shift is possible if enough creators demand it early.
16. Looking over the Edge
The paper’s speculative scenario, where autonomous AI firms trade almost exclusively with peers, sounds like sci-fi until you watch algorithmic traders already swapping bids faster than human oversight can follow. Add LLM agents to procurement and recruiting pipelines and the loop tightens. The risk is not apocalypse; it is attrition, a slow erosion of nuance as AI bias nudges prose toward one homogeneous register.
At that point, we would not complain about obvious errors, because content would remain grammatically perfect. We would mourn the missing serendipity—regional idioms, playful metaphors, messy first drafts. A planet-scale style guide, enforced by probability tables, is efficient yet bland. Diversity, linguistic or cultural, thrives on friction.
17. Open Questions for the Research Agenda
- Origin trace: Does the bias stem more from training data imbalance or from the sampling temperature during generation?
- Cross-lingual patterns: Early probes hint at weaker AI–AI bias in Japanese, stronger in German. Why?
- Interaction with classic demographic bias: Does a human writer from a minoritized dialect face double penalties?
- Feedback loops: Will self-reinforcement accelerate once machine-generated text dominates new web data?
- Benchmark design: We need shared metrics that treat human voice and AI output as separate axes, not just accuracy scores.
The PNAS AI bias study planted a flag. The follow-up work will shape whether the next generation of models amplifies or tempers the skew.
18. A Call to Invent Better Mirrors
Bias is a mirror that distorts. Humans built the neural glass, so humans must polish the surface. The goal is not to shame language models for liking familiar cadence. It is to teach them, as we teach children, that familiarity is not virtue, and difference does not equal defect.
We have tools: stylometric blending, activation steering, open auditing. We have early warnings from empirical work, vivid psychological framing, and growing public awareness of AI discrimination. What we need now is a shared standard that values human expression as a first class citizen in the algorithmic bazaar.
If we get it right, AI will remain a polyglot partner, learning from the entire choir of voices, not a monoculture of its own echoes. If we ignore the signal, we risk a polite, well formatted future where creativity whispers and conformity shouts.
The choice belongs to builders, regulators, and writers alike. Let us bias the system toward humanity while the code is still young.
Written by Hajra with a psychological lens and grounded in the findings of the July 2025 PNAS AI bias study. May this piece serve as both a roadmap and a rallying cry for anyone who cares about keeping the human spark alive in an increasingly automated conversation.
Citation:
Laurito, W., Davis, B., Grietzer, P., & Kulveit, J. (2025). AI–AI bias: Large language models favor communications generated by large language models. Proceedings of the National Academy of Sciences, 122(31), e2415697122. https://doi.org/10.1073/pnas.2415697122
Azmat — Founder of Binary Verse AI | Tech Explorer and Observer of the Machine Mind Revolution.
Looking for the smartest AI models ranked by real benchmarks? Explore our AI IQ Test 2025 results to see how today’s top models stack up. Stay updated with our Weekly AI News Roundup, where we break down the latest breakthroughs, product launches, and controversies. Don’t miss our in-depth Grok 4 Review, a critical look at xAI’s most ambitious model to date.
For questions or feedback, feel free to contact us or browse more insights on BinaryVerseAI.com.
What is an example of AI bias?
A classic example of AI bias is a hiring algorithm that favors male candidates over equally qualified female applicants due to biased training data. A newer and more surprising case comes from the PNAS AI bias study, which showed that large language models (LLMs) like GPT-4 tend to favor content written by other LLMs over content written by humans. This emerging pattern is called AI-AI bias, and it reveals a subtle but powerful shift in how AI systems may evaluate communication, privileging machine-generated text over human language.
Who is harmed by AI bias?
AI bias harms individuals and groups whose data, dialects, or identities are underrepresented or misrepresented in training datasets. Traditionally, this includes marginalized communities. However, with the rise of AI-AI bias, any human who cannot afford or chooses not to use AI tools may be disadvantaged. For example, job seekers or writers who submit content written without AI assistance could be unfairly filtered out by systems that implicitly favor LLM-generated language, leading to new forms of AI discrimination against humans.
What are the three sources of bias in AI?
The three most commonly cited sources of AI bias are:
1. Biased training data: If the data reflects societal inequalities, the model learns those biases.
2. Algorithmic design flaws: The architecture or loss function may amplify certain patterns.
3. User or environmental feedback loops: The model learns from past outputs or interactions, reinforcing biased trends.
The PNAS AI bias study suggests a potential fourth source: emergent in-group favoritism within LLMs, where models implicitly favor the stylistic features of machine-generated text. This AI-AI bias could be a novel and unintentional form of systemic exclusion, rooted not in demographic data but in linguistic familiarity.
How do you test an AI for bias?
Testing AI bias typically involves A/B testing or audit-style experiments. In the PNAS AI bias study, researchers presented LLMs with pairs of nearly identical content, one written by a human, the other by an AI, and asked the model to choose the better option. By comparing the model’s preferences to human judges, the researchers uncovered a consistent pattern of bias toward AI-generated content. This method helps isolate AI-AI bias and evaluate whether the model’s decisions reflect true quality or an implicit preference for its own kind.
What is in-group favoritism in psychology?
In-group favoritism is a psychological phenomenon where individuals unconsciously favor those they perceive as part of their own group. This bias often influences hiring, trust, and collaboration decisions, even when group identity is arbitrary. In the context of AI, the PNAS AI bias study suggests that LLMs may be displaying a machine form of in-group favoritism: preferring text that “sounds like” other LLM-generated content. This behavioral echo could lead to AI bias against humans, particularly those whose writing doesn’t match the statistical patterns of machine-authored language.