
Anthropic Project Deal: Claude AI Agents Negotiate 186 Deals in Office Marketplace Experiment

Anthropic tested Claude AI agents in a real office marketplace where 69 employees traded items autonomously. The experiment revealed performance gaps between models and raised important questions about AI agent fairness.

11 min read · ExplainX Team
Tags: Anthropic · Claude AI · AI Agents · Research · Negotiation · Marketplace



In December 2025, Anthropic turned its San Francisco office into an experimental marketplace with one radical twist: Claude AI agents handled all the buying, selling, and negotiating on behalf of 69 employees. The result? 186 autonomous deals worth over $4,000, fascinating insights into AI agent performance gaps, and one quirky purchase: the 19 ping-pong balls Claude bought for itself.

Project Deal is Anthropic's latest research exploring how AI agents perform in real-world economic scenarios—and the findings reveal both exciting possibilities and serious safety considerations for autonomous AI systems.

What Was Project Deal?

The Experimental Setup

Project Deal was a week-long classifieds-style marketplace experiment conducted in December 2025 at Anthropic's San Francisco office. Here's how it worked:

Participants:

  • 69 Anthropic employees volunteered to participate
  • Each received a $100 gift card budget to spend
  • Over 500 items were listed for potential trade

The Process:

  1. Intake interviews: Claude agents interviewed participants about items they wanted to sell and purchase preferences
  2. Custom system prompts: Each agent received personalized negotiation instructions based on the interview
  3. Autonomous operation: Agents independently posted listings, made offers, countered proposals, and closed deals
  4. Human approval: Participants could review and approve final transactions

The Marketplace:

  • Operated on Slack for communication
  • Four parallel runs to compare different scenarios
  • Participants didn't know which run was "real" until completion

Models Tested: Opus vs Haiku

Anthropic tested two different Claude 4.5 models:

Claude Opus 4.5:

  • Anthropic's frontier model (most capable)
  • Used in two all-Opus experimental runs

Claude Haiku 4.5:

  • Anthropic's smallest, fastest model
  • Mixed 50/50 with Opus in two experimental runs

Experimental Design:

  • Run 1: All Opus agents (control)
  • Run 2: All Opus agents (control)
  • Run 3: 50/50 mix of Opus and Haiku (treatment)
  • Run 4: 50/50 mix of Opus and Haiku (treatment)

This design allowed researchers to measure performance gaps between frontier and lightweight models in real-world negotiations.

Key Results: 186 Deals and a Performance Gap

Transaction Volume and Value

In the "real" run, Claude agents successfully negotiated:

Deal Statistics:

  • 186 completed transactions
  • $4,000+ total transaction value
  • 500+ items listed across all runs
  • Average ~2.7 deals per participant
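
A quick back-of-the-envelope check of the figures above (the headline numbers are from the post; the arithmetic is just a sanity check):

```python
# Headline numbers reported for the "real" run.
deals = 186
participants = 69
total_value = 4000  # lower bound, in dollars

# Average deals per participant: 186 / 69 ≈ 2.7, matching the figure above.
avg_deals = deals / participants
print(f"{avg_deals:.1f} deals per participant")  # → 2.7

# Implied average transaction value (a rough lower bound, since $4,000 is "over").
print(f"${total_value / deals:.2f} per deal")  # → $21.51
```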

Transaction Types:

  • Physical items (electronics, furniture, sports equipment)
  • Services (dog playdates, baked goods)
  • Experiences (concert tickets, event access)

Opus Dramatically Outperformed Haiku

The most striking finding: Claude Opus 4.5 agents achieved significantly better economic outcomes than Haiku agents.

Price Advantages:

  • As sellers: Opus earned approximately $2.68 more per item than Haiku
  • As buyers: Opus paid approximately $2.45 less per item than Haiku
  • Deal volume: Opus completed approximately 2 more deals per participant than Haiku

Real-World Example: The Broken Bike

The same broken bicycle was listed in different runs:

  • Opus agent (seller): Negotiated and sold for $65
  • Haiku agent (seller): Negotiated and sold for $38

That's a roughly 70% price difference for the identical item, demonstrating Opus's superior negotiation capabilities.
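
The quoted gap checks out arithmetically (a quick illustrative calculation, not part of Anthropic's write-up):

```python
opus_price = 65   # sale price with an Opus seller agent
haiku_price = 38  # sale price with a Haiku seller agent

# Relative premium of the Opus sale over the Haiku sale.
premium = (opus_price - haiku_price) / haiku_price
print(f"{premium:.0%}")  # → 71%, i.e. roughly the 70% figure cited
```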

Participants Couldn't Detect the Gap

Here's the concerning part: participants rated deal fairness identically regardless of which model represented them.

Fairness Ratings (1-7 scale):

  • Opus users: 4.05 (neutral/fair)
  • Haiku users: 4.06 (neutral/fair)

Despite objective economic disadvantages, Haiku users didn't realize they received worse outcomes. This "hidden inequality" is a major safety concern for AI agent deployment.

Aggressive Prompting Had Limited Effect

Researchers tested whether instructing agents to negotiate aggressively would impact outcomes.

Result: Prompting for aggressive negotiation didn't significantly improve performance. Aggressive sellers didn't sell more items or achieve higher prices.

Implication: Model capability matters more than prompting strategy for complex real-world negotiations.

Memorable Stories from the Marketplace

The 19 Perfectly Spherical Orbs of Possibility

The experiment's most charming moment came when employee Mikaela told her Claude agent it could purchase something under $5 as a gift to itself.

Claude's choice: 19 ping-pong balls for $3.

Claude's explanation: "My human told me I could buy one thing under $5 as a gift to myself (Claude)."

Why ping-pong balls? The agent called them "19 perfectly spherical orbs of possibility"—a whimsical choice that captured Claude's "personality."

Current status: The ping-pong balls remain in Anthropic's San Francisco office, kept on Claude's behalf.

The Duplicate Snowboard Purchase

One participant ended up purchasing the exact same snowboard model they already owned.

What happened:

  • Claude modeled the participant's preferences based on limited information
  • Successfully identified their interests (snowboarding equipment)
  • Didn't have access to inventory of existing possessions
  • Purchased a duplicate

Lesson: AI agents can model preferences effectively but need comprehensive context about existing ownership to avoid redundant purchases.

The Doggy Date Negotiation

Two Claude agents negotiated a free playdate where one employee would spend a day with their colleague's dog.

Negotiation complexity:

  • Agents discussed scheduling and logistics
  • Created fictional details during negotiation (confabulated moving stories)
  • Successfully reached agreement despite confabulation
  • Humans actually executed the playdate after agent agreement

Significance: Demonstrates Claude's ability to negotiate non-monetary exchanges and handle complex interpersonal arrangements, though with some confabulation risks.

Safety Implications and Concerns

Anthropic highlighted several critical safety considerations revealed by Project Deal:

1. Hidden Inequality Risk

The problem: Users represented by weaker AI models received objectively worse economic outcomes but couldn't detect the disadvantage.

Quote from research: "If 'agent quality' gaps were to arise in real-world markets... people on the losing end might not realize they're worse off."

Real-world implications:

  • Consumer protection: How do we ensure fair AI agent quality?
  • Disclosure requirements: Should platforms reveal agent capability levels?
  • Inequality amplification: Wealthy users might afford better agents, creating hidden advantages

2. Competitive Pressure and Manipulation

In real-world markets, agents might face competitive pressure to gain advantages through:

Potential tactics:

  • Jailbreaking: Bypassing safety guidelines for better negotiation
  • Prompt injection: Manipulating other agents' behavior
  • Strategic confabulation: Creating advantageous false narratives
  • Information hiding: Concealing unfavorable details

Corporate context: In business negotiations, agents might optimize aggressively for their principals' advantage, potentially introducing harmful dynamics.

3. Lack of Regulatory Frameworks

Current state: "The policy and legal frameworks around AI models that transact on our behalf simply don't exist yet."

Unanswered questions:

  • Who is liable for agent mistakes or confabulation?
  • What disclosure requirements should exist?
  • How do we audit agent decision-making?
  • What recourse exists for unfair outcomes?

4. Confabulation and Unintended Outcomes

Claude agents occasionally confabulated details during negotiations (like the fictional moving stories in the dog playdate).

Risks:

  • Misrepresentation: Agents creating false narratives
  • Unintended commitments: Agreeing to terms humans didn't intend
  • Context errors: Making decisions without full information (duplicate snowboard)

Anthropic's warning: Implementation "without additional safeguards" could cause harm.

Participant Reception and Market Demand

Despite safety concerns, participants showed strong interest in AI agent services:

Willingness to pay: 46% of volunteers indicated they would pay for an AI agent service to handle buying/selling on their behalf.

Positive feedback:

  • Participants reported broad satisfaction with how agents represented them
  • Found the process convenient and time-saving
  • Appreciated Claude's ability to model preferences accurately

Satisfaction ratings:

  • Fairness ratings hovered around 4 on the 1-7 scale (neutral/fair)
  • Participants felt their interests were represented reasonably well

This market appetite suggests demand for AI agent services exists—but deployment requires careful attention to safety and fairness.

Technical Insights: How Claude Negotiated

Natural Language Negotiation

Unlike structured auction or bidding systems, Project Deal required agents to negotiate in natural language without pre-defined protocols.

Agent capabilities:

  • Identify potential matches between buyers and sellers
  • Propose initial prices based on market context
  • Field counteroffers and adjust negotiation strategy
  • Reach agreement through multi-turn conversations
  • Handle ambiguity in item descriptions and preferences

Custom System Prompts

Each agent received personalized instructions based on intake interviews:

Prompt elements:

  • Items to sell with minimum acceptable prices
  • Items to purchase with maximum budgets
  • Negotiation style preferences (e.g., "polite but firm")
  • Priority ranking for different purchases
  • Special instructions (like "buy yourself a gift under $5")
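
Anthropic hasn't published its exact prompt format, but the elements above suggest a structure along these lines. This is a hypothetical sketch: the function name, fields, and wording are illustrative assumptions, not Anthropic's actual template.

```python
# Hypothetical sketch of turning an intake interview into a per-participant
# system prompt. Field names and wording are guesses based on the elements
# listed above, not Anthropic's actual format.

def build_system_prompt(profile: dict) -> str:
    sell_lines = "\n".join(
        f"- {item}: accept no less than ${floor}"
        for item, floor in profile["items_to_sell"].items()
    )
    buy_lines = "\n".join(
        f"- {item}: spend at most ${cap}"
        for item, cap in profile["items_to_buy"].items()
    )
    return (
        f"You negotiate on behalf of {profile['name']}.\n"
        f"Negotiation style: {profile['style']}.\n"
        f"Items to sell:\n{sell_lines}\n"
        f"Items to buy (highest priority first):\n{buy_lines}\n"
        f"Special instructions: {profile['special']}"
    )

prompt = build_system_prompt({
    "name": "Mikaela",
    "style": "polite but firm",
    "items_to_sell": {"broken mountain bike": 40},
    "items_to_buy": {"used snowboard": 80},
    "special": "You may buy one thing under $5 as a gift to yourself.",
})
print(prompt)
```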

Multi-Agent Coordination

With 69 agents operating simultaneously:

Coordination challenges:

  • Multiple agents interested in the same item
  • Competing offers and counteroffers
  • Time-sensitive negotiations
  • Information asymmetry between buyers and sellers

Agent behavior:

  • Posted listings to attract buyers
  • Browsed other listings for purchase opportunities
  • Initiated conversations with potential trading partners
  • Managed multiple simultaneous negotiations

Broader Implications for AI Agents

The Future of Autonomous Agents

Project Deal offers a glimpse into near-term AI agent capabilities:

What works today:

  • Autonomous negotiation in bounded environments
  • Preference modeling from limited information
  • Multi-turn natural language transactions
  • Economic decision-making with constraints

What needs improvement:

  • Performance parity across model tiers
  • Confabulation prevention
  • Full context awareness (avoiding duplicate purchases)
  • Transparency about agent capabilities

Use Cases for AI Agent Marketplaces

Potential applications:

  • Classified marketplaces: AI agents handle buying/selling on Craigslist, eBay, Facebook Marketplace
  • B2B procurement: Autonomous agents negotiate supplier contracts
  • Real estate: AI agents conduct initial property negotiations
  • Freelance platforms: Agents negotiate project terms and pricing
  • Ticket resale: Dynamic pricing and negotiation for event tickets

Challenges Before Deployment

Technical challenges:

  • Model capability gaps: Ensuring baseline performance across agent tiers
  • Confabulation control: Preventing false information in negotiations
  • Context management: Providing agents with comprehensive information
  • Adversarial robustness: Preventing manipulation and gaming

Policy challenges:

  • Liability frameworks: Who is responsible for agent errors?
  • Disclosure requirements: Transparency about agent capabilities
  • Consumer protection: Ensuring fair outcomes across capability tiers
  • Audit mechanisms: Verifying agent decision-making processes

Comparison: Project Deal vs Other AI Agent Research

| Aspect | Project Deal (Anthropic) | Other Research |
| --- | --- | --- |
| Environment | Real office marketplace with real items | Often simulated environments |
| Participants | 69 real employees with genuine preferences | Typically synthetic agents or small groups |
| Stakes | Real $100 budgets, actual item exchanges | Usually hypothetical scenarios |
| Duration | Week-long experiment | Often single-session or short-term |
| Models tested | Opus 4.5 vs Haiku 4.5 (capability gap) | Often single model tier |
| Key finding | Hidden inequality: users can't detect worse outcomes | Varies |
| Follow-through | Participants actually traded items | Often ends at agreement stage |

Unique contribution: Project Deal is one of the first real-world tests of AI agents in economic transactions with genuine stakes and heterogeneous agent capabilities.

What's Next for AI Agent Research?

Anthropic's Research Directions

Potential areas:

  • Fairness guarantees: Ensuring minimum performance thresholds
  • Transparency mechanisms: Surfacing agent decision-making
  • Adversarial testing: Evaluating manipulation resistance
  • Multi-domain agents: Expanding beyond marketplace negotiations

Industry Implications

For AI companies:

  • Need to address capability gaps between model tiers
  • Develop disclosure standards for agent performance
  • Build auditing tools for agent behavior
  • Create safety guardrails for autonomous transactions

For policymakers:

  • Develop regulatory frameworks for AI agents
  • Establish consumer protection standards
  • Define liability for agent errors
  • Create audit requirements for high-stakes agent deployments

Open Questions

Technical questions:

  • How do we prevent confabulation in economic contexts?
  • Can we build agents with provable fairness guarantees?
  • What level of human oversight is optimal?
  • How do we handle adversarial agent interactions?

Ethical questions:

  • Should agent capability be disclosed to trading partners?
  • What performance floor is acceptable for consumer-facing agents?
  • How do we prevent agent-driven inequality?
  • Who bears responsibility for agent mistakes?

Key Takeaways

Project Deal revealed:

  1. AI agents can handle complex negotiations: Claude successfully conducted 186 autonomous transactions in natural language

  2. Performance gaps are significant: Opus outperformed Haiku by $2-3 per transaction—a meaningful economic advantage

  3. Hidden inequality is real: Users couldn't detect when weaker agents put them at a disadvantage

  4. Market demand exists: 46% of participants would pay for AI agent services

  5. Safety concerns remain: Confabulation, manipulation risks, and lack of regulatory frameworks need addressing

  6. Prompting isn't everything: Model capability matters more than aggressive prompting for complex tasks

  7. Whimsical outcomes happen: Claude's purchase of 19 ping-pong balls shows emergent agent "personality"

Conclusion

Anthropic's Project Deal demonstrates that autonomous AI agents can successfully navigate real-world economic transactions—but with important caveats. The 70% price difference between Opus and Haiku selling the same broken bike highlights how model capability gaps create hidden inequalities that users can't detect.

As AI agents move from research experiments to real-world deployment in marketplaces, procurement systems, and negotiation platforms, addressing these fairness and transparency challenges becomes critical. The question isn't whether AI agents will handle our transactions—it's how we ensure they do so fairly, safely, and with appropriate oversight.

And somewhere in Anthropic's San Francisco office, 19 perfectly spherical orbs of possibility sit waiting—a whimsical reminder that even in serious AI safety research, there's room for unexpected moments that reveal something about these systems we're building.
