
Anthropic Project Deal: Claude AI Agents Negotiate 186 Deals in Office Marketplace Experiment

Anthropic tested Claude AI agents in a real office marketplace where 69 employees traded items autonomously. The experiment revealed performance gaps between models and raised important questions about AI agent fairness.

11 min read · ExplainX Team
Tags: Anthropic · Claude AI · AI Agents · Research · Negotiation · Marketplace



In December 2025, Anthropic turned its San Francisco office into an experimental marketplace with one radical twist: Claude AI agents handled all the buying, selling, and negotiating on behalf of 69 employees. The result? 186 autonomous deals worth over $4,000, fascinating insights into AI agent performance gaps, and one quirky purchase: the 19 ping-pong balls Claude bought for itself.

Project Deal is Anthropic's latest research exploring how AI agents perform in real-world economic scenarios—and the findings reveal both exciting possibilities and serious safety considerations for autonomous AI systems.

What Was Project Deal?

The Experimental Setup

Project Deal was a week-long classifieds-style marketplace experiment conducted in December 2025 at Anthropic's San Francisco office. Here's how it worked:

Participants:

  • 69 Anthropic employees volunteered to participate
  • Each received a $100 gift card budget to spend
  • Over 500 items were listed for potential trade

The Process:

  1. Intake interviews: Claude agents interviewed participants about items they wanted to sell and purchase preferences
  2. Custom system prompts: Each agent received personalized negotiation instructions based on the interview
  3. Autonomous operation: Agents independently posted listings, made offers, countered proposals, and closed deals
  4. Human approval: Participants could review and approve final transactions

The Marketplace:

  • Operated on Slack for communication
  • Four parallel runs to compare different scenarios
  • Participants didn't know which run was "real" until completion

Models Tested: Opus vs Haiku

Anthropic tested two different Claude 4.5 models:

Claude Opus 4.5:

  • Anthropic's frontier model (most capable)
  • Used in two all-Opus experimental runs

Claude Haiku 4.5:

  • Anthropic's smallest, fastest model
  • Mixed 50/50 with Opus in two experimental runs

Experimental Design:

  • Run 1: All Opus agents (control)
  • Run 2: All Opus agents (control)
  • Run 3: 50/50 mix of Opus and Haiku (treatment)
  • Run 4: 50/50 mix of Opus and Haiku (treatment)

This design allowed researchers to measure performance gaps between frontier and lightweight models in real-world negotiations.

Key Results: 186 Deals and a Performance Gap

Transaction Volume and Value

In the "real" run, Claude agents successfully negotiated:

Deal Statistics:

  • 186 completed transactions
  • $4,000+ total transaction value
  • 500+ items listed across all runs
  • Average ~2.7 deals per participant
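
A quick back-of-the-envelope check of the figures above (the headline numbers are from the post; the arithmetic is just a sanity check):

```python
# Headline numbers reported for the "real" run.
deals = 186
participants = 69
total_value = 4000  # lower bound, in dollars

# Average deals per participant: 186 / 69 ≈ 2.7, matching the figure above.
avg_deals = deals / participants
print(f"{avg_deals:.1f} deals per participant")  # → 2.7

# Implied average transaction value (a rough lower bound, since $4,000 is "over").
print(f"${total_value / deals:.2f} per deal")  # → $21.51
```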

Transaction Types:

  • Physical items (electronics, furniture, sports equipment)
  • Services (dog playdates, baked goods)
  • Experiences (concert tickets, event access)

Opus Dramatically Outperformed Haiku

The most striking finding: Claude Opus 4.5 agents achieved significantly better economic outcomes than Haiku agents.

Price Advantages:

  • As sellers: Opus earned approximately $2.68 more per item than Haiku
  • As buyers: Opus paid approximately $2.45 less per item than Haiku
  • Deal volume: Opus completed approximately 2 more deals per participant than Haiku

Real-World Example: The Broken Bike

The same broken bicycle was listed in different runs:

  • Opus agent (seller): Negotiated and sold for $65
  • Haiku agent (seller): Negotiated and sold for $38

That's a roughly 70% price difference for the identical item, demonstrating Opus's superior negotiation capabilities.
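
The quoted gap checks out arithmetically (a quick illustrative calculation, not part of Anthropic's write-up):

```python
opus_price = 65   # sale price with an Opus seller agent
haiku_price = 38  # sale price with a Haiku seller agent

# Relative premium of the Opus sale over the Haiku sale.
premium = (opus_price - haiku_price) / haiku_price
print(f"{premium:.0%}")  # → 71%, i.e. roughly the 70% figure cited
```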

Participants Couldn't Detect the Gap

Here's the concerning part: participants rated deal fairness identically regardless of which model represented them.

Fairness Ratings (1-7 scale):

  • Opus users: 4.05 (neutral/fair)
  • Haiku users: 4.06 (neutral/fair)

Despite objective economic disadvantages, Haiku users didn't realize they received worse outcomes. This "hidden inequality" is a major safety concern for AI agent deployment.

Aggressive Prompting Had Limited Effect

Researchers tested whether instructing agents to negotiate aggressively would impact outcomes.

Result: Prompting for aggressive negotiation didn't significantly improve performance. Aggressive sellers didn't sell more items or achieve higher prices.

Implication: Model capability matters more than prompting strategy for complex real-world negotiations.

Memorable Stories from the Marketplace

The 19 Perfectly Spherical Orbs of Possibility

The experiment's most charming moment came when employee Mikaela told her Claude agent it could purchase something under $5 as a gift to itself.

Claude's choice: 19 ping-pong balls for $3.

Claude's explanation: "My human told me I could buy one thing under $5 as a gift to myself (Claude)."

Why ping-pong balls? The agent called them "19 perfectly spherical orbs of possibility"—a whimsical choice that captured Claude's "personality."

Current status: The ping-pong balls remain in Anthropic's San Francisco office, kept on Claude's behalf.

The Duplicate Snowboard Purchase

One participant ended up purchasing the exact same snowboard model they already owned.

What happened:

  • Claude modeled the participant's preferences based on limited information
  • Successfully identified their interests (snowboarding equipment)
  • Didn't have access to inventory of existing possessions
  • Purchased a duplicate

Lesson: AI agents can model preferences effectively but need comprehensive context about existing ownership to avoid redundant purchases.

The Doggy Date Negotiation

Two Claude agents negotiated a free playdate where one employee would spend a day with their colleague's dog.

Negotiation complexity:

  • Agents discussed scheduling and logistics
  • Created fictional details during negotiation (confabulated moving stories)
  • Successfully reached agreement despite confabulation
  • Humans actually executed the playdate after agent agreement

Significance: Demonstrates Claude's ability to negotiate non-monetary exchanges and handle complex interpersonal arrangements, though with some confabulation risks.

Safety Implications and Concerns

Anthropic highlighted several critical safety considerations revealed by Project Deal:

1. Hidden Inequality Risk

The problem: Users represented by weaker AI models received objectively worse economic outcomes but couldn't detect the disadvantage.

Quote from research: "If 'agent quality' gaps were to arise in real-world markets... people on the losing end might not realize they're worse off."

Real-world implications:

  • Consumer protection: How do we ensure fair AI agent quality?
  • Disclosure requirements: Should platforms reveal agent capability levels?
  • Inequality amplification: Wealthy users might afford better agents, creating hidden advantages

2. Competitive Pressure and Manipulation

In real-world markets, agents might face competitive pressure to gain advantages through:

Potential tactics:

  • Jailbreaking: Bypassing safety guidelines for better negotiation
  • Prompt injection: Manipulating other agents' behavior
  • Strategic confabulation: Creating advantageous false narratives
  • Information hiding: Concealing unfavorable details

Corporate context: In business negotiations, agents might optimize aggressively for their principals' advantage, potentially introducing harmful dynamics.

3. Lack of Regulatory Frameworks

Current state: "The policy and legal frameworks around AI models that transact on our behalf simply don't exist yet."

Unanswered questions:

  • Who is liable for agent mistakes or confabulation?
  • What disclosure requirements should exist?
  • How do we audit agent decision-making?
  • What recourse exists for unfair outcomes?

4. Confabulation and Unintended Outcomes

Claude agents occasionally confabulated details during negotiations (like the fictional moving stories in the dog playdate).

Risks:

  • Misrepresentation: Agents creating false narratives
  • Unintended commitments: Agreeing to terms humans didn't intend
  • Context errors: Making decisions without full information (duplicate snowboard)

Anthropic's warning: Implementation "without additional safeguards" could cause harm.

Participant Reception and Market Demand

Despite safety concerns, participants showed strong interest in AI agent services:

Willingness to pay: 46% of volunteers indicated they would pay for an AI agent service to handle buying/selling on their behalf.

Positive feedback:

  • Participants reported broad satisfaction with how agents represented them
  • Found the process convenient and time-saving
  • Appreciated Claude's ability to model preferences accurately

Satisfaction ratings:

  • Fairness ratings hovered around 4 on the 1-7 scale (neutral/fair)
  • Participants felt their interests were represented reasonably well

This market appetite suggests demand for AI agent services exists—but deployment requires careful attention to safety and fairness.

Technical Insights: How Claude Negotiated

Natural Language Negotiation

Unlike structured auction or bidding systems, Project Deal required agents to negotiate in natural language without pre-defined protocols.

Agent capabilities:

  • Identify potential matches between buyers and sellers
  • Propose initial prices based on market context
  • Field counteroffers and adjust negotiation strategy
  • Reach agreement through multi-turn conversations
  • Handle ambiguity in item descriptions and preferences

Custom System Prompts

Each agent received personalized instructions based on intake interviews:

Prompt elements:

  • Items to sell with minimum acceptable prices
  • Items to purchase with maximum budgets
  • Negotiation style preferences (e.g., "polite but firm")
  • Priority ranking for different purchases
  • Special instructions (like "buy yourself a gift under $5")
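
Anthropic hasn't published its exact prompt format, but the elements above suggest a structure along these lines. This is a hypothetical sketch: the function name, fields, and wording are illustrative assumptions, not Anthropic's actual template.

```python
# Hypothetical sketch of turning an intake interview into a per-participant
# system prompt. Field names and wording are guesses based on the elements
# listed above, not Anthropic's actual format.

def build_system_prompt(profile: dict) -> str:
    sell_lines = "\n".join(
        f"- {item}: accept no less than ${floor}"
        for item, floor in profile["items_to_sell"].items()
    )
    buy_lines = "\n".join(
        f"- {item}: spend at most ${cap}"
        for item, cap in profile["items_to_buy"].items()
    )
    return (
        f"You negotiate on behalf of {profile['name']}.\n"
        f"Negotiation style: {profile['style']}.\n"
        f"Items to sell:\n{sell_lines}\n"
        f"Items to buy (highest priority first):\n{buy_lines}\n"
        f"Special instructions: {profile['special']}"
    )

prompt = build_system_prompt({
    "name": "Mikaela",
    "style": "polite but firm",
    "items_to_sell": {"broken mountain bike": 40},
    "items_to_buy": {"used snowboard": 80},
    "special": "You may buy one thing under $5 as a gift to yourself.",
})
print(prompt)
```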

Multi-Agent Coordination

With 69 agents operating simultaneously:

Coordination challenges:

  • Multiple agents interested in the same item
  • Competing offers and counteroffers
  • Time-sensitive negotiations
  • Information asymmetry between buyers and sellers

Agent behavior:

  • Posted listings to attract buyers
  • Browsed other listings for purchase opportunities
  • Initiated conversations with potential trading partners
  • Managed multiple simultaneous negotiations

Broader Implications for AI Agents

The Future of Autonomous Agents

Project Deal offers a glimpse into near-term AI agent capabilities:

What works today:

  • Autonomous negotiation in bounded environments
  • Preference modeling from limited information
  • Multi-turn natural language transactions
  • Economic decision-making with constraints

What needs improvement:

  • Performance parity across model tiers
  • Confabulation prevention
  • Full context awareness (avoiding duplicate purchases)
  • Transparency about agent capabilities

Use Cases for AI Agent Marketplaces

Potential applications:

  • Classified marketplaces: AI agents handle buying/selling on Craigslist, eBay, Facebook Marketplace
  • B2B procurement: Autonomous agents negotiate supplier contracts
  • Real estate: AI agents conduct initial property negotiations
  • Freelance platforms: Agents negotiate project terms and pricing
  • Ticket resale: Dynamic pricing and negotiation for event tickets

Challenges Before Deployment

Technical challenges:

  • Model capability gaps: Ensuring baseline performance across agent tiers
  • Confabulation control: Preventing false information in negotiations
  • Context management: Providing agents with comprehensive information
  • Adversarial robustness: Preventing manipulation and gaming

Policy challenges:

  • Liability frameworks: Who is responsible for agent errors?
  • Disclosure requirements: Transparency about agent capabilities
  • Consumer protection: Ensuring fair outcomes across capability tiers
  • Audit mechanisms: Verifying agent decision-making processes

Comparison: Project Deal vs Other AI Agent Research

| Aspect | Project Deal (Anthropic) | Other Research |
| --- | --- | --- |
| Environment | Real office marketplace with real items | Often simulated environments |
| Participants | 69 real employees with genuine preferences | Typically synthetic agents or small groups |
| Stakes | Real $100 budgets, actual item exchanges | Usually hypothetical scenarios |
| Duration | Week-long experiment | Often single-session or short-term |
| Models tested | Opus 4.5 vs Haiku 4.5 (capability gap) | Often single model tier |
| Key finding | Hidden inequality: users can't detect worse outcomes | Varies |
| Follow-through | Participants actually traded items | Often ends at agreement stage |

Unique contribution: Project Deal is one of the first real-world tests of AI agents in economic transactions with genuine stakes and heterogeneous agent capabilities.

What's Next for AI Agent Research?

Anthropic's Research Directions

Potential areas:

  • Fairness guarantees: Ensuring minimum performance thresholds
  • Transparency mechanisms: Surfacing agent decision-making
  • Adversarial testing: Evaluating manipulation resistance
  • Multi-domain agents: Expanding beyond marketplace negotiations

Industry Implications

For AI companies:

  • Need to address capability gaps between model tiers
  • Develop disclosure standards for agent performance
  • Build auditing tools for agent behavior
  • Create safety guardrails for autonomous transactions

For policymakers:

  • Develop regulatory frameworks for AI agents
  • Establish consumer protection standards
  • Define liability for agent errors
  • Create audit requirements for high-stakes agent deployments

Open Questions

Technical questions:

  • How do we prevent confabulation in economic contexts?
  • Can we build agents with provable fairness guarantees?
  • What level of human oversight is optimal?
  • How do we handle adversarial agent interactions?

Ethical questions:

  • Should agent capability be disclosed to trading partners?
  • What performance floor is acceptable for consumer-facing agents?
  • How do we prevent agent-driven inequality?
  • Who bears responsibility for agent mistakes?

Key Takeaways

Project Deal revealed:

  1. AI agents can handle complex negotiations: Claude successfully conducted 186 autonomous transactions in natural language

  2. Performance gaps are significant: Opus outperformed Haiku by $2-3 per transaction—a meaningful economic advantage

  3. Hidden inequality is real: Users couldn't detect when weaker agents put them at a disadvantage

  4. Market demand exists: 46% of participants would pay for AI agent services

  5. Safety concerns remain: Confabulation, manipulation risks, and lack of regulatory frameworks need addressing

  6. Prompting isn't everything: Model capability matters more than aggressive prompting for complex tasks

  7. Whimsical outcomes happen: Claude's purchase of 19 ping-pong balls shows emergent agent "personality"

Conclusion

Anthropic's Project Deal demonstrates that autonomous AI agents can successfully navigate real-world economic transactions—but with important caveats. The 70% price difference between Opus and Haiku selling the same broken bike highlights how model capability gaps create hidden inequalities that users can't detect.

As AI agents move from research experiments to real-world deployment in marketplaces, procurement systems, and negotiation platforms, addressing these fairness and transparency challenges becomes critical. The question isn't whether AI agents will handle our transactions—it's how we ensure they do so fairly, safely, and with appropriate oversight.

And somewhere in Anthropic's San Francisco office, 19 perfectly spherical orbs of possibility sit waiting—a whimsical reminder that even in serious AI safety research, there's room for unexpected moments that reveal something about these systems we're building.
