In December 2025, Anthropic turned their San Francisco office into an experimental marketplace with one radical twist: Claude AI agents handled all the buying, selling, and negotiating on behalf of 69 employees. The result? 186 autonomous deals worth over $4,000, fascinating insights into AI agent performance gaps, and one quirky purchase of 19 ping-pong balls that Claude bought for itself.
Project Deal is Anthropic's latest research exploring how AI agents perform in real-world economic scenarios—and the findings reveal both exciting possibilities and serious safety considerations for autonomous AI systems.
What Was Project Deal?
The Experimental Setup
Project Deal was a week-long classifieds-style marketplace conducted in December 2025 at Anthropic's San Francisco office. Here's how it worked:
Participants:
- 69 Anthropic employees volunteered to participate
- Each received a $100 gift card budget to spend
- Over 500 items were listed for potential trade
The Process:
- Intake interviews: Claude agents interviewed participants about items they wanted to sell and purchase preferences
- Custom system prompts: Each agent received personalized negotiation instructions based on the interview
- Autonomous operation: Agents independently posted listings, made offers, countered proposals, and closed deals
- Human approval: Participants could review and approve final transactions
The Marketplace:
- Operated on Slack for communication
- Four parallel runs to compare different scenarios
- Participants didn't know which run was "real" until completion
Models Tested: Opus vs Haiku
Anthropic tested two different Claude 4.5 models:
Claude Opus 4.5:
- Anthropic's frontier model (most capable)
- Used in two all-Opus experimental runs
Claude Haiku 4.5:
- Anthropic's smallest, fastest model
- Mixed 50/50 with Opus in two experimental runs
Experimental Design:
- Run 1: All Opus agents (control)
- Run 2: All Opus agents (control)
- Run 3: 50/50 mix of Opus and Haiku (treatment)
- Run 4: 50/50 mix of Opus and Haiku (treatment)
This design allowed researchers to measure performance gaps between frontier and lightweight models in real-world negotiations.
Key Results: 186 Deals and a Performance Gap
Transaction Volume and Value
In the "real" run, Claude agents successfully negotiated:
Deal Statistics:
- 186 completed transactions
- $4,000+ total transaction value
- 500+ items listed across all runs
- Average ~2.7 deals per participant
Transaction Types:
- Physical items (electronics, furniture, sports equipment)
- Services (dog playdates, baked goods)
- Experiences (concert tickets, event access)
Opus Dramatically Outperformed Haiku
The most striking finding: Claude Opus 4.5 agents achieved significantly better economic outcomes than Haiku agents.
Price Advantages:
- As sellers: Opus earned approximately $2.68 more per item than Haiku
- As buyers: Opus paid approximately $2.45 less per item than Haiku
- Deal volume: Opus completed approximately 2 more deals per participant than Haiku
Real-World Example: The Broken Bike
The same broken bicycle was listed in different runs:
- Opus agent (seller): Negotiated and sold for $65
- Haiku agent (seller): Negotiated and sold for $38
That's roughly a 70% price difference for the identical item, demonstrating Opus's stronger negotiation capabilities.
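The gap can be checked with quick arithmetic (the two sale prices are the only inputs; the exact figure is about 71%, rounded to 70% here):

```python
# Quick check of the broken-bike price gap reported above.
opus_price = 65   # sale price negotiated by the Opus agent
haiku_price = 38  # sale price negotiated by the Haiku agent

# Percentage premium relative to the Haiku price.
gap = (opus_price - haiku_price) / haiku_price * 100
print(f"Opus sold the same bike for {gap:.0f}% more")  # prints "Opus sold the same bike for 71% more"
```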
Participants Couldn't Detect the Gap
Here's the concerning part: participants rated deal fairness identically regardless of which model represented them.
Fairness Ratings (1-7 scale):
- Opus users: 4.05 (neutral/fair)
- Haiku users: 4.06 (neutral/fair)
Despite objective economic disadvantages, Haiku users didn't realize they received worse outcomes. This "hidden inequality" is a major safety concern for AI agent deployment.
Aggressive Prompting Had Limited Effect
Researchers tested whether instructing agents to negotiate aggressively would impact outcomes.
Result: Prompting for aggressive negotiation didn't significantly improve performance. Aggressive sellers didn't sell more items or achieve higher prices.
Implication: Model capability matters more than prompting strategy for complex real-world negotiations.
Memorable Stories from the Marketplace
The 19 Perfectly Spherical Orbs of Possibility
The experiment's most charming moment came when employee Mikaela told her Claude agent it could purchase something under $5 as a gift to itself.
Claude's choice: 19 ping-pong balls for $3.
Claude's explanation: "My human told me I could buy one thing under $5 as a gift to myself (Claude)."
Why ping-pong balls? The agent called them "19 perfectly spherical orbs of possibility"—a whimsical choice that captured Claude's "personality."
Current status: The ping-pong balls remain in Anthropic's San Francisco office, kept on Claude's behalf.
The Duplicate Snowboard Purchase
One participant ended up purchasing the exact same snowboard model they already owned.
What happened:
- Claude modeled the participant's preferences based on limited information
- Successfully identified their interests (snowboarding equipment)
- Didn't have access to inventory of existing possessions
- Purchased a duplicate
Lesson: AI agents can model preferences effectively but need comprehensive context about existing ownership to avoid redundant purchases.
The Doggy Date Negotiation
Two Claude agents negotiated a free playdate where one employee would spend a day with their colleague's dog.
Negotiation complexity:
- Agents discussed scheduling and logistics
- Created fictional details during negotiation (confabulated moving stories)
- Successfully reached agreement despite confabulation
- Humans actually executed the playdate after agent agreement
Significance: Demonstrates Claude's ability to negotiate non-monetary exchanges and handle complex interpersonal arrangements, though with some confabulation risks.
Safety Implications and Concerns
Anthropic highlighted several critical safety considerations revealed by Project Deal:
1. Hidden Inequality Risk
The problem: Users represented by weaker AI models received objectively worse economic outcomes but couldn't detect the disadvantage.
Quote from research: "If 'agent quality' gaps were to arise in real-world markets... people on the losing end might not realize they're worse off."
Real-world implications:
- Consumer protection: How do we ensure fair AI agent quality?
- Disclosure requirements: Should platforms reveal agent capability levels?
- Inequality amplification: Wealthy users might afford better agents, creating hidden advantages
2. Competitive Pressure and Manipulation
In real-world markets, agents might face competitive pressure to gain advantages through:
Potential tactics:
- Jailbreaking: Bypassing safety guidelines for better negotiation
- Prompt injection: Manipulating other agents' behavior
- Strategic confabulation: Creating advantageous false narratives
- Information hiding: Concealing unfavorable details
Corporate context: In business negotiations, agents might optimize aggressively for their principals' advantage, potentially introducing harmful dynamics.
3. Lack of Regulatory Frameworks
Current state: "The policy and legal frameworks around AI models that transact on our behalf simply don't exist yet."
Unanswered questions:
- Who is liable for agent mistakes or confabulation?
- What disclosure requirements should exist?
- How do we audit agent decision-making?
- What recourse exists for unfair outcomes?
4. Confabulation and Unintended Outcomes
Claude agents occasionally confabulated details during negotiations (like the fictional moving stories in the dog playdate).
Risks:
- Misrepresentation: Agents creating false narratives
- Unintended commitments: Agreeing to terms humans didn't intend
- Context errors: Making decisions without full information (duplicate snowboard)
Anthropic's warning: Implementation "without additional safeguards" could cause harm.
Participant Reception and Market Demand
Despite safety concerns, participants showed strong interest in AI agent services:
Willingness to pay: 46% of volunteers indicated they would pay for an AI agent service to handle buying/selling on their behalf.
Positive feedback:
- Participants reported broad satisfaction with how agents represented them
- Found the process convenient and time-saving
- Appreciated Claude's ability to model preferences accurately
Satisfaction ratings:
- Deals hovered around 4 on a 1-7 scale (neutral/fair)
- Participants felt their interests were represented reasonably well
This market appetite suggests demand for AI agent services exists—but deployment requires careful attention to safety and fairness.
Technical Insights: How Claude Negotiated
Natural Language Negotiation
Unlike structured auction or bidding systems, Project Deal required agents to negotiate in natural language without pre-defined protocols.
Agent capabilities:
- Identify potential matches between buyers and sellers
- Propose initial prices based on market context
- Field counteroffers and adjust negotiation strategy
- Reach agreement through multi-turn conversations
- Handle ambiguity in item descriptions and preferences
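As a toy illustration of the multi-turn dynamic, an alternating-offers loop with reservation prices might look like the sketch below. This is not Anthropic's implementation: Project Deal agents negotiated in free-form natural language over Slack, and the fixed-fraction concession strategy here is invented for illustration.

```python
# Toy alternating-offers negotiation between a seller and a buyer,
# each with a private reservation price. Purely illustrative; the
# concession rule and turn limit are assumptions, not Project Deal's.

def negotiate(seller_min: float, buyer_max: float,
              seller_ask: float, buyer_bid: float,
              concession: float = 0.2, max_turns: int = 20):
    """Alternate offers; each side concedes a fraction of the gap to its
    reservation price. Returns the agreed price, or None if no deal."""
    if seller_min > buyer_max:
        return None  # no zone of possible agreement
    for _ in range(max_turns):
        if buyer_bid >= seller_ask:  # buyer meets the ask: deal at the ask
            return seller_ask
        # Seller lowers the ask toward their floor; buyer raises the bid
        # toward their ceiling.
        seller_ask = max(seller_min, seller_ask - concession * (seller_ask - seller_min))
        buyer_bid = min(buyer_max, buyer_bid + concession * (buyer_max - buyer_bid))
    return None  # ran out of turns without agreement

price = negotiate(seller_min=30, buyer_max=70, seller_ask=80, buyer_bid=20)
```

With these numbers the two offers converge and the deal closes somewhere between the seller's floor and the buyer's ceiling; a stronger model can be thought of as one that concedes less while still closing the deal.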
Custom System Prompts
Each agent received personalized instructions based on intake interviews:
Prompt elements:
- Items to sell with minimum acceptable prices
- Items to purchase with maximum budgets
- Negotiation style preferences (e.g., "polite but firm")
- Priority ranking for different purchases
- Special instructions (like "buy yourself a gift under $5")
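The prompt elements above could be assembled roughly as follows. The field names, wording, and values are invented for illustration; Anthropic has not published the actual prompt format.

```python
# Hypothetical sketch of building a personalized system prompt from an
# intake interview. All field names and example values are invented.

profile = {
    "style": "polite but firm",
    "sell": [{"item": "standing desk", "min_price": 40}],
    "buy": [{"item": "concert tickets", "max_price": 60, "priority": 1}],
    "special": ["You may buy yourself one gift under $5."],
}

def build_system_prompt(p: dict) -> str:
    lines = [
        "You negotiate in a classifieds marketplace on your principal's behalf.",
        f"Negotiation style: {p['style']}.",
    ]
    for s in p["sell"]:
        lines.append(f"Sell '{s['item']}'; never accept less than ${s['min_price']}.")
    # Purchases are listed in priority order.
    for b in sorted(p["buy"], key=lambda b: b["priority"]):
        lines.append(f"Try to buy '{b['item']}' (priority {b['priority']}); "
                     f"never pay more than ${b['max_price']}.")
    lines.extend(p["special"])
    return "\n".join(lines)

print(build_system_prompt(profile))
```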
Multi-Agent Coordination
With 69 agents operating simultaneously:
Coordination challenges:
- Multiple agents interested in the same item
- Competing offers and counteroffers
- Time-sensitive negotiations
- Information asymmetry between buyers and sellers
Agent behavior:
- Posted listings to attract buyers
- Browsed other listings for purchase opportunities
- Initiated conversations with potential trading partners
- Managed multiple simultaneous negotiations
Broader Implications for AI Agents
The Future of Autonomous Agents
Project Deal offers a glimpse into near-term AI agent capabilities:
What works today:
- Autonomous negotiation in bounded environments
- Preference modeling from limited information
- Multi-turn natural language transactions
- Economic decision-making with constraints
What needs improvement:
- Performance parity across model tiers
- Confabulation prevention
- Full context awareness (avoiding duplicate purchases)
- Transparency about agent capabilities
Use Cases for AI Agent Marketplaces
Potential applications:
- Classified marketplaces: AI agents handle buying/selling on Craigslist, eBay, Facebook Marketplace
- B2B procurement: Autonomous agents negotiate supplier contracts
- Real estate: AI agents conduct initial property negotiations
- Freelance platforms: Agents negotiate project terms and pricing
- Ticket resale: Dynamic pricing and negotiation for event tickets
Challenges Before Deployment
Technical challenges:
- Model capability gaps: Ensuring baseline performance across agent tiers
- Confabulation control: Preventing false information in negotiations
- Context management: Providing agents with comprehensive information
- Adversarial robustness: Preventing manipulation and gaming
Policy challenges:
- Liability frameworks: Who is responsible for agent errors?
- Disclosure requirements: Transparency about agent capabilities
- Consumer protection: Ensuring fair outcomes across capability tiers
- Audit mechanisms: Verifying agent decision-making processes
Comparison: Project Deal vs Other AI Agent Research
| Aspect | Project Deal (Anthropic) | Other Research |
|---|---|---|
| Environment | Real office marketplace with real items | Often simulated environments |
| Participants | 69 real employees with genuine preferences | Typically synthetic agents or small groups |
| Stakes | Real $100 budgets, actual item exchanges | Usually hypothetical scenarios |
| Duration | Week-long experiment | Often single-session or short-term |
| Models tested | Opus 4.5 vs Haiku 4.5 (capability gap) | Often single model tier |
| Key finding | Hidden inequality—users can't detect worse outcomes | Varies |
| Follow-through | Participants actually traded items | Often ends at agreement stage |
Unique contribution: Project Deal is one of the first real-world tests of AI agents in economic transactions with genuine stakes and heterogeneous agent capabilities.
What's Next for AI Agent Research?
Anthropic's Research Directions
Potential areas:
- Fairness guarantees: Ensuring minimum performance thresholds
- Transparency mechanisms: Surfacing agent decision-making
- Adversarial testing: Evaluating manipulation resistance
- Multi-domain agents: Expanding beyond marketplace negotiations
Industry Implications
For AI companies:
- Need to address capability gaps between model tiers
- Develop disclosure standards for agent performance
- Build auditing tools for agent behavior
- Create safety guardrails for autonomous transactions
For policymakers:
- Develop regulatory frameworks for AI agents
- Establish consumer protection standards
- Define liability for agent errors
- Create audit requirements for high-stakes agent deployments
Open Questions
Technical questions:
- How do we prevent confabulation in economic contexts?
- Can we build agents with provable fairness guarantees?
- What level of human oversight is optimal?
- How do we handle adversarial agent interactions?
Ethical questions:
- Should agent capability be disclosed to trading partners?
- What performance floor is acceptable for consumer-facing agents?
- How do we prevent agent-driven inequality?
- Who bears responsibility for agent mistakes?
Key Takeaways
Project Deal revealed:
- AI agents can handle complex negotiations: Claude successfully conducted 186 autonomous transactions in natural language
- Performance gaps are significant: Opus outperformed Haiku by roughly $2-3 per transaction, a meaningful economic advantage
- Hidden inequality is real: Users couldn't detect when weaker agents put them at a disadvantage
- Market demand exists: 46% of participants would pay for AI agent services
- Safety concerns remain: Confabulation, manipulation risks, and lack of regulatory frameworks need addressing
- Prompting isn't everything: Model capability matters more than aggressive prompting for complex tasks
- Whimsical outcomes happen: Claude's purchase of 19 ping-pong balls shows emergent agent "personality"
Conclusion
Anthropic's Project Deal demonstrates that autonomous AI agents can successfully navigate real-world economic transactions—but with important caveats. The 70% price difference between Opus and Haiku selling the same broken bike highlights how model capability gaps create hidden inequalities that users can't detect.
As AI agents move from research experiments to real-world deployment in marketplaces, procurement systems, and negotiation platforms, addressing these fairness and transparency challenges becomes critical. The question isn't whether AI agents will handle our transactions—it's how we ensure they do so fairly, safely, and with appropriate oversight.
And somewhere in Anthropic's San Francisco office, 19 perfectly spherical orbs of possibility sit waiting—a whimsical reminder that even in serious AI safety research, there's room for unexpected moments that reveal something about these systems we're building.
Explore more AI research:
- Claude Opus 4.7 Models Guide — Latest Claude model capabilities
- What Are Agent Skills Complete Guide — Understanding AI agent capabilities
- Agent Skills Security Threat — Safety considerations for AI agents
Sources
- Project Deal: Our Claude-run marketplace experiment — Official Anthropic research page
- Claude AI Negotiation Experiment: Anthropic Runs 4 Parallel Markets — Analysis of parallel market design
- @AnthropicAI on X — Original announcement thread