Traditional search was designed for humans: you type a query, get back 10 blue links, and click through. This worked for decades. But in 2026, the primary consumers of search are no longer humans. They are AI agents.
And AI agents don't just need answers. They need to orchestrate complex retrieval workflows with thousands of operations, custom logic, and dynamic strategies tailored to each task.
Perplexity has just introduced Search as Code (SaC), a fundamental rearchitecture of search for the agentic era. Instead of offering search as a monolithic service that returns fixed results, SaC exposes search primitives as an SDK that models program through generated code.
The results are dramatic: a 2.5x lead over the next-best system on the WANDR benchmark, an 85% reduction in token usage on a vendor-advisory research task, and the ability to execute search workflows that were previously impossible.
Let's dive into what makes SaC revolutionary and why it represents the future of agentic search.
The Problem with Traditional Search for Agents
How Search Worked Before SaC
For the past three years of AI development, search has followed a simple pattern:
- Model generates a query through function calling or MCP
- Search engine runs its predefined pipeline (retrieval, ranking, filtering)
- Model consumes the results as context for reasoning
This architecture has three fundamental limitations:
| Limitation | Impact on Agents |
|---|---|
| Coarse context | Pipeline optimized for recall, not precision, so it introduces irrelevant information |
| No domain knowledge leverage | Model can't apply its understanding to customize search strategy |
| Serial, inefficient control flow | Each search operation requires a model turn, adding latency and polluting context |
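
The third limitation is easiest to see in code. The following is a minimal sketch of a serial tool-calling loop, not any vendor's actual implementation: `fake_search` and `serial_agent` are hypothetical stand-ins, and the point is only that every search costs one model turn and every result lands in context.

```python
def fake_search(query: str) -> list[str]:
    """Hypothetical stand-in for a search tool call; returns canned snippets."""
    return [f"result {i} for {query!r}" for i in range(3)]


def serial_agent(queries: list[str]) -> tuple[int, list[str]]:
    """Run one search per model turn, appending every result to the context."""
    context: list[str] = []
    turns = 0
    for query in queries:
        turns += 1  # each search operation costs one model turn
        context.extend(fake_search(query))  # all results enter context, relevant or not
    return turns, context


turns, context = serial_agent(["cve 2023", "cve 2024", "cve 2025"])
print(turns, len(context))
```

Both the turn count and the context size grow linearly with the number of searches, which is the cost SaC tries to move out of token space.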
Real-World Failure Modes
Example 1: Vendor Advisory Research
A task requires finding 230+ high-severity CVEs from official vendor advisories (not aggregators like NVD or MITRE).
Traditional approach:
- Query "high severity CVE 2023-2025"
- Get mixed results (vendors, aggregators, news)
- Manually filter through noisy results
- Make dozens of serial queries
- Pollute context with 288K+ tokens
SaC approach:
- Generate code that defines vendor-specific query templates
- Fan out parallel queries across formats
- Verify CVE-to-version binding programmatically
- Filter in deterministic code, not token space
- Result: 100% accuracy, 85% fewer tokens (42.9K)
Example 2: Multi-Domain Research
An agent needs to:
- Search academic papers for methodology
- Search GitHub for implementations
- Search company blogs for real-world usage
- Cross-reference and synthesize findings
Traditional approach:
- Sequential searches through the same API
- Each search introduces full context to model
- No way to deduplicate or cross-reference efficiently
- Costs explode with each additional search
SaC approach:
- Parallel searches with domain-specific parameters
- Deduplication in code before model sees results
- Custom ranking and filtering logic
- Aggregation and synthesis at atomic level
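The SaC side of this comparison can be sketched with the standard library alone. This is an illustration under stated assumptions, not the Agentic Search SDK: `FAKE_INDEX`, `search_domain`, and `fan_out` are hypothetical names, and the canned index replaces a real search backend.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned index standing in for a real search backend.
FAKE_INDEX = {
    "papers": [{"url": "https://example.org/paper-a", "title": "Method A"}],
    "code": [
        {"url": "https://example.org/repo-a", "title": "Implementation of A"},
        {"url": "https://example.org/paper-a", "title": "Method A (mirror)"},
    ],
    "blogs": [{"url": "https://example.org/blog-a", "title": "Using A in production"}],
}


def search_domain(domain: str) -> list[dict]:
    """Stand-in for one domain-scoped search call."""
    return [dict(hit, domain=domain) for hit in FAKE_INDEX.get(domain, [])]


def fan_out(domains: list[str], concurrency: int = 4) -> list[dict]:
    """Run the domain searches in parallel, then dedupe by URL, keeping the first hit."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        batches = list(pool.map(search_domain, domains))  # map preserves input order
    seen: set[str] = set()
    merged: list[dict] = []
    for batch in batches:
        for hit in batch:
            if hit["url"] not in seen:
                seen.add(hit["url"])
                merged.append(hit)
    return merged


results = fan_out(["papers", "code", "blogs"])
print([r["url"] for r in results])
```

The duplicate paper URL returned by the code search is dropped before anything would be handed back to the model.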
Search as Code: The Architecture
SaC fundamentally reimagines how models interact with search infrastructure.
The Three-Layer Stack
┌────────────────────────────────────┐
│ Models (GPT-5.5)                   │ ← Control plane: reasoning, planning, code generation
├────────────────────────────────────┤
│ Compute Sandboxes                  │ ← Execution: deterministic compute, state management
├────────────────────────────────────┤
│ Agentic Search SDK                 │ ← I/O layer: atomized search primitives
└────────────────────────────────────┘
                  ↓
Perplexity Search Infrastructure
Layer 1: Models as Control Plane
Role: Decide what to search, how to search it, and generate code to execute the strategy
Key capabilities:
- Decompose tasks into retrieval requirements
- Design bespoke search pipelines
- Generate Python code that orchestrates primitives
- Optimize strategies based on intermediate results
Models used: GPT-5.5 (high reasoning), optimized with Agent Skills that teach SDK usage patterns
Layer 2: Compute Sandboxes
Role: Execute model-generated code in a secure, deterministic environment
Key features:
- Secure Python runtime with access to Agentic Search SDK
- Persistent filesystem for state management across turns
- Explicit serialization/deserialization to maintain context clarity
- Execution efficiency supporting thousands of operations per minute
Why not REPL-style state? Perplexity tested both approaches and found filesystem-based serde performs better on long trajectories. Requiring explicit state management helps models track what's preserved and why, which is crucial for complex workflows.
Layer 3: Agentic Search SDK
Role: Expose Perplexity's search stack as composable primitives
This is the revolutionary piece. Instead of wrapping an existing search API, Perplexity rearchitected their entire search stack into modular, atomic primitives.
SDK components:
| Primitive Category | Examples | What It Enables |
|---|---|---|
| Retrieval | search.web(), search.web_many() | Fetch candidates from index |
| Ranking | rank(), rerank_by() | Custom relevance scoring |
| Filtering | filter_by_domain(), filter_by_date() | Remove irrelevant results |
| Deduplication | dedupe_by_url(), dedupe_by_content() | Remove redundancy |
| Aggregation | group_by(), summarize() | Structured data extraction |
| Parsing | extract_fields(), parse_structured() | Schema-based extraction |
High-level shortcuts are also available (full end-to-end search), but models can bypass them when the task requires more control.
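To illustrate what composing such primitives might look like, here is a small sketch with local stand-ins. The function names are borrowed from the table, but the bodies and signatures below are assumptions written for this example; they are not the SDK's implementations.

```python
from urllib.parse import urlparse


def filter_by_domain(hits, allowed):
    """Keep hits whose hostname is in the allowed set (assumed signature)."""
    return [h for h in hits if urlparse(h["url"]).hostname in allowed]


def dedupe_by_url(hits):
    """Drop later hits that repeat an earlier URL."""
    seen, out = set(), []
    for h in hits:
        if h["url"] not in seen:
            seen.add(h["url"])
            out.append(h)
    return out


def rerank_by(hits, score):
    """Sort hits by a caller-supplied scoring function, best first (assumed signature)."""
    return sorted(hits, key=score, reverse=True)


hits = [
    {"url": "https://vendor.example/advisory/1", "text": "CVE fixed in 1.2"},
    {"url": "https://news.example/story", "text": "CVE roundup"},
    {"url": "https://vendor.example/advisory/1", "text": "CVE fixed in 1.2"},
    {"url": "https://vendor.example/advisory/2", "text": "CVE fixed in 2.0, fixed in 2.1"},
]

# Filter, then dedupe, then rank: each step takes and returns a plain list of hits.
pipeline = rerank_by(
    dedupe_by_url(filter_by_domain(hits, {"vendor.example"})),
    score=lambda h: h["text"].count("fixed"),
)
print([h["url"] for h in pipeline])
```

Because every step consumes and produces the same shape of data, the steps can be reordered or swapped without special cases.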
Continual improvement: The SDK itself is optimized through autoresearch loops that test changes against latency, codegen quality, and task performance over weeks.
How Search as Code Works: A Real Example
Let's walk through the CVE vendor advisory example mentioned in the original research article.
Task
Find 230+ high-severity CVEs from 2023-2025, citing only official vendor advisories. Each record must include:
- CVE ID
- Vendor and product name
- Fix version
- Evidence that the fix version is tied to the specific CVE
Traditional Approach Performance
- OpenAI Responses API: < 25% accuracy
- Anthropic Managed Agents: < 25% accuracy
- Exa Agent: < 25% accuracy
- Parallel Tasks: < 25% accuracy
SaC Approach: Generated Code
Part 1: Fan out over vendor advisory formats
# Define vendor-specific query templates
templates = [
    ("Mozilla",
     'site:mozilla.org/en-US/security/advisories/mfsa{year} '
     '"CVE-{year}-" "Fixed in" "Impact high"'),
    ("Jenkins",
     'site:jenkins.io/security/advisory/{year} '
     '"CVE-{year}" "Severity" "High" "Fix"'),
    ("Chrome",
     'site:chromereleases.googleblog.com/{year} '
     '"High CVE-{year}" "Stable channel has been updated"'),
    ("Android",
     'source.android.com/docs/security/bulletin/{year}-{month:02d}-01 '
     '"High" "CVE-{year}"'),
    # ... more vendors
]
# Generate queries for each vendor-year combination
queries = [
    {"vendor": vendor, "query": pattern.format(year=year, month=month)}
    for year in [2023, 2024, 2025]
    for vendor, pattern in templates
    for month in ([1] if "{month" not in pattern else range(1, 13))
]
# Execute in parallel with concurrency control
seed_hits = sdk.search.web_many(queries, limit_per_query=8, concurrency=12)
# Filter to official vendor advisories only
pages = [
    {"vendor": q["vendor"], "url": h.url, "text": join_result_fields(h)}
    for q, hits in zip(queries, seed_hits)
    for h in hits
    if official_vendor_advisory(h.url, q["vendor"])
]
What happened:
- Model encoded domain knowledge (vendor advisory URL patterns) directly into the search strategy
- Parallel execution of targeted queries
- Filtering in code, not through noisy model context
- Preserved only official vendor sources
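The generated code calls `official_vendor_advisory` without showing its body. One plausible shape for such a helper is sketched below; the allowlist, its path prefixes, and the matching rule are assumptions derived from the query templates above, not the actual implementation.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: vendor -> (registered domain, required path prefix).
# Domains mirror the query templates; the exact rules are assumed.
OFFICIAL_SOURCES = {
    "Mozilla": ("mozilla.org", "/en-US/security/advisories/"),
    "Jenkins": ("jenkins.io", "/security/advisory/"),
    "Chrome": ("chromereleases.googleblog.com", "/"),
    "Android": ("source.android.com", "/docs/security/bulletin/"),
}


def official_vendor_advisory(url: str, vendor: str) -> bool:
    """Return True when the URL sits under the vendor's known advisory path."""
    rule = OFFICIAL_SOURCES.get(vendor)
    if rule is None:
        return False
    domain, prefix = rule
    parsed = urlparse(url)
    host = parsed.hostname or ""
    on_domain = host == domain or host.endswith("." + domain)
    return on_domain and parsed.path.startswith(prefix)


print(official_vendor_advisory(
    "https://www.mozilla.org/en-US/security/advisories/mfsa2024-01/", "Mozilla"))
print(official_vendor_advisory(
    "https://nvd.nist.gov/vuln/detail/CVE-2024-0001", "Mozilla"))
```

A deterministic check like this is what lets the pipeline discard aggregator and news pages without spending model tokens on them.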
Part 2: Adaptive refinement with LLM subroutine
# Summarize coverage by vendor-year
coverage = summarize(pages, by=["vendor", "year", "url_kind"])
# Use LLM to suggest refinements for sparse areas
prompt = """
Goal: 230+ high or critical CVEs from official vendor advisories.
Avoid aggregators, CERTs, news, NVD, MITRE.
Current coverage:
{coverage}
Suggest site-scoped exact-phrase queries for sparse vendor-years.
Return JSON lines with vendor and query.
""".format(coverage=coverage)
raw = query_llm(prompt)
expanded_queries = [
    row for row in parse_jsonl(raw)
    if official_scope(row["query"]) and mentions_cve_year(row["query"])
]
# Execute expansion queries
expanded_hits = sdk.search.web_many(
    unique(expanded_queries),
    limit_per_query=8,
    concurrency=12
)
What happened:
- Model as intermediate planning subroutine
- Adaptive backfilling based on coverage gaps
- Validation of generated queries before execution
- Maintains useful patterns without hardcoding
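Validating LLM output before executing it starts with parsing it defensively. The sketch below shows one way a `parse_jsonl` helper could behave; the source does not show its body, so the skipping rules here (ignore non-JSON lines, require `vendor` and `query` keys) are assumptions.

```python
import json


def parse_jsonl(raw: str) -> list[dict]:
    """Parse JSON Lines output, skipping blank lines and anything that is not a usable object."""
    rows = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. prose the model added around the JSON
        # Keep only objects carrying both fields the prompt asked for.
        if isinstance(row, dict) and {"vendor", "query"} <= row.keys():
            rows.append(row)
    return rows


raw = """Here are some suggestions:
{"vendor": "Jenkins", "query": "site:jenkins.io/security/advisory/2024 CVE-2024"}
{"vendor": "Chrome"}
not json at all
"""
rows = parse_jsonl(raw)
print(rows)
```

Only the well-formed Jenkins row survives; the incomplete and non-JSON lines are dropped rather than raising mid-pipeline.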
Part 3: Verification with custom logic
# Deduplicate and filter
all_hits = dedupe_by_url(flatten(seed_hits) + flatten(expanded_hits))
items = [
    {"url": h.url, "vendor_hint": infer_vendor(h.url),
     "text": join_result_fields(h)}
    for h in all_hits
    if official_vendor_advisory(h.url, infer_vendor(h.url))
]
# Extract and verify CVE-version binding
verified = sdk.llm.extract_many(
    items,
    instruction=(
        "Keep only vendor advisories where the page ties a high or critical "
        "CVE to a specific fixed version, build, patch, or security level."
    ),
    schema={
        "matches": bool,
        "cve": str,
        "vendor": str,
        "product": str,
        "fix_version": str,
        "severity": str,
        "source_url": str,
        "evidence": str,
        "version_bound_to_cve": bool,
        "confidence": float,
    },
)
# Final filtering and deduplication
records = [
    to_cve_record(x) for x in verified
    if x.matches and x.version_bound_to_cve
    if high_or_critical(x.severity) and x.confidence > 0.75
]
records = dedupe_by(records, key="cve")
What happened:
- Custom verification logic defined entirely in code
- Structured extraction with validation
- Confidence-based filtering
- Final deduplication by CVE ID
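The final filtering step leans on two helpers the source leaves undefined, `high_or_critical` and `dedupe_by`. A possible sketch follows. It assumes records are plain dicts and that, among duplicates, the highest-confidence record should win; both are choices made for this example rather than documented behavior.

```python
def high_or_critical(severity: str) -> bool:
    """Accept labels such as 'High' or 'critical', ignoring case and whitespace."""
    return severity.strip().lower() in {"high", "critical"}


def dedupe_by(records: list[dict], key: str) -> list[dict]:
    """Keep one record per key value, preferring the highest-confidence one (assumed policy)."""
    best: dict[str, dict] = {}
    for rec in records:
        current = best.get(rec[key])
        if current is None or rec["confidence"] > current["confidence"]:
            best[rec[key]] = rec
    return list(best.values())


records = [
    {"cve": "CVE-2024-0001", "severity": "High", "confidence": 0.80},
    {"cve": "CVE-2024-0001", "severity": " high ", "confidence": 0.95},
    {"cve": "CVE-2024-0002", "severity": "Medium", "confidence": 0.90},
]
kept = dedupe_by([r for r in records if high_or_critical(r["severity"])], key="cve")
print(kept)
```

Keeping these rules in code makes them auditable: the same inputs always yield the same records, which is harder to guarantee when filtering happens inside a model's reasoning.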
Results
| Metric | Traditional Search | SaC |
|---|---|---|
| Accuracy | < 25% | 100% |
| Token usage | 288.7K | 42.9K (85% reduction) |
| Time to complete | Multiple hours | Minutes |
Why This Is a Paradigm Shift
From "Query → Response" to "Goal → Orchestration"
Old paradigm:
- Model asks a question
- Search returns an answer
- Model reasons over answer
New paradigm:
- Model defines a goal
- Model generates a retrieval program
- Program executes complex workflow
- Model reasons over final synthesized results
Code as Orchestrator and Gap-Filler
SaC doesn't just orchestrate existing primitives. Code can fill capability gaps on the fly.
Example: You need results matching a complex regex not supported by the query syntax.
Traditional approach: Try to approximate with query operators, get noisy results, filter in token space (expensive and error-prone).
SaC approach:
import re

# Get superset with parallel queries
results = sdk.search.web_many(approximate_queries, concurrency=8)
# Deduplicate
unique_results = sdk.dedupe_by_url(flatten(results))
# Apply exact regex in code
pattern = re.compile(complex_regex)
filtered = [r for r in unique_results if pattern.search(r.text)]
Result: Exact match without bloating SDK with niche functions.
Benchmark Performance
Perplexity evaluated SaC against four other agent systems across five benchmarks.
Overall Performance
| Benchmark | Perplexity SaC | OpenAI | Anthropic | Exa | Parallel |
|---|---|---|---|---|---|
| DSQA | 0.871 | 0.733 | 0.815 | 0.530 | 0.810 |
| BrowseComp | 0.805 | 0.720 | 0.598 | 0.380 | 0.560 |
| HLE | 0.612 | 0.614 | 0.566 | 0.387 | 0.515 |
| WideSearch | 0.651 | 0.522 | 0.590 | 0.471 | 0.584 |
| WANDR | 0.386 | 0.130 | 0.152 | 0.057 | 0.126 |
SaC leads on 4 of 5 benchmarks and trails OpenAI by 0.002 on HLE.
WANDR: The Most Challenging Benchmark
WANDR tests complex "wide research" tasks requiring careful orchestration of search, compute, and reasoning, which is exactly what SaC was designed for.
Performance:
- Perplexity SaC: 0.386
- Next best (Anthropic): 0.152
- Advantage: 2.5x
Cost-Performance Frontier
SaC doesn't just win on performance. It also dominates the cost-performance tradeoff across reasoning levels:
| Reasoning Level | DSQA Score | Cost per Task | Position |
|---|---|---|---|
| SaC Low | 0.82 | $0.50 | Frontier (cheaper than all non-SaC, better than 2 of them) |
| SaC Medium | 0.85 | $0.85 | Frontier (best score under $1) |
| SaC High | 0.871 | $1.20 | Frontier (best absolute) |
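
"Frontier" here means Pareto-optimal: no other configuration is both cheaper and better. The sketch below shows that check on the three SaC rows from the table. The fourth point is invented purely to show a dominated configuration, since the source gives no per-task costs for the other systems.

```python
def pareto_frontier(points: list[tuple[str, float, float]]) -> list[str]:
    """Return names of points not dominated on (lower cost, higher score)."""
    frontier = []
    for name, cost, score in points:
        dominated = any(
            (c <= cost and s >= score) and (c < cost or s > score)
            for _, c, s in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


points = [
    ("SaC Low", 0.50, 0.82),
    ("SaC Medium", 0.85, 0.85),
    ("SaC High", 1.20, 0.871),
    ("Hypothetical X", 1.00, 0.80),  # invented point, not from the source
]
print(pareto_frontier(points))
```

The invented point drops out because SaC Low is both cheaper and higher-scoring; each SaC row stays because moving to a cheaper row always costs some score.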
Improvement Over Traditional Perplexity Baseline
SaC vs. traditional search pipeline (same infrastructure):
| Benchmark | Absolute Gain | Relative Gain |
|---|---|---|
| DSQA | +19.77 pp | +29% |
| BrowseComp | +15.30 pp | +23% |
| HLE | +8.50 pp | +16% |
| WideSearch | +9.20 pp | +17% |
| WANDR | +12.00 pp | +45% |
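
The two gain columns are linked by the baseline score. Assuming the SaC scores in the Overall Performance table are the post-gain values (the source does not state the baseline scores directly), the relative gain can be recomputed as a sanity check:

```python
# SaC scores from the Overall Performance table; absolute gains in percentage points.
sac_score = {"DSQA": 0.871, "BrowseComp": 0.805, "HLE": 0.612,
             "WideSearch": 0.651, "WANDR": 0.386}
gain_pp = {"DSQA": 19.77, "BrowseComp": 15.30, "HLE": 8.50,
           "WideSearch": 9.20, "WANDR": 12.00}


def relative_gain(score: float, gain_points: float) -> float:
    """Relative gain (%) = absolute gain / implied baseline, where baseline = score - gain."""
    gain = gain_points / 100
    baseline = score - gain
    return 100 * gain / baseline


for name in sac_score:
    print(f"{name}: {relative_gain(sac_score[name], gain_pp[name]):.1f}%")
```

Under that assumption each recomputed value lands within one percentage point of the reported relative gain; small differences are expected because the published scores are rounded.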
Technical Deep Dive: The Agentic Search SDK
Design Principles
1. Atomicity: Break search into the smallest useful primitives. Don't expose "smart" functions; expose building blocks that can be composed into smart behaviors.
2. Composability: Every primitive should work with every other primitive. No special cases or incompatible operations.
3. Efficiency: Operations must be fast enough to support thousands of calls per minute. Latency directly impacts agent capability.
4. Consumability for LLMs: The API design is optimized through autoresearch for codegen quality. Function names, parameter ordering, and documentation are all tuned for model understanding.
Why Python?
Perplexity considered Python, Rust, TypeScript, and Bash.
Python won because:
- Ubiquitous in AI/ML ecosystems
- Natural fit for data processing (what search results become)
- Strong ecosystem for manipulation and analysis
- Frontier models excel at Python codegen
State Management: Filesystem vs. REPL
REPL approach:
- Variables persist across turns in-memory
- No serialization overhead
- More token-efficient
Filesystem + serde approach:
- Explicit persistence to disk
- Clear traceability of what's preserved
- Better performance on long trajectories
Perplexity's choice: Filesystem + serde
Testing showed that while both approaches perform similarly in normal use, filesystem-based serde provides better reliability on long trajectories. The requirement to explicitly serialize state helps models manage complexity, much as a short, clean notebook is easier to follow than a 100-cell Jupyter notebook with a cluttered namespace.
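As a rough illustration of the filesystem + serde pattern, the sketch below persists a compact state dict between two simulated turns using JSON. The file name, state keys, and helper names are hypothetical; the source does not specify the serialization format the sandbox uses.

```python
import json
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Serialize only what the next turn needs; everything else is dropped."""
    path.write_text(json.dumps(state, indent=2))


def load_state(path: Path) -> dict:
    """Reload the state a previous turn chose to persist."""
    return json.loads(path.read_text())


workdir = Path(tempfile.mkdtemp())
state_file = workdir / "turn_1_state.json"  # hypothetical file name

# Turn 1: run searches, keep a compact summary instead of raw hits.
save_state(state_file, {"verified_cves": ["CVE-2024-0001"], "pending_vendors": ["Jenkins"]})

# Turn 2: starts with an empty namespace and reloads state explicitly.
state = load_state(state_file)
print(state["pending_vendors"])
```

The explicit save call is the point: anything not written to disk is gone in the next turn, so the program has to state what it is carrying forward.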
How Models Learn to Use the SDK
Agent Skills
The SDK is custom-built and unlikely to appear in pretraining data. Even with excellent documentation, models need guidance to compose primitives effectively.
Solution: Highly-tuned Agent Skills
These Skills:
- Teach effective SDK usage patterns
- Provide generalizable few-shot examples
- Show how to compose primitives into complex pipelines
- Are optimized through autoresearch loops
Size constraint: < 2000 tokens in root SKILL.md
- Prevents context bloat
- Forces distillation of only essential patterns
- Tokens spent on composition patterns, not API docs (available via reflection)
Autoresearch Optimization
Both the SDK and Agent Skills are optimized via continual autoresearch loops that:
- Propose SDK improvements (structure, naming, organization)
- Validate against metrics (latency, codegen quality, task performance)
- Deploy successful changes
- Iterate
This runs continuously over weeks, making hundreds of improvements.
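The source does not describe the internals of these loops. As one possible reading of the propose, validate, deploy, iterate cycle, here is a greedy hill-climbing sketch with a toy metric; the config fields, the scoring formula, and the proposal names are all invented for illustration.

```python
def autoresearch(baseline: dict, proposals: list[dict], evaluate) -> dict:
    """Greedy propose/validate/deploy loop: keep a proposal only if it scores higher."""
    deployed, best = baseline, evaluate(baseline)
    for candidate in proposals:      # propose
        score = evaluate(candidate)  # validate against the metric
        if score > best:             # deploy only on improvement
            deployed, best = candidate, score
    return deployed


def evaluate(config: dict) -> float:
    """Toy metric (an assumption): reward task accuracy, penalize latency."""
    return config["accuracy"] - 0.1 * config["latency_s"]


baseline = {"name": "v0", "accuracy": 0.70, "latency_s": 1.0}
proposals = [
    {"name": "rename-params", "accuracy": 0.74, "latency_s": 1.0},
    {"name": "bigger-batches", "accuracy": 0.73, "latency_s": 2.0},
    {"name": "flatter-modules", "accuracy": 0.78, "latency_s": 1.1},
]
winner = autoresearch(baseline, proposals, evaluate)
print(winner["name"])
```

A real loop would measure several metrics on held-out tasks rather than a single scalar, but the accept-only-if-better structure is the same.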
Why Now? The Enabling Factors
1. Models Can Generate Production-Quality Code
GPT-5.5 and Claude Opus 4.5 reliably generate complex, multi-step programs with:
- Proper error handling
- Parallelism and asynchrony
- Control flow and state management
- Domain-specific logic
2. Code Execution is Fast and Secure
Modern sandboxes provide:
- Sub-second execution for complex programs
- Secure isolation
- Persistent state across turns
- Integration with external services (like search)
3. Search Infrastructure is Modular
Perplexity invested months rearchitecting their search stack into composable primitives. This wasn't just API design: it required rethinking every layer of the search pipeline.
4. Agent Harnesses Are Production-Ready
Tools like Perplexity Computer and Agent API provide:
- Reliable code generation
- Sandbox management
- State persistence
- Error recovery
Use Cases Enabled by SaC
1. Deep Research with Complex Requirements
Example: Competitive intelligence
- Search company announcements, press releases, SEC filings
- Search patent databases
- Search academic papers
- Search GitHub for open-source activity
- Cross-reference and identify patterns
- Result: Comprehensive competitive analysis with source verification
2. Compliance and Regulatory Research
Example: GDPR compliance audit
- Search for all data processing activities
- Identify legal bases for each
- Find retention policies
- Cross-reference with regulatory requirements
- Generate compliance gaps report
- Result: Structured compliance audit with evidence chain
3. Security Intelligence
Example: Vulnerability tracking
- Search CVE databases, vendor advisories, security mailing lists
- Filter by severity, affected products, fix availability
- Deduplicate across sources
- Track patch status
- Generate remediation priorities
- Result: Real-time security intelligence pipeline
4. Market Research at Scale
Example: Product-market fit analysis
- Search customer reviews across platforms
- Search social media mentions
- Search competitor positioning
- Extract sentiment, features, pain points
- Aggregate and cluster
- Result: Comprehensive market intelligence
Limitations and Future Work
Current Limitations
| Limitation | Impact | Mitigation |
|---|---|---|
| Model capability ceiling | Complex tasks require frontier models | Use GPT-5.5 or Claude Opus 4.5 |
| Sandbox overhead | Code execution adds ~200-500ms latency | Minimize turns, maximize work per turn |
| SDK learning curve | New models need Agent Skills | Continual autoresearch optimization |
| Cost at scale | Thousands of searches can be expensive | Aggressive deduplication and caching |
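
The caching mitigation in the last row can be sketched with the standard library. This is an illustration, not part of the SDK: `cached_search` is a stand-in for a real search call, and normalizing queries by lowercasing is an assumption that would be wrong for case-sensitive query syntax.

```python
from functools import lru_cache

calls = 0  # counts how many times the backend is actually hit


@lru_cache(maxsize=1024)
def cached_search(query: str) -> tuple[str, ...]:
    """Stand-in search call; lru_cache returns stored results for repeated queries."""
    global calls
    calls += 1
    return (f"result for {query}",)


def normalize(query: str) -> str:
    """Collapse case and whitespace so trivially different queries share a cache entry."""
    return " ".join(query.lower().split())


for q in ["CVE-2024 Jenkins", "cve-2024  jenkins", "CVE-2024 Chrome"]:
    cached_search(normalize(q))
print(calls)
```

Three requested queries result in two backend calls, because the first two normalize to the same string.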
Future Research Directions
1. Joint optimization of SDK + Skills: The two are currently optimized separately. A joint autoresearch loop could find better optima.
2. Model training on SDK usage: Train models specifically on SaC patterns to improve codegen quality and reduce reliance on Agent Skills.
3. SDK co-evolution during training: Design the SDK itself during model training to maximize synergy.
4. Multi-agent SaC: Enable multiple agents to share a SaC pipeline, coordinating searches and sharing state.
5. Physical world integration: Extend SaC beyond information retrieval to physical actions (e.g., IoT, robotics).
How to Access SaC
Perplexity Computer
SaC is rolling out in Perplexity Computer, the consumer-facing autonomous AI agent product.
Access: perplexity.ai/computer
Agent API
SaC is available in Perplexity's Agent API for developers building agentic applications.
Documentation: docs.perplexity.ai/api/agents
Key features:
- Full SaC capabilities
- Custom Agent Skills
- Sandbox management
- State persistence
- Usage metering
What This Means for the Industry
The End of Monolithic Search
Traditional search (one query, one response) is fundamentally mismatched to agentic workflows.
SaC proves that programmable, atomic search primitives are the future:
- Agents can orchestrate thousands of operations
- Custom logic in deterministic code
- No context pollution
- Cost-efficient at scale
Code as the Universal Interface
Function calling and MCPs were transitional technologies. Generated code is the endgame interface for:
- Complex control flow
- State management
- Custom logic
- Efficient execution
A New Competitive Dimension
Search companies now compete on:
- Primitive quality: How atomic and composable are the building blocks?
- SDK design: How easily can models learn to use it?
- Execution efficiency: How many operations per minute?
- Context efficiency: How much can be handled in code vs. tokens?
Comparison: SaC vs. Alternatives
| Dimension | Perplexity SaC | Traditional Search APIs | RAG Pipelines |
|---|---|---|---|
| Control | Full programmatic control | Query parameters only | Fixed pipeline |
| Parallelism | Thousands of concurrent ops | Serial function calls | Single vector search |
| Custom logic | Arbitrary code | None | Limited via configuration |
| Context efficiency | Work in code, not tokens | All results in context | All chunks in context |
| Adaptability | Task-specific pipelines | One-size-fits-all | One-size-fits-all |
| Performance | 2.5x advantage (WANDR) | Baseline | Not designed for search |
Bottom Line
Search as Code represents a fundamental paradigm shift in how AI systems interact with information retrieval:
Before SaC:
- Models consume search
- Fixed pipelines
- Serial operations
- Context pollution
With SaC:
- Models program search
- Custom pipelines per task
- Thousands of parallel operations
- Deterministic computation
The performance gains speak for themselves: 2.5x the next-best score on WANDR, plus an 85% token reduction and 100% accuracy on the vendor-advisory task, where the alternatives scored below 25%.
More importantly, SaC enables entirely new classes of agent capabilities that were impossible with traditional search architectures.
As AI agents become the primary consumers of search, every search provider will need to adopt programmable architectures like SaC or risk irrelevance.
Perplexity has shown the path forward. The agentic era demands agentic search.
Analysis based on Perplexity's research article "Rethinking Search as Code Generation" published June 2026.