What is Search as Code (SaC)?

Search as Code is Perplexity's new search architecture that exposes search primitives through an SDK, allowing AI models to generate Python code that orchestrates custom retrieval pipelines for each task. Unlike traditional search that returns fixed results, SaC lets agents control retrieval, ranking, filtering, and aggregation.

How does SaC differ from traditional search APIs?

Traditional search forces models to call predefined pipelines serially through function calls or MCPs. SaC exposes atomic search primitives that models compose via generated code in sandboxes, enabling thousands of parallel operations, custom logic, and access to intermediate state without polluting model context.

What performance improvements does SaC deliver?

On the WANDR benchmark, SaC achieves 2.5x better performance than the next-best system. It also reduces token usage by up to 85% (from 288.7K to 42.9K tokens) while maintaining 100% accuracy on complex search tasks like CVE vendor advisory identification. WANDR was fully open-sourced July 14, 2026 with 500 tasks and 170,495 required records — see [WANDR release guide](/blog/perplexity-wandr-benchmark-open-source-research-agents-july-2026).

Is Search as Code available now?

Yes, SaC is rolling out in Perplexity Computer and Agent API as of June 2026. It pairs frontier models such as GPT-5.5 with optimized Agent Skills that teach them how to use the Agentic Search SDK effectively.

Why did Perplexity build Search as Code?

Traditional monolithic search pipelines can't handle the complexity of modern agentic workflows that require thousands of retrieval operations with task-specific strategies. SaC gives agents fine-grained control over search to match the sophistication of tasks they're completing.

Perplexity's Search as Code: Rethinking Search for the | explainx.ai Blog


Traditional search was designed for humans: you type a query, get back 10 blue links, and click through. This worked for decades. But in 2026, the primary consumers of search are no longer humans—they're AI agents.

And AI agents don't just need answers. They need to orchestrate complex retrieval workflows with thousands of operations, custom logic, and dynamic strategies tailored to each task.

Perplexity has just introduced Search as Code (SaC), a fundamental rearchitecture of search for the agentic era. Instead of offering search as a monolithic service that returns fixed results, SaC exposes search primitives as an SDK that models program through generated code.

The results are dramatic: 2.5x performance improvement on complex benchmarks, 85% reduction in token usage, and the ability to execute search workflows that were previously impossible.

Let's dive into what makes SaC revolutionary and why it represents the future of agentic search.

The Problem with Traditional Search for Agents

How Search Worked Before SaC

For the past three years of AI development, search has followed a simple pattern:

Model generates a query through function calling or MCP
Search engine runs its predefined pipeline (retrieval, ranking, filtering)
Model consumes the results as context for reasoning

This architecture has three fundamental limitations:

Limitation	Impact on Agents
Coarse context	Pipeline optimized for recall, not precision—introduces irrelevant information
No domain knowledge leverage	Model can't apply its understanding to customize search strategy
Serial, inefficient control flow	Each search operation requires a model turn, adding latency and polluting context

Real-World Failure Modes

Example 1: Vendor Advisory Research

A task requires finding 230+ high-severity CVEs from official vendor advisories (not aggregators like NVD or MITRE).

Traditional approach:

Query "high severity CVE 2023-2025"
Get mixed results (vendors, aggregators, news)
Manually filter through noisy results
Make dozens of serial queries
Pollute context with 288K+ tokens

SaC approach:

Generate code that defines vendor-specific query templates
Fan out parallel queries across formats
Verify CVE-to-version binding programmatically
Filter in deterministic code, not token space
Result: 100% accuracy, 85% fewer tokens (42.9K)

Example 2: Multi-Domain Research

An agent needs to:

Search academic papers for methodology
Search GitHub for implementations
Search company blogs for real-world usage
Cross-reference and synthesize findings

Traditional approach:

Sequential searches through the same API
Each search introduces full context to model
No way to deduplicate or cross-reference efficiently
Costs explode with each additional search

SaC approach:

Parallel searches with domain-specific parameters
Deduplication in code before model sees results
Custom ranking and filtering logic
Aggregation and synthesis at atomic level
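The multi-domain pattern above can be sketched in plain Python. This is a minimal illustration, not Perplexity's SDK: `fake_search`, the `CANNED` results, and the result fields are hypothetical stand-ins, and a thread pool stands in for whatever parallelism the real sandbox provides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned results; a real pipeline would call a search primitive.
CANNED = {
    "papers": [{"url": "https://example.org/paper-a", "title": "Method A"}],
    "code": [
        {"url": "https://example.org/repo-a", "title": "Method A implementation"},
        {"url": "https://example.org/paper-a", "title": "Method A"},  # duplicate
    ],
    "blogs": [{"url": "https://example.org/blog-a", "title": "Method A in production"}],
}

def fake_search(domain: str) -> list[dict]:
    """Stand-in for one domain-specific search call."""
    return [dict(hit, domain=domain) for hit in CANNED[domain]]

def search_all(domains: list[str]) -> list[dict]:
    """Run one search per domain concurrently and flatten the batches."""
    with ThreadPoolExecutor(max_workers=len(domains)) as pool:
        batches = list(pool.map(fake_search, domains))  # map keeps input order
    return [hit for batch in batches for hit in batch]

def dedupe_by_url(hits: list[dict]) -> list[dict]:
    """Keep the first hit seen for each URL."""
    seen, unique = set(), []
    for hit in hits:
        if hit["url"] not in seen:
            seen.add(hit["url"])
            unique.append(hit)
    return unique

# Only the deduplicated list would be handed back to the model.
hits = dedupe_by_url(search_all(["papers", "code", "blogs"]))
```

The point of the sketch is where the work happens: the duplicate is dropped in code, so it never costs the model any context.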

Search as Code: The Architecture

SaC fundamentally reimagines how models interact with search infrastructure.

The Three-Layer Stack


┌────────────────────────────────────┐
│         Models (GPT-5.5)           │  ← Control plane: reasoning, planning, code generation
├────────────────────────────────────┤
│      Compute Sandboxes             │  ← Execution: deterministic compute, state management
├────────────────────────────────────┤
│    Agentic Search SDK              │  ← I/O layer: atomized search primitives
└────────────────────────────────────┘
             ↓
      Perplexity Search Infrastructure

Layer 1: Models as Control Plane

Role: Decide what to search, how to search it, and generate code to execute the strategy

Key capabilities:

Decompose tasks into retrieval requirements
Design bespoke search pipelines
Generate Python code that orchestrates primitives
Optimize strategies based on intermediate results

Models used: GPT-5.5 (high reasoning), optimized with Agent Skills that teach SDK usage patterns

Layer 2: Compute Sandboxes

Role: Execute model-generated code in a secure, deterministic environment

Key features:

Secure Python runtime with access to Agentic Search SDK
Persistent filesystem for state management across turns
Explicit serialization/deserialization to maintain context clarity
Execution efficiency supporting thousands of operations per minute

Why not REPL-style state? Perplexity tested both approaches and found filesystem-based serde performs better on long trajectories. Requiring explicit state management helps models track what's preserved and why—crucial for complex workflows.

Layer 3: Agentic Search SDK

Role: Expose Perplexity's search stack as composable primitives

This is the revolutionary piece. Instead of wrapping an existing search API, Perplexity rearchitected their entire search stack into modular, atomic primitives.

SDK components:

Primitive Category	Examples	What It Enables
Retrieval	`search.web()`, `search.web_many()`	Fetch candidates from index
Ranking	`rank()`, `rerank_by()`	Custom relevance scoring
Filtering	`filter_by_domain()`, `filter_by_date()`	Remove irrelevant results
Deduplication	`dedupe_by_url()`, `dedupe_by_content()`	Remove redundancy
Aggregation	`group_by()`, `summarize()`	Structured data extraction
Parsing	`extract_fields()`, `parse_structured()`	Schema-based extraction

High-level shortcuts are also available (full end-to-end search), but models can bypass them when the task requires more control.
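To make "composable primitives" concrete, here is a rough sketch of three of the table's primitives written as plain functions over lists of dicts. The function names come from the table, but the signatures, the result fields (`url`, `published`, `score`), and the behavior are assumptions made for illustration; the real SDK may look quite different.

```python
from datetime import date
from urllib.parse import urlparse

def filter_by_domain(hits, allowed):
    """Keep hits whose host is an allowed domain or a subdomain of one."""
    def ok(url):
        host = urlparse(url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in allowed)
    return [h for h in hits if ok(h["url"])]

def filter_by_date(hits, since):
    """Keep hits published on or after `since`."""
    return [h for h in hits if h["published"] >= since]

def rerank_by(hits, score):
    """Sort hits by a caller-supplied scoring function, best first."""
    return sorted(hits, key=score, reverse=True)

# Hypothetical results, invented for this example.
hits = [
    {"url": "https://www.mozilla.org/a", "published": date(2024, 3, 1), "score": 0.4},
    {"url": "https://news.example.com/b", "published": date(2024, 5, 1), "score": 0.9},
    {"url": "https://www.jenkins.io/c", "published": date(2022, 1, 1), "score": 0.8},
    {"url": "https://www.jenkins.io/d", "published": date(2025, 1, 1), "score": 0.7},
]

# Each primitive takes a list and returns a list, so they nest freely.
result = rerank_by(
    filter_by_date(
        filter_by_domain(hits, {"mozilla.org", "jenkins.io"}),
        since=date(2023, 1, 1),
    ),
    score=lambda h: h["score"],
)
```

Because every step has the same list-in, list-out shape, the order of the steps can change per task without touching the primitives themselves.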

Continual improvement: The SDK itself is optimized through autoresearch loops that test changes against latency, codegen quality, and task performance over weeks.

How Search as Code Works: A Real Example

Let's walk through the CVE vendor advisory example mentioned in the original research article.

Task

Find 230+ high-severity CVEs from 2023-2025, citing only official vendor advisories. Each record must include:

CVE ID
Vendor and product name
Fix version
Evidence that the fix version is tied to the specific CVE

Traditional Approach Performance

OpenAI Responses API: < 25% accuracy
Anthropic Managed Agents: < 25% accuracy
Exa Agent: < 25% accuracy
Parallel Tasks: < 25% accuracy

SaC Approach: Generated Code

Part 1: Fan out over vendor advisory formats

python

# Define vendor-specific query templates
templates = [
    ("Mozilla",
     'site:mozilla.org/en-US/security/advisories/mfsa{year} '
     '"CVE-{year}-" "Fixed in" "Impact high"'),
    ("Jenkins",
     'site:jenkins.io/security/advisory/{year} '
     '"CVE-{year}" "Severity" "High" "Fix"'),
    ("Chrome",
     'site:chromereleases.googleblog.com/{year} '
     '"High CVE-{year}" "Stable channel has been updated"'),
    ("Android",
     'source.android.com/docs/security/bulletin/{year}-{month:02d}-01 '
     '"High" "CVE-{year}"'),
    # ... more vendors
]

# Generate queries for each vendor-year combination
queries = [
    {"vendor": vendor, "query": pattern.format(year=year, month=month)}
    for year in [2023, 2024, 2025]
    for vendor, pattern in templates
    for month in ([1] if "{month" not in pattern else range(1, 13))
]

# Execute in parallel with concurrency control
seed_hits = sdk.search.web_many(queries, limit_per_query=8, concurrency=12)

# Filter to official vendor advisories only
pages = [
    {"vendor": q["vendor"], "url": h.url, "text": join_result_fields(h)}
    for q, hits in zip(queries, seed_hits)
    for h in hits
    if official_vendor_advisory(h.url, q["vendor"])
]

What happened:

Model encoded domain knowledge (vendor advisory URL patterns) directly into the search strategy
Parallel execution of targeted queries
Filtering in code, not through noisy model context
Preserved only official vendor sources
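The generated code calls `official_vendor_advisory(url, vendor)` without showing it. One plausible shape for that helper is sketched below. This is an assumption, not the original implementation: the allowlist is derived from the hosts and paths in the query templates above, and stripping a leading `www.` is a choice made for this example (`str.removeprefix` needs Python 3.9+).

```python
from urllib.parse import urlparse

# Assumed allowlist: vendor -> (host, required path prefix),
# taken from the site: patterns in the query templates.
OFFICIAL_SOURCES = {
    "Mozilla": ("mozilla.org", "/en-US/security/advisories/"),
    "Jenkins": ("jenkins.io", "/security/advisory/"),
    "Chrome": ("chromereleases.googleblog.com", "/"),
    "Android": ("source.android.com", "/docs/security/bulletin/"),
}

def official_vendor_advisory(url: str, vendor: str) -> bool:
    """Return True if `url` sits under the vendor's own advisory path."""
    source = OFFICIAL_SOURCES.get(vendor)
    if source is None:
        return False  # unknown vendors are rejected rather than guessed at
    host, path_prefix = source
    parsed = urlparse(url)
    actual_host = (parsed.hostname or "").removeprefix("www.")
    return actual_host == host and parsed.path.startswith(path_prefix)

checks = [
    official_vendor_advisory(
        "https://www.mozilla.org/en-US/security/advisories/mfsa2024-01/", "Mozilla"),
    official_vendor_advisory(
        "https://nvd.nist.gov/vuln/detail/CVE-2024-0001", "Mozilla"),
]
```

A deterministic check like this is cheap to run over thousands of hits, which is the kind of filtering the article argues should stay out of token space.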

Part 2: Adaptive refinement with LLM subroutine

python

# Summarize coverage by vendor-year
coverage = summarize(pages, by=["vendor", "year", "url_kind"])

# Use LLM to suggest refinements for sparse areas
prompt = """
Goal: 230+ high or critical CVEs from official vendor advisories.
Avoid aggregators, CERTs, news, NVD, MITRE.
Current coverage:
{coverage}
Suggest site-scoped exact-phrase queries for sparse vendor-years.
Return JSON lines with vendor and query.
""".format(coverage=coverage)

raw = query_llm(prompt)
expanded_queries = [
    row for row in parse_jsonl(raw)
    if official_scope(row["query"]) and mentions_cve_year(row["query"])
]

# Execute expansion queries
expanded_hits = sdk.search.web_many(
    unique(expanded_queries),
    limit_per_query=8,
    concurrency=12
)

What happened:

Model as intermediate planning subroutine
Adaptive backfilling based on coverage gaps
Validation of generated queries before execution
Maintains useful patterns without hardcoding
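The validation step relies on three helpers that the snippet leaves undefined: `parse_jsonl`, `official_scope`, and `mentions_cve_year`. The sketch below shows one way they might work. All three bodies are assumptions for illustration, including the aggregator list and the sample LLM reply.

```python
import json
import re

def parse_jsonl(raw: str) -> list[dict]:
    """Parse JSON lines, skipping prose, blank lines, and malformed rows."""
    rows = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # LLM output often mixes prose with JSON
        if isinstance(row, dict) and all(k in row for k in ("vendor", "query")):
            rows.append(row)
    return rows

AGGREGATORS = ("nvd.nist.gov", "cve.mitre.org")  # assumed blocklist

def official_scope(query: str) -> bool:
    """Require a site: operator and reject known aggregator domains."""
    return "site:" in query and not any(a in query for a in AGGREGATORS)

def mentions_cve_year(query: str) -> bool:
    """Require a CVE prefix for one of the target years."""
    return re.search(r"CVE-20(23|24|25)", query) is not None

# Hypothetical LLM reply: prose, two valid rows, one unscoped row, one truncated row.
raw = "\n".join([
    "Here are some suggestions:",
    json.dumps({"vendor": "Mozilla",
                "query": 'site:mozilla.org/en-US/security/advisories "CVE-2024-" "Fixed in"'}),
    json.dumps({"vendor": "NVD", "query": 'site:nvd.nist.gov "CVE-2024-"'}),
    json.dumps({"vendor": "Jenkins", "query": "jenkins security advisory high"}),
    '{"vendor": "Chrome"',
])

expanded_queries = [
    row for row in parse_jsonl(raw)
    if official_scope(row["query"]) and mentions_cve_year(row["query"])
]
```

Only the Mozilla row survives here: the aggregator and unscoped queries are rejected before any search budget is spent on them.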

Part 3: Verification with custom logic

python

# Deduplicate and filter
all_hits = dedupe_by_url(flatten(seed_hits) + flatten(expanded_hits))

items = [
    {"url": h.url, "vendor_hint": infer_vendor(h.url),
     "text": join_result_fields(h)}
    for h in all_hits
    if official_vendor_advisory(h.url, infer_vendor(h.url))
]

# Extract and verify CVE-version binding
verified = sdk.llm.extract_many(
    items,
    instruction=(
        "Keep only vendor advisories where the page ties a high or critical "
        "CVE to a specific fixed version, build, patch, or security level."
    ),
    schema={
        "matches": bool,
        "cve": str,
        "vendor": str,
        "product": str,
        "fix_version": str,
        "severity": str,
        "source_url": str,
        "evidence": str,
        "version_bound_to_cve": bool,
        "confidence": float,
    },
)

# Final filtering and deduplication
records = [
    to_cve_record(x) for x in verified
    if x.matches and x.version_bound_to_cve
    if high_or_critical(x.severity) and x.confidence > 0.75
]
records = dedupe_by(records, key="cve")

What happened:

Custom verification logic defined entirely in code
Structured extraction with validation
Confidence-based filtering
Final deduplication by CVE ID
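The final filter and `dedupe_by` step can be tried without the SDK. In this sketch the `Extraction` dataclass stands in for the rows `sdk.llm.extract_many` would return, and `dedupe_by` keeps the highest-confidence record per key. Both are assumptions: the article does not say how the real `dedupe_by` breaks ties.

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    """Hypothetical subset of the extraction schema shown above."""
    cve: str
    severity: str
    matches: bool
    version_bound_to_cve: bool
    confidence: float

def high_or_critical(severity: str) -> bool:
    return severity.strip().lower() in {"high", "critical"}

def dedupe_by(records, key):
    """Keep the highest-confidence record for each value of `key`."""
    best = {}
    for record in records:
        k = getattr(record, key)
        if k not in best or record.confidence > best[k].confidence:
            best[k] = record
    return list(best.values())

# Invented sample rows covering each rejection path.
verified = [
    Extraction("CVE-2024-0001", "High", True, True, 0.80),
    Extraction("CVE-2024-0001", "High", True, True, 0.95),      # better duplicate
    Extraction("CVE-2024-0002", "Medium", True, True, 0.99),    # severity too low
    Extraction("CVE-2024-0003", "Critical", True, False, 0.90), # version not bound
    Extraction("CVE-2024-0004", "critical", True, True, 0.60),  # low confidence
    Extraction("CVE-2024-0005", "Critical", True, True, 0.76),
]

records = dedupe_by(
    [
        x for x in verified
        if x.matches and x.version_bound_to_cve
        and high_or_critical(x.severity) and x.confidence > 0.75
    ],
    key="cve",
)
```

Two records remain: CVE-2024-0001 (the 0.95 copy) and CVE-2024-0005.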

Results

Metric	Traditional Search	SaC
Accuracy	< 25%	100%
Token usage	288.7K	42.9K (85% reduction)
Time to complete	Multiple hours	Minutes
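The 85% figure follows directly from the two token counts in the table. A quick check, using only those two numbers:

```python
baseline_tokens_k = 288.7  # traditional search, thousands of tokens (from the table)
sac_tokens_k = 42.9        # SaC, thousands of tokens (from the table)

reduction_pct = round((baseline_tokens_k - sac_tokens_k) / baseline_tokens_k * 100, 1)
context_ratio = round(baseline_tokens_k / sac_tokens_k, 1)
```

Put another way, the traditional run consumed roughly 6.7 times as many tokens as the SaC run.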

Why This Is a Paradigm Shift

From "Query → Response" to "Goal → Orchestration"

Old paradigm:

Model asks a question
Search returns an answer
Model reasons over answer

New paradigm:

Model defines a goal
Model generates a retrieval program
Program executes complex workflow
Model reasons over final synthesized results

Code as Orchestrator and Gap-Filler

SaC doesn't just orchestrate existing primitives. Code can fill capability gaps on the fly.

Example: You need results matching a complex regex not supported by the query syntax.

Traditional approach: Try to approximate with query operators, get noisy results, filter in token space (expensive and error-prone).

SaC approach:

python

import re

# Get a superset of candidates with parallel queries
# (approximate_queries and complex_regex come from the task at hand)
results = sdk.search.web_many(approximate_queries, concurrency=8)

# Deduplicate
unique_results = sdk.dedupe_by_url(flatten(results))

# Apply the exact regex in code
pattern = re.compile(complex_regex)
filtered = [r for r in unique_results if pattern.search(r.text)]

Result: Exact match without bloating SDK with niche functions.

Benchmark Performance

Perplexity evaluated SaC against four other agent systems across five benchmarks.

Overall Performance

Benchmark	Perplexity SaC	OpenAI	Anthropic	Exa	Parallel
DSQA	0.871	0.733	0.815	0.530	0.810
BrowseComp	0.805	0.720	0.598	0.380	0.560
HLE	0.612	0.614	0.566	0.387	0.515
WideSearch	0.651	0.522	0.590	0.471	0.584
WANDR	0.386	0.130	0.152	0.057	0.126

SaC leads on 4 of 5 benchmarks and trails OpenAI by just 0.002 on HLE.

WANDR: The Most Challenging Benchmark

WANDR tests complex "wide research" tasks requiring careful orchestration of search, compute, and reasoning—exactly what SaC was designed for.

Performance:

Perplexity SaC: 0.386
Next best (Anthropic): 0.152
Advantage: 2.5x
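The 2.5x figure can be reproduced from the scores in the Overall Performance table. The sketch below computes, for each benchmark, the strongest competitor and the ratio of SaC's score to it. The scores are copied from the table; the helper name `lead_over_next_best` is made up for this example.

```python
# Scores copied from the Overall Performance table.
scores = {
    "DSQA":       {"SaC": 0.871, "OpenAI": 0.733, "Anthropic": 0.815, "Exa": 0.530, "Parallel": 0.810},
    "BrowseComp": {"SaC": 0.805, "OpenAI": 0.720, "Anthropic": 0.598, "Exa": 0.380, "Parallel": 0.560},
    "HLE":        {"SaC": 0.612, "OpenAI": 0.614, "Anthropic": 0.566, "Exa": 0.387, "Parallel": 0.515},
    "WideSearch": {"SaC": 0.651, "OpenAI": 0.522, "Anthropic": 0.590, "Exa": 0.471, "Parallel": 0.584},
    "WANDR":      {"SaC": 0.386, "OpenAI": 0.130, "Anthropic": 0.152, "Exa": 0.057, "Parallel": 0.126},
}

def lead_over_next_best(row: dict) -> tuple[str, float]:
    """Return (strongest competitor, SaC score divided by that competitor's score)."""
    rivals = {name: s for name, s in row.items() if name != "SaC"}
    best = max(rivals, key=rivals.get)
    return best, row["SaC"] / rivals[best]

summary = {bench: lead_over_next_best(row) for bench, row in scores.items()}
wins = [bench for bench, (_, ratio) in summary.items() if ratio > 1]
```

On these numbers the WANDR ratio is about 2.54, while the other three wins are leads of roughly 7% to 12%, which is why WANDR is the headline result.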

Cost-Performance Frontier

SaC doesn't just win on performance—it dominates the cost-performance tradeoff across reasoning levels:

Reasoning Level	DSQA Score	Cost per Task	Position
SaC Low	0.82	$0.50	Frontier (cheaper than all non-SaC, better than 2 of them)
SaC Medium	0.85	$0.85	Frontier (best score under $1)
SaC High	0.871	$1.20	Frontier (best absolute)
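"Frontier" here means Pareto frontier: a configuration is on it if no other configuration is both cheaper and better. A minimal sketch of that test follows, using the three SaC rows from the table. The article gives no per-task costs for competing systems, so the fourth point is explicitly invented to show what a dominated point looks like.

```python
def pareto_frontier(points):
    """points: list of (name, cost, score). Return names not dominated by any other point."""
    frontier = []
    for name, cost, score in points:
        dominated = any(
            c <= cost and s >= score and (c < cost or s > score)
            for _, c, s in points
        )
        if not dominated:
            frontier.append(name)
    return frontier

# (name, cost per task in USD, DSQA score), copied from the table above.
sac_points = [
    ("SaC Low", 0.50, 0.82),
    ("SaC Medium", 0.85, 0.85),
    ("SaC High", 1.20, 0.871),
]

# Made-up point, NOT a real system: costs more than SaC Low and scores lower.
hypothetical = ("hypothetical-system", 1.00, 0.80)

frontier = pareto_frontier(sac_points + [hypothetical])
```

The three SaC levels all stay on the frontier because each step up in cost buys a higher score; the invented point drops out because SaC Low beats it on both axes.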

Improvement Over Traditional Perplexity Baseline

SaC vs. traditional search pipeline (same infrastructure):

Benchmark	Absolute Gain	Relative Gain
DSQA	+19.77 pp	+29%
BrowseComp	+15.30 pp	+23%
HLE	+8.50 pp	+16%
WideSearch	+9.20 pp	+17%
WANDR	+12.00 pp	+45%

Technical Deep Dive: The Agentic Search SDK

Design Principles

1. Atomicity: Break search into the smallest useful primitives. Don't expose "smart" functions; expose building blocks that can be composed into smart behaviors.

2. Composability: Every primitive should work with every other primitive. No special cases or incompatible operations.

3. Efficiency: Operations must be fast enough to support thousands of calls per minute. Latency directly impacts agent capability.

4. Consumability for LLMs: API design is optimized through autoresearch for codegen quality. Function names, parameter ordering, and documentation are all tuned for model understanding.

Why Python?

Perplexity considered Python, Rust, TypeScript, and Bash.

Python won because:

Ubiquitous in AI/ML ecosystems
Natural fit for data processing (what search results become)
Strong ecosystem for manipulation and analysis
Frontier models excel at Python codegen

State Management: Filesystem vs. REPL

REPL approach:

Variables persist across turns in-memory
No serialization overhead
More token-efficient

Filesystem + serde approach:

Explicit persistence to disk
Clear traceability of what's preserved
Better performance on long trajectories

Perplexity's choice: Filesystem + serde

Testing showed that while both approaches perform similarly in normal use, filesystem-based serde is more reliable on long trajectories. Having to serialize state explicitly helps models manage complexity, much like the difference between a clean notebook and a 100-cell Jupyter notebook with a cluttered namespace.
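Here is a small sketch of what filesystem-based state might look like across two agent turns. It assumes JSON files in a working directory; the helper names `save_state` and `load_state`, the file layout, and the two "turn" functions are hypothetical and only illustrate the explicit save-then-load discipline described above.

```python
import json
import tempfile
from pathlib import Path

def save_state(workdir: Path, name: str, value) -> Path:
    """Persist one named piece of state as JSON."""
    path = workdir / f"{name}.json"
    path.write_text(json.dumps(value, indent=2))
    return path

def load_state(workdir: Path, name: str):
    """Load a named piece of state written by an earlier turn."""
    return json.loads((workdir / f"{name}.json").read_text())

def turn_one(workdir: Path) -> None:
    # First turn: record which URLs have already been processed.
    save_state(workdir, "seen_urls", ["https://example.org/a", "https://example.org/b"])

def turn_two(workdir: Path) -> list[str]:
    # Later turn: nothing is in memory, so state is reloaded explicitly.
    seen = set(load_state(workdir, "seen_urls"))
    new_hits = ["https://example.org/b", "https://example.org/c"]
    fresh = [u for u in new_hits if u not in seen]
    save_state(workdir, "seen_urls", sorted(seen | set(fresh)))
    return fresh

with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    turn_one(workdir)
    fresh = turn_two(workdir)
    final_state = load_state(workdir, "seen_urls")
```

Everything that survives between turns has a name and a file, so there is no hidden variable left over from fifty cells ago.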

How Models Learn to Use the SDK

Agent Skills

The SDK is custom-built and unlikely to appear in pretraining data. Even with excellent documentation, models need guidance to compose primitives effectively.

Solution: Highly-tuned Agent Skills

These Skills:

Teach effective SDK usage patterns
Provide generalizable few-shot examples
Show how to compose primitives into complex pipelines
Are optimized through autoresearch loops

Size constraint: < 2000 tokens in root SKILL.md

Prevents context bloat
Forces distillation of only essential patterns
Tokens spent on composition patterns, not API docs (available via reflection)
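A size limit like this is easy to enforce mechanically. The sketch below uses the common rule of thumb of roughly four characters per token for English text. That heuristic is an assumption and only an approximation; a real check would use the target model's tokenizer.

```python
def approx_token_count(text: str) -> int:
    """Rough estimate: about four characters per token, rounded up."""
    return (len(text) + 3) // 4

def within_budget(skill_text: str, budget: int = 2000) -> bool:
    """True if the estimated token count is strictly under the budget."""
    return approx_token_count(skill_text) < budget

# Invented SKILL.md content, just to exercise the check.
sample_skill = "Compose search.web_many with dedupe_by_url before ranking.\n" * 20
fits = within_budget(sample_skill)
```

A check like this could run on every proposed Skill change, so the file cannot quietly grow past the limit.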

Autoresearch Optimization

Both the SDK and Agent Skills are optimized via continual autoresearch loops that:

Propose SDK improvements (structure, naming, organization)
Validate against metrics (latency, codegen quality, task performance)
Deploy successful changes
Iterate

This runs continuously over weeks, making hundreds of improvements.
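The propose, validate, deploy, iterate cycle can be sketched as a simple acceptance loop. The article does not describe the actual acceptance rule, so the one here is an assumption: accept a change only if no metric regresses and at least one improves. The proposals and measurements are invented toy data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metrics:
    latency_ms: float        # lower is better
    codegen_quality: float   # higher is better
    task_performance: float  # higher is better

def improves(candidate: Metrics, current: Metrics) -> bool:
    """Assumed rule: no metric regresses and at least one changes for the better."""
    no_regression = (
        candidate.latency_ms <= current.latency_ms
        and candidate.codegen_quality >= current.codegen_quality
        and candidate.task_performance >= current.task_performance
    )
    return no_regression and candidate != current

def autoresearch(baseline, proposals, evaluate):
    """Evaluate each proposal in turn and deploy the ones that improve on the current state."""
    deployed, current = [], evaluate(baseline)
    for proposal in proposals:
        measured = evaluate(proposal)
        if improves(measured, current):
            deployed.append(proposal)
            current = measured
    return deployed, current

# Toy measurements standing in for real benchmark runs.
MEASURED = {
    "baseline": Metrics(120.0, 0.70, 0.60),
    "rename-params": Metrics(120.0, 0.75, 0.62),
    "merge-calls": Metrics(150.0, 0.80, 0.65),  # better quality but slower: rejected
    "batch-io": Metrics(90.0, 0.75, 0.62),
}

deployed, final = autoresearch(
    "baseline", ["rename-params", "merge-calls", "batch-io"], MEASURED.__getitem__
)
```

A stricter or looser rule (for example, a weighted score) would change which proposals land, which is exactly the kind of design choice such a loop has to make explicit.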

Why Now? The Enabling Factors

1. Models Can Generate Production-Quality Code

GPT-5.5 and Claude Opus 4.5 reliably generate complex, multi-step programs with:

Proper error handling
Parallelism and asynchrony
Control flow and state management
Domain-specific logic

2. Code Execution is Fast and Secure

Modern sandboxes provide:

Sub-second execution for complex programs
Secure isolation
Persistent state across turns
Integration with external services (like search)

3. Search Infrastructure is Modular

Perplexity invested months rearchitecting their search stack into composable primitives. This wasn't just API design—it required rethinking every layer of the search pipeline.

4. Agent Harnesses Are Production-Ready

Tools like Perplexity Computer and Agent API provide:

Reliable code generation
Sandbox management
State persistence
Error recovery

Use Cases Enabled by SaC

1. Deep Research with Complex Requirements

Example: Competitive intelligence

Search company announcements, press releases, SEC filings
Search patent databases
Search academic papers
Search GitHub for open-source activity
Cross-reference and identify patterns
Result: Comprehensive competitive analysis with source verification

2. Compliance and Regulatory Research

Example: GDPR compliance audit

Search for all data processing activities
Identify legal bases for each
Find retention policies
Cross-reference with regulatory requirements
Generate compliance gaps report
Result: Structured compliance audit with evidence chain

3. Security Intelligence

Example: Vulnerability tracking

Search CVE databases, vendor advisories, security mailing lists
Filter by severity, affected products, fix availability
Deduplicate across sources
Track patch status
Generate remediation priorities
Result: Real-time security intelligence pipeline

4. Market Research at Scale

Example: Product-market fit analysis

Search customer reviews across platforms
Search social media mentions
Search competitor positioning
Extract sentiment, features, pain points
Aggregate and cluster
Result: Comprehensive market intelligence

Limitations and Future Work

Current Limitations

Limitation	Impact	Mitigation
Model capability ceiling	Complex tasks require frontier models	Use GPT-5.5 or Claude Opus 4.5
Sandbox overhead	Code execution adds ~200-500ms latency	Minimize turns, maximize work per turn
SDK learning curve	New models need Agent Skills	Continual autoresearch optimization
Cost at scale	Thousands of searches can be expensive	Aggressive deduplication and caching
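The caching mitigation in the last row can be illustrated with the standard library. In this sketch `_cached_search` is a hypothetical stand-in for a billable search call, and the normalization (lowercasing and collapsing whitespace) assumes queries are case-insensitive, which may not hold for every search backend.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how many times the "backend" is actually hit

def normalize(query: str) -> str:
    """Collapse whitespace and case so trivially different queries share a cache entry."""
    return " ".join(query.lower().split())

@lru_cache(maxsize=4096)
def _cached_search(normalized_query: str) -> tuple:
    CALLS["count"] += 1  # stands in for one billable search call
    return (f"result for {normalized_query}",)  # tuple: cached values stay immutable

def search(query: str) -> tuple:
    return _cached_search(normalize(query))

queries = [
    "CVE-2024 Jenkins",
    "cve-2024   jenkins",  # same query after normalization
    "CVE-2024 Mozilla",
    "CVE-2024 Jenkins",    # exact repeat
]
results = [search(q) for q in queries]
```

Four requests cost two backend calls here. With generated code that fans out thousands of templated queries, overlaps like these are common, so even a simple cache can cut the bill noticeably.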

Future Research Directions

1. Joint optimization of SDK + Skills: The two are currently optimized separately. A joint autoresearch loop could find better optima.

2. Model training on SDK usage: Train models specifically on SaC patterns to improve codegen quality and reduce reliance on Agent Skills.

3. SDK co-evolution during training: Design the SDK itself during model training to maximize synergy.

4. Multi-agent SaC: Enable multiple agents to share a SaC pipeline, coordinating searches and sharing state.

5. Physical world integration: Extend SaC beyond information retrieval to physical actions (e.g., IoT, robotics).

How to Access SaC

Perplexity Computer

SaC is rolling out in Perplexity Computer, the consumer-facing autonomous AI agent product.

Access: perplexity.ai/computer

Agent API

SaC is available in Perplexity's Agent API for developers building agentic applications.

Documentation: docs.perplexity.ai/api/agents

Key features:

Full SaC capabilities
Custom Agent Skills
Sandbox management
State persistence
Usage metering

What This Means for the Industry

The End of Monolithic Search

Traditional search—one query, one response—is fundamentally mismatched to agentic workflows.

SaC proves that programmable, atomic search primitives are the future:

Agents can orchestrate thousands of operations
Custom logic in deterministic code
No context pollution
Cost-efficient at scale

Code as the Universal Interface

Function calling and MCPs were transitional technologies. Generated code is the endgame interface for:

Complex control flow
State management
Custom logic
Efficient execution

A New Competitive Dimension

Search companies now compete on:

Primitive quality: How atomic and composable are the building blocks?
SDK design: How easily can models learn to use it?
Execution efficiency: How many operations per minute?
Context efficiency: How much can be handled in code vs. tokens?

Comparison: SaC vs. Alternatives

Dimension	Perplexity SaC	Traditional Search APIs	RAG Pipelines
Control	Full programmatic control	Query parameters only	Fixed pipeline
Parallelism	Thousands of concurrent ops	Serial function calls	Single vector search
Custom logic	Arbitrary code	None	Limited via configuration
Context efficiency	Work in code, not tokens	All results in context	All chunks in context
Adaptability	Task-specific pipelines	One-size-fits-all	One-size-fits-all
Performance	2.5x advantage (WANDR)	Baseline	Not designed for search

Bottom Line

Search as Code represents a fundamental paradigm shift in how AI systems interact with information retrieval:

Before SaC:

Models consume search
Fixed pipelines
Serial operations
Context pollution

With SaC:

Models program search
Custom pipelines per task
Thousands of parallel operations
Deterministic computation

The performance gains speak for themselves: 2.5x better on complex benchmarks, 85% token reduction, 100% accuracy on tasks where alternatives fail.

More importantly, SaC enables entirely new classes of agent capabilities that were impossible with traditional search architectures.

As AI agents become the primary consumers of search, every search provider will need to adopt programmable architectures like SaC or risk irrelevance.

Perplexity has shown the path forward. The agentic era demands agentic search.

Analysis based on Perplexity's research article "Rethinking Search as Code Generation" published June 2026.

