Perplexity's Search as Code: Rethinking Search for the Agentic Era

Perplexity introduces Search as Code (SaC), an architecture that makes search natively programmable by AI agents through code generation. Explore how SaC scores 2.5x the next-best system on the WANDR benchmark.

14 min read · Yash Thakker
Tags: Perplexity, AI Agents, Search, Agentic AI, Code Generation, Information Retrieval

Traditional search was designed for humans: you type a query, get back 10 blue links, and click through. This worked for decades. But in 2026, the primary consumers of search are no longer humans; they're AI agents.

And AI agents don't just need answers. They need to orchestrate complex retrieval workflows with thousands of operations, custom logic, and dynamic strategies tailored to each task.

Perplexity has just introduced Search as Code (SaC), a fundamental rearchitecture of search for the agentic era. Instead of offering search as a monolithic service that returns fixed results, SaC exposes search primitives as an SDK that models program through generated code.

The results are dramatic: 2.5x the next-best score on the WANDR benchmark, an 85% reduction in token usage on a CVE research task, and the ability to execute search workflows that were previously impossible.

Let's dive into what makes SaC revolutionary and why it represents the future of agentic search.


The Problem with Traditional Search for Agents

How Search Worked Before SaC

For the past three years of AI development, search has followed a simple pattern:

  1. Model generates a query through function calling or MCP
  2. Search engine runs its predefined pipeline (retrieval, ranking, filtering)
  3. Model consumes the results as context for reasoning

This architecture has three fundamental limitations:

| Limitation | Impact on Agents |
| --- | --- |
| Coarse context | Pipeline optimized for recall, not precision, so it introduces irrelevant information |
| No domain knowledge leverage | Model can't apply its understanding to customize search strategy |
| Serial, inefficient control flow | Each search operation requires a model turn, adding latency and polluting context |
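
To make the context-pollution point concrete, here's a minimal sketch that compares how much text reaches the model when every raw result is appended to context versus when results are filtered in code first. Everything here is an illustrative assumption: `fake_search` is a canned stand-in for a search tool, and the whitespace word count is only a crude proxy for tokens.

```python
def fake_search(query: str) -> list[str]:
    """Hypothetical stand-in for a search tool: 5 snippets, 1 of them useful."""
    return [f"{query}: relevant finding"] + [
        f"{query}: unrelated snippet {i}" for i in range(4)
    ]


def rough_tokens(text: str) -> int:
    """Crude token estimate (whitespace word count); an assumption, not a tokenizer."""
    return len(text.split())


queries = [f"q{i}" for i in range(10)]

# Serial tool calling: every raw snippet lands in the model's context.
serial_context = [s for q in queries for s in fake_search(q)]

# Code-driven retrieval: filter deterministically, pass on only what survives.
filtered_context = [s for q in queries for s in fake_search(q) if "relevant" in s]

serial_tokens = sum(rough_tokens(s) for s in serial_context)
filtered_tokens = sum(rough_tokens(s) for s in filtered_context)
print(serial_tokens, filtered_tokens)
```

The absolute numbers mean nothing; the point is that the gap grows with every additional query when filtering happens in token space.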

Real-World Failure Modes

Example 1: Vendor Advisory Research

A task requires finding 230+ high-severity CVEs from official vendor advisories (not aggregators like NVD or MITRE).

Traditional approach:

  • Query "high severity CVE 2023-2025"
  • Get mixed results (vendors, aggregators, news)
  • Manually filter through noisy results
  • Make dozens of serial queries
  • Pollute context with 288K+ tokens

SaC approach:

  • Generate code that defines vendor-specific query templates
  • Fan out parallel queries across formats
  • Verify CVE-to-version binding programmatically
  • Filter in deterministic code, not token space
  • Result: 100% accuracy, 85% fewer tokens (42.9K)

Example 2: Multi-Domain Research

An agent needs to:

  1. Search academic papers for methodology
  2. Search GitHub for implementations
  3. Search company blogs for real-world usage
  4. Cross-reference and synthesize findings

Traditional approach:

  • Sequential searches through the same API
  • Each search introduces full context to model
  • No way to deduplicate or cross-reference efficiently
  • Costs explode with each additional search

SaC approach:

  • Parallel searches with domain-specific parameters
  • Deduplication in code before model sees results
  • Custom ranking and filtering logic
  • Aggregation and synthesis at atomic level
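
As a rough illustration of the SaC side of this example, the sketch below fans out three searches concurrently and deduplicates the combined hits before anything would reach a model. It is not Perplexity's SDK: `search_source` is a hypothetical stub that returns canned `example.org` URLs, and the URL normalization rule (lowercase host, strip trailing slash) is an assumption chosen for the demo.

```python
import asyncio
from urllib.parse import urlsplit


async def search_source(source: str, query: str) -> list[dict]:
    """Hypothetical per-source search; returns canned hits instead of calling an API."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    canned = {
        "papers": ["https://papers.example.org/method", "https://example.org/shared"],
        "code": ["https://code.example.org/impl", "https://example.org/shared/"],
        "blogs": ["https://blog.example.org/usage", "https://code.example.org/impl"],
    }
    return [{"source": source, "url": u, "query": query} for u in canned[source]]


def normalize(url: str) -> str:
    """Treat URLs as equal if host (case-insensitive) and path (sans trailing slash) match."""
    parts = urlsplit(url)
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}"


async def multi_domain_search(query: str) -> list[dict]:
    sources = ["papers", "code", "blogs"]
    # gather() runs the searches concurrently and preserves the order of `sources`.
    batches = await asyncio.gather(*(search_source(s, query) for s in sources))
    seen, unique = set(), []
    for hit in (h for batch in batches for h in batch):
        key = normalize(hit["url"])
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique


hits = asyncio.run(multi_domain_search("retrieval"))
print([h["url"] for h in hits])
```

Six raw hits collapse to four unique ones, and the model would only ever see the four.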

Search as Code: The Architecture

SaC fundamentally reimagines how models interact with search infrastructure.

The Three-Layer Stack

┌────────────────────────────────────┐
│         Models (GPT-5.5)           │  ← Control plane: reasoning, planning, code generation
├────────────────────────────────────┤
│      Compute Sandboxes             │  ← Execution: deterministic compute, state management
├────────────────────────────────────┤
│    Agentic Search SDK              │  ← I/O layer: atomized search primitives
└────────────────────────────────────┘
             ↓
      Perplexity Search Infrastructure

Layer 1: Models as Control Plane

Role: Decide what to search, how to search it, and generate code to execute the strategy

Key capabilities:

  • Decompose tasks into retrieval requirements
  • Design bespoke search pipelines
  • Generate Python code that orchestrates primitives
  • Optimize strategies based on intermediate results

Models used: GPT-5.5 (high reasoning), optimized with Agent Skills that teach SDK usage patterns

Layer 2: Compute Sandboxes

Role: Execute model-generated code in a secure, deterministic environment

Key features:

  • Secure Python runtime with access to Agentic Search SDK
  • Persistent filesystem for state management across turns
  • Explicit serialization/deserialization to maintain context clarity
  • Execution efficiency supporting thousands of operations per minute

Why not REPL-style state? Perplexity tested both approaches and found filesystem-based serde performs better on long trajectories. Requiring explicit state management helps models track what's preserved and why, which is crucial for complex workflows.
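
Here's a minimal sketch of what explicit filesystem state might look like across two turns. The file name, the JSON format, and the shape of the state dictionary are all assumptions made for illustration; the article doesn't describe the sandbox's actual serialization format.

```python
import json
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Persist only what the next turn needs, as explicit JSON."""
    path.write_text(json.dumps(state, indent=2, sort_keys=True))


def load_state(path: Path) -> dict:
    """Reload state at the start of a turn; nothing survives implicitly."""
    return json.loads(path.read_text())


workdir = Path(tempfile.mkdtemp())
state_file = workdir / "state.json"  # hypothetical file name

# Turn 1: record which queries ran and which URLs were kept.
save_state(
    state_file,
    {"done_queries": ["q1", "q2"], "kept_urls": ["https://example.org/a"]},
)

# Turn 2: a fresh run reads the file, extends it, and writes it back.
state = load_state(state_file)
state["done_queries"].append("q3")
save_state(state_file, state)

print(load_state(state_file)["done_queries"])
```

Because every carried-over value has to pass through `save_state`, the generated program itself documents what is being preserved between turns.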

Layer 3: Agentic Search SDK

Role: Expose Perplexity's search stack as composable primitives

This is the revolutionary piece. Instead of wrapping an existing search API, Perplexity rearchitected their entire search stack into modular, atomic primitives.

SDK components:

| Primitive Category | Examples | What It Enables |
| --- | --- | --- |
| Retrieval | search.web(), search.web_many() | Fetch candidates from index |
| Ranking | rank(), rerank_by() | Custom relevance scoring |
| Filtering | filter_by_domain(), filter_by_date() | Remove irrelevant results |
| Deduplication | dedupe_by_url(), dedupe_by_content() | Remove redundancy |
| Aggregation | group_by(), summarize() | Structured data extraction |
| Parsing | extract_fields(), parse_structured() | Schema-based extraction |

High-level shortcuts are also available (full end-to-end search), but models can bypass them when the task requires more control.
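
To show what "composable primitives" can mean in practice, here's a small sketch in which three functions borrow names from the table above and chain together because they all take and return a plain list of hits. The signatures and bodies are assumptions written for this post, not the real SDK, whose interfaces aren't shown here.

```python
from urllib.parse import urlsplit

Hit = dict  # assumed hit shape: {"url": str, "score": float}


def filter_by_domain(hits: list[Hit], allowed: set[str]) -> list[Hit]:
    """Keep hits whose host is in the allowlist."""
    return [h for h in hits if urlsplit(h["url"]).netloc in allowed]


def dedupe_by_url(hits: list[Hit]) -> list[Hit]:
    """Drop repeated URLs, keeping the first occurrence."""
    seen: set[str] = set()
    out = []
    for h in hits:
        if h["url"] not in seen:
            seen.add(h["url"])
            out.append(h)
    return out


def rerank_by(hits: list[Hit], key) -> list[Hit]:
    """Sort hits by a caller-supplied scoring function, best first."""
    return sorted(hits, key=key, reverse=True)


raw = [
    {"url": "https://docs.example.org/a", "score": 0.4},
    {"url": "https://spam.example.net/x", "score": 0.9},
    {"url": "https://docs.example.org/b", "score": 0.8},
    {"url": "https://docs.example.org/a", "score": 0.4},
]

# Same input and output type everywhere, so the steps compose in any order.
result = rerank_by(
    dedupe_by_url(filter_by_domain(raw, {"docs.example.org"})),
    key=lambda h: h["score"],
)
print([h["url"] for h in result])
```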

Continual improvement: The SDK itself is optimized through autoresearch loops that test changes against latency, codegen quality, and task performance over weeks.


How Search as Code Works: A Real Example

Let's walk through the CVE vendor advisory example mentioned in the original research article.

Task

Find 230+ high-severity CVEs from 2023-2025, citing only official vendor advisories. Each record must include:

  • CVE ID
  • Vendor and product name
  • Fix version
  • Evidence that the fix version is tied to the specific CVE

Traditional Approach Performance

  • OpenAI Responses API: < 25% accuracy
  • Anthropic Managed Agents: < 25% accuracy
  • Exa Agent: < 25% accuracy
  • Parallel Tasks: < 25% accuracy

SaC Approach: Generated Code

Part 1: Fan out over vendor advisory formats

# Define vendor-specific query templates
templates = [
    ("Mozilla",
     'site:mozilla.org/en-US/security/advisories/mfsa{year} '
     '"CVE-{year}-" "Fixed in" "Impact high"'),
    ("Jenkins",
     'site:jenkins.io/security/advisory/{year} '
     '"CVE-{year}" "Severity" "High" "Fix"'),
    ("Chrome",
     'site:chromereleases.googleblog.com/{year} '
     '"High CVE-{year}" "Stable channel has been updated"'),
    ("Android",
     'source.android.com/docs/security/bulletin/{year}-{month:02d}-01 '
     '"High" "CVE-{year}"'),
    # ... more vendors
]

# Generate queries for each vendor-year combination
queries = [
    {"vendor": vendor, "query": pattern.format(year=year, month=month)}
    for year in [2023, 2024, 2025]
    for vendor, pattern in templates
    for month in ([1] if "{month" not in pattern else range(1, 13))
]

# Execute in parallel with concurrency control
seed_hits = sdk.search.web_many(queries, limit_per_query=8, concurrency=12)

# Filter to official vendor advisories only
pages = [
    {"vendor": q["vendor"], "url": h.url, "text": join_result_fields(h)}
    for q, hits in zip(queries, seed_hits)
    for h in hits
    if official_vendor_advisory(h.url, q["vendor"])
]

What happened:

  • Model encoded domain knowledge (vendor advisory URL patterns) directly into the search strategy
  • Parallel execution of targeted queries
  • Filtering in code, not through noisy model context
  • Preserved only official vendor sources
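
The generated code calls `official_vendor_advisory` without showing its body. One plausible way to write such a check, purely as an assumption, is a per-vendor allowlist of host and path prefix taken from the query templates above. Matching the host on a dot boundary is a deliberate choice so that lookalike domains don't slip through.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; hosts and path prefixes mirror the query templates.
OFFICIAL_PREFIXES = {
    "Mozilla": ("mozilla.org", "/en-US/security/advisories/"),
    "Jenkins": ("jenkins.io", "/security/advisory/"),
    "Chrome": ("chromereleases.googleblog.com", "/"),
    "Android": ("source.android.com", "/docs/security/bulletin/"),
}


def official_vendor_advisory(url: str, vendor: str) -> bool:
    """Return True only for HTTPS URLs on the vendor's own advisory path."""
    if vendor not in OFFICIAL_PREFIXES:
        return False
    host, path_prefix = OFFICIAL_PREFIXES[vendor]
    parts = urlsplit(url)
    netloc = parts.netloc.lower()
    # Accept the bare host or any subdomain, but not "mozilla.org.evil.example".
    host_ok = netloc == host or netloc.endswith("." + host)
    return parts.scheme == "https" and host_ok and parts.path.startswith(path_prefix)


print(official_vendor_advisory("https://www.jenkins.io/security/advisory/", "Jenkins"))
print(official_vendor_advisory("https://nvd.nist.gov/vuln/detail/", "Jenkins"))
```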

Part 2: Adaptive refinement with LLM subroutine

# Summarize coverage by vendor-year
coverage = summarize(pages, by=["vendor", "year", "url_kind"])

# Use LLM to suggest refinements for sparse areas
prompt = """
Goal: 230+ high or critical CVEs from official vendor advisories.
Avoid aggregators, CERTs, news, NVD, MITRE.
Current coverage:
{coverage}
Suggest site-scoped exact-phrase queries for sparse vendor-years.
Return JSON lines with vendor and query.
""".format(coverage=coverage)

raw = query_llm(prompt)
expanded_queries = [
    row for row in parse_jsonl(raw)
    if official_scope(row["query"]) and mentions_cve_year(row["query"])
]

# Execute expansion queries
expanded_hits = sdk.search.web_many(
    unique(expanded_queries),
    limit_per_query=8,
    concurrency=12
)

What happened:

  • Model as intermediate planning subroutine
  • Adaptive backfilling based on coverage gaps
  • Validation of generated queries before execution
  • Maintains useful patterns without hardcoding
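
The validation step is what keeps an LLM subroutine from steering the pipeline off course, so it's worth sketching. The helpers below share names with the ones in the generated code (`parse_jsonl`, `official_scope`, `mentions_cve_year`), but their bodies are assumptions: the article doesn't show how they're implemented, and the blocked-host list covers only the aggregators named in the prompt.

```python
import json
import re


def parse_jsonl(raw: str) -> list[dict]:
    """Parse JSON Lines, skipping blank or malformed lines instead of failing."""
    rows = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(row, dict) and row.keys() >= {"vendor", "query"}:
            rows.append(row)
    return rows


BLOCKED = ("nvd.nist.gov", "cve.mitre.org")  # assumed aggregator hosts


def official_scope(query: str) -> bool:
    """Require a site: operator and reject known aggregator hosts."""
    sites = re.findall(r"site:(\S+)", query)
    return bool(sites) and not any(b in s for s in sites for b in BLOCKED)


def mentions_cve_year(query: str) -> bool:
    """Require a CVE prefix for one of the target years."""
    return re.search(r"CVE-(2023|2024|2025)", query) is not None


raw = "\n".join([
    '{"vendor": "Jenkins", "query": "site:jenkins.io/security/advisory/2024 CVE-2024 High"}',
    "not json at all",
    '{"vendor": "NVD", "query": "site:nvd.nist.gov CVE-2024"}',
    '{"vendor": "Mozilla", "query": "mozilla security advisories CVE-2024"}',
])

expanded_queries = [
    row for row in parse_jsonl(raw)
    if official_scope(row["query"]) and mentions_cve_year(row["query"])
]
print([row["vendor"] for row in expanded_queries])
```

Only the Jenkins suggestion survives: the malformed line is skipped, the NVD query is blocked, and the unscoped Mozilla query has no `site:` operator.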

Part 3: Verification with custom logic

# Deduplicate and filter
all_hits = dedupe_by_url(flatten(seed_hits) + flatten(expanded_hits))

items = [
    {"url": h.url, "vendor_hint": infer_vendor(h.url),
     "text": join_result_fields(h)}
    for h in all_hits
    if official_vendor_advisory(h.url, infer_vendor(h.url))
]

# Extract and verify CVE-version binding
verified = sdk.llm.extract_many(
    items,
    instruction=(
        "Keep only vendor advisories where the page ties a high or critical "
        "CVE to a specific fixed version, build, patch, or security level."
    ),
    schema={
        "matches": bool,
        "cve": str,
        "vendor": str,
        "product": str,
        "fix_version": str,
        "severity": str,
        "source_url": str,
        "evidence": str,
        "version_bound_to_cve": bool,
        "confidence": float,
    },
)

# Final filtering and deduplication
records = [
    to_cve_record(x) for x in verified
    if x.matches and x.version_bound_to_cve
    if high_or_critical(x.severity) and x.confidence > 0.75
]
records = dedupe_by(records, key="cve")

What happened:

  • Custom verification logic defined entirely in code
  • Structured extraction with validation
  • Confidence-based filtering
  • Final deduplication by CVE ID
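
For readers wondering what the final filter and dedupe might do under the hood, here's a hedged sketch. The `Record` type, the placeholder CVE labels, and the tie-break rule (keep the highest-confidence record per CVE) are assumptions; the generated code only shows the call `dedupe_by(records, key="cve")`.

```python
from dataclasses import dataclass


@dataclass
class Record:
    cve: str
    severity: str
    confidence: float


def high_or_critical(severity: str) -> bool:
    """Case-insensitive severity check."""
    return severity.strip().lower() in {"high", "critical"}


def dedupe_by(records: list[Record], key: str) -> list[Record]:
    """Keep the highest-confidence record per key value, in first-seen order."""
    best: dict[str, Record] = {}
    for r in records:
        k = getattr(r, key)
        if k not in best or r.confidence > best[k].confidence:
            best[k] = r
    return list(best.values())


# Placeholder identifiers, not real CVE IDs.
candidates = [
    Record("CVE-A", "High", 0.80),
    Record("CVE-A", "high", 0.95),
    Record("CVE-B", "Medium", 0.99),
    Record("CVE-C", "Critical", 0.70),
]

kept = [r for r in candidates if high_or_critical(r.severity) and r.confidence > 0.75]
records = dedupe_by(kept, key="cve")
print(records)
```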

Results

| Metric | Traditional Search | SaC |
| --- | --- | --- |
| Accuracy | < 25% | 100% |
| Token usage | 288.7K | 42.9K (85% reduction) |
| Time to complete | Multiple hours | Minutes |
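
As a quick sanity check, the 85% figure follows directly from the two token counts in the table:

```python
traditional_tokens = 288_700  # 288.7K, from the table above
sac_tokens = 42_900           # 42.9K, from the table above

reduction = 1 - sac_tokens / traditional_tokens
print(f"{reduction:.1%}")
```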

Why This Is a Paradigm Shift

From "Query → Response" to "Goal → Orchestration"

Old paradigm:

  • Model asks a question
  • Search returns an answer
  • Model reasons over answer

New paradigm:

  • Model defines a goal
  • Model generates a retrieval program
  • Program executes complex workflow
  • Model reasons over final synthesized results

Code as Orchestrator and Gap-Filler

SaC doesn't just orchestrate existing primitives. Code can fill capability gaps on the fly.

Example: You need results matching a complex regex not supported by the query syntax.

Traditional approach: Try to approximate with query operators, get noisy results, filter in token space (expensive and error-prone).

SaC approach:

import re

# Get superset with parallel queries
results = sdk.search.web_many(approximate_queries, concurrency=8)

# Deduplicate
unique_results = sdk.dedupe_by_url(flatten(results))

# Apply exact regex in code
pattern = re.compile(complex_regex)
filtered = [r for r in unique_results if pattern.search(r.text)]

Result: Exact match without bloating SDK with niche functions.


Benchmark Performance

Perplexity evaluated SaC against four other agent systems across five benchmarks.

Overall Performance

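| Benchmark | Perplexity SaC | OpenAI | Anthropic | Exa | Parallel |
| --- | --- | --- | --- | --- | --- |
| DSQA | 0.871 | 0.733 | 0.815 | 0.530 | 0.810 |
| BrowseComp | 0.805 | 0.720 | 0.598 | 0.380 | 0.560 |
| HLE | 0.612 | 0.614 | 0.566 | 0.387 | 0.515 |
| WideSearch | 0.651 | 0.522 | 0.590 | 0.471 | 0.584 |
| WANDR | 0.386 | 0.130 | 0.152 | 0.057 | 0.126 |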

SaC leads on 4 of 5 benchmarks and trails OpenAI by 0.002 on HLE (0.612 vs. 0.614).

WANDR: The Most Challenging Benchmark

WANDR tests complex "wide research" tasks requiring careful orchestration of search, compute, and reasoning, which is exactly what SaC was designed for.

Performance:

  • Perplexity SaC: 0.386
  • Next best (Anthropic): 0.152
  • Advantage: 2.5x
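
The 2.5x figure can be reproduced from the WANDR row of the benchmark table. A short sketch of the arithmetic:

```python
# WANDR scores as reported in the benchmark table.
wandr = {
    "Perplexity SaC": 0.386,
    "OpenAI": 0.130,
    "Anthropic": 0.152,
    "Exa": 0.057,
    "Parallel": 0.126,
}

sac = wandr["Perplexity SaC"]
others = {name: score for name, score in wandr.items() if name != "Perplexity SaC"}

runner_up = max(others, key=others.get)
ratio = sac / others[runner_up]
print(runner_up, round(ratio, 2))
```

Note that this is a ratio of scores on a benchmark where every system is below 0.4, so the absolute gap is about 23 percentage points.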

Cost-Performance Frontier

SaC doesn't just win on performance; it also dominates the cost-performance tradeoff across reasoning levels:

| Reasoning Level | DSQA Score | Cost per Task | Position |
| --- | --- | --- | --- |
| SaC Low | 0.82 | $0.50 | Frontier (cheaper than all non-SaC systems, better than 2 of them) |
| SaC Medium | 0.85 | $0.85 | Frontier (best score under $1) |
| SaC High | 0.871 | $1.20 | Frontier (best absolute) |

Improvement Over Traditional Perplexity Baseline

SaC vs. traditional search pipeline (same infrastructure):

| Benchmark | Absolute Gain | Relative Gain |
| --- | --- | --- |
| DSQA | +19.77 pp | +29% |
| BrowseComp | +15.30 pp | +23% |
| HLE | +8.50 pp | +16% |
| WideSearch | +9.20 pp | +17% |
| WANDR | +12.00 pp | +45% |

Technical Deep Dive: The Agentic Search SDK

Design Principles

1. Atomicity: Break search into the smallest useful primitives. Don't expose "smart" functions; expose building blocks that can be composed into smart behaviors.

2. Composability: Every primitive should work with every other primitive. No special cases or incompatible operations.

3. Efficiency: Operations must be fast enough to support thousands of calls per minute. Latency directly impacts agent capability.

4. Consumability for LLMs: API design is optimized through autoresearch for codegen quality. Function names, parameter ordering, and documentation are all tuned for model understanding.

Why Python?

Perplexity considered Python, Rust, TypeScript, and Bash.

Python won because:

  • Ubiquitous in AI/ML ecosystems
  • Natural fit for data processing (what search results become)
  • Strong ecosystem for manipulation and analysis
  • Frontier models excel at Python codegen

State Management: Filesystem vs. REPL

REPL approach:

  • Variables persist across turns in-memory
  • No serialization overhead
  • More token-efficient

Filesystem + serde approach:

  • Explicit persistence to disk
  • Clear traceability of what's preserved
  • Better performance on long trajectories

Perplexity's choice: Filesystem + serde

Testing showed that while both approaches perform similarly in normal use, filesystem-based serde provides better reliability on long trajectories. The requirement to explicitly serialize state helps models manage complexity better, analogous to the difference between a clean notebook and a 100-cell Jupyter notebook with a cluttered namespace.


How Models Learn to Use the SDK

Agent Skills

The SDK is custom-built and unlikely to appear in pretraining data. Even with excellent documentation, models need guidance to compose primitives effectively.

Solution: Highly-tuned Agent Skills

These Skills:

  • Teach effective SDK usage patterns
  • Provide generalizable few-shot examples
  • Show how to compose primitives into complex pipelines
  • Are optimized through autoresearch loops

Size constraint: < 2000 tokens in root SKILL.md

  • Prevents context bloat
  • Forces distillation of only essential patterns
  • Tokens spent on composition patterns, not API docs (available via reflection)
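
If you maintain your own skill files, a budget check like the one below can run in CI. This is a sketch under a loud assumption: it estimates tokens as characters divided by four, a common rule of thumb for English text, not the tokenizer any particular model uses. The sample skill text is invented.

```python
def rough_token_count(text: str) -> int:
    """Approximate tokens as ceil(chars / 4); a rule of thumb, not a real tokenizer."""
    return (len(text) + 3) // 4


def within_budget(skill_text: str, budget: int = 2000) -> bool:
    """True if the estimated token count is strictly under the budget."""
    return rough_token_count(skill_text) < budget


# Invented sample content standing in for a root SKILL.md.
skill_md = "# Search SDK skill\n" + "- Prefer web_many for fan-out.\n" * 50

print(rough_token_count(skill_md), within_budget(skill_md))
```

For a hard limit you'd swap `rough_token_count` for the target model's actual tokenizer.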

Autoresearch Optimization

Both the SDK and Agent Skills are optimized via continual autoresearch loops that:

  1. Propose SDK improvements (structure, naming, organization)
  2. Validate against metrics (latency, codegen quality, task performance)
  3. Deploy successful changes
  4. Iterate

This runs continuously over weeks, making hundreds of improvements.
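
The propose, validate, deploy, iterate cycle is essentially a guarded hill climb. The sketch below shows that shape on a toy problem. It is a stand-in only: the real loop changes SDK structure and naming and validates against real eval suites, whereas here `evaluate` is a made-up scoring function over two numeric parameters.

```python
import random


def evaluate(config: dict) -> float:
    """Hypothetical composite metric, higher is better. Stands in for latency,
    codegen quality, and task performance measured on a real eval suite."""
    return -abs(config["limit_per_query"] - 8) - 0.5 * abs(config["concurrency"] - 12)


def propose(config: dict, rng: random.Random) -> dict:
    """Propose a small random change to one parameter."""
    candidate = dict(config)
    key = rng.choice(sorted(candidate))
    candidate[key] = max(1, candidate[key] + rng.choice([-1, 1]))
    return candidate


def autoresearch(config: dict, steps: int, seed: int = 0) -> dict:
    """Accept a proposal only when it validates as an improvement."""
    rng = random.Random(seed)
    best, best_score = config, evaluate(config)
    for _ in range(steps):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:  # "deploy" only validated improvements
            best, best_score = candidate, score
    return best


start = {"limit_per_query": 3, "concurrency": 4}
tuned = autoresearch(start, steps=200)
print(tuned, evaluate(tuned))
```

Because changes are only kept when the metric improves, the loop can never end up worse than where it started, which is the property that makes running it unattended for weeks reasonable.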


Why Now? The Enabling Factors

1. Models Can Generate Production-Quality Code

GPT-5.5 and Claude Opus 4.5 reliably generate complex, multi-step programs with:

  • Proper error handling
  • Parallelism and asynchrony
  • Control flow and state management
  • Domain-specific logic

2. Code Execution is Fast and Secure

Modern sandboxes provide:

  • Sub-second execution for complex programs
  • Secure isolation
  • Persistent state across turns
  • Integration with external services (like search)

3. Search Infrastructure is Modular

Perplexity invested months rearchitecting their search stack into composable primitives. This wasn't just API design; it required rethinking every layer of the search pipeline.

4. Agent Harnesses Are Production-Ready

Tools like Perplexity Computer and Agent API provide:

  • Reliable code generation
  • Sandbox management
  • State persistence
  • Error recovery

Use Cases Enabled by SaC

1. Deep Research with Complex Requirements

Example: Competitive intelligence

  • Search company announcements, press releases, SEC filings
  • Search patent databases
  • Search academic papers
  • Search GitHub for open-source activity
  • Cross-reference and identify patterns
  • Result: Comprehensive competitive analysis with source verification

2. Compliance and Regulatory Research

Example: GDPR compliance audit

  • Search for all data processing activities
  • Identify legal bases for each
  • Find retention policies
  • Cross-reference with regulatory requirements
  • Generate compliance gaps report
  • Result: Structured compliance audit with evidence chain

3. Security Intelligence

Example: Vulnerability tracking

  • Search CVE databases, vendor advisories, security mailing lists
  • Filter by severity, affected products, fix availability
  • Deduplicate across sources
  • Track patch status
  • Generate remediation priorities
  • Result: Real-time security intelligence pipeline

4. Market Research at Scale

Example: Product-market fit analysis

  • Search customer reviews across platforms
  • Search social media mentions
  • Search competitor positioning
  • Extract sentiment, features, pain points
  • Aggregate and cluster
  • Result: Comprehensive market intelligence

Limitations and Future Work

Current Limitations

| Limitation | Impact | Mitigation |
| --- | --- | --- |
| Model capability ceiling | Complex tasks require frontier models | Use GPT-5.5 or Claude Opus 4.5 |
| Sandbox overhead | Code execution adds ~200-500ms latency | Minimize turns, maximize work per turn |
| SDK learning curve | New models need Agent Skills | Continual autoresearch optimization |
| Cost at scale | Thousands of searches can be expensive | Aggressive deduplication and caching |
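
The "deduplication and caching" mitigation is easy to apply in generated code. Here's a minimal sketch using the standard library's `functools.lru_cache`; `cached_search` is a hypothetical stand-in for a billable search call, and the query normalization rule is an assumption.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def cached_search(query: str) -> tuple[str, ...]:
    """Hypothetical search call; only cache misses would reach the backend."""
    return (f"result for {query}",)


def normalize(query: str) -> str:
    """Collapse case and whitespace so trivially different queries share a cache key."""
    return " ".join(query.lower().split())


queries = [
    "CVE 2024 Jenkins",
    "cve 2024  jenkins",
    "CVE 2024 Chrome",
    "CVE 2024 Jenkins",
]
results = [cached_search(normalize(q)) for q in queries]

info = cached_search.cache_info()
print(info.hits, info.misses)
```

Four requested searches turn into two backend calls. Returning a tuple rather than a list keeps the cached value immutable, so one caller can't corrupt another's results.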

Future Research Directions

1. Joint optimization of SDK + Skills: The two are currently optimized separately. A joint autoresearch loop could find better optima.

2. Model training on SDK usage: Train models specifically on SaC patterns to improve codegen quality and reduce reliance on Agent Skills.

3. SDK co-evolution during training: Design the SDK itself during model training to maximize synergy.

4. Multi-agent SaC: Enable multiple agents to share a SaC pipeline, coordinating searches and sharing state.

5. Physical world integration: Extend SaC beyond information retrieval to physical actions (e.g., IoT, robotics).


How to Access SaC

Perplexity Computer

SaC is rolling out in Perplexity Computer, the consumer-facing autonomous AI agent product.

Access: perplexity.ai/computer

Agent API

SaC is available in Perplexity's Agent API for developers building agentic applications.

Documentation: docs.perplexity.ai/api/agents

Key features:

  • Full SaC capabilities
  • Custom Agent Skills
  • Sandbox management
  • State persistence
  • Usage metering

What This Means for the Industry

The End of Monolithic Search

Traditional search (one query, one response) is fundamentally mismatched to agentic workflows.

SaC proves that programmable, atomic search primitives are the future:

  • Agents can orchestrate thousands of operations
  • Custom logic in deterministic code
  • No context pollution
  • Cost-efficient at scale

Code as the Universal Interface

Function calling and MCPs were transitional technologies. Generated code is the endgame interface for:

  • Complex control flow
  • State management
  • Custom logic
  • Efficient execution

A New Competitive Dimension

Search companies now compete on:

  • Primitive quality: How atomic and composable are the building blocks?
  • SDK design: How easily can models learn to use it?
  • Execution efficiency: How many operations per minute?
  • Context efficiency: How much can be handled in code vs. tokens?

Comparison: SaC vs. Alternatives

| Dimension | Perplexity SaC | Traditional Search APIs | RAG Pipelines |
| --- | --- | --- | --- |
| Control | Full programmatic control | Query parameters only | Fixed pipeline |
| Parallelism | Thousands of concurrent ops | Serial function calls | Single vector search |
| Custom logic | Arbitrary code | None | Limited via configuration |
| Context efficiency | Work in code, not tokens | All results in context | All chunks in context |
| Adaptability | Task-specific pipelines | One-size-fits-all | One-size-fits-all |
| Performance | 2.5x advantage (WANDR) | Baseline | Not designed for search |

Bottom Line

Search as Code represents a fundamental paradigm shift in how AI systems interact with information retrieval:

Before SaC:

  • Models consume search
  • Fixed pipelines
  • Serial operations
  • Context pollution

With SaC:

  • Models program search
  • Custom pipelines per task
  • Thousands of parallel operations
  • Deterministic computation

The performance gains speak for themselves: 2.5x the next-best score on WANDR, plus an 85% token reduction and 100% accuracy on the CVE advisory task, where alternatives scored below 25%.

More importantly, SaC enables entirely new classes of agent capabilities that were impossible with traditional search architectures.

As AI agents become the primary consumers of search, every search provider will need to adopt programmable architectures like SaC or risk irrelevance.

Perplexity has shown the path forward. The agentic era demands agentic search.


Analysis based on Perplexity's research article "Rethinking Search as Code Generation" published June 2026.