Vercel spent years building agents internally — v0, an autonomous SDR, a data analyst that handles 30,000 Slack questions a month, a support agent that resolves 92% of tickets without human help. Every one of them had to hand-roll the same plumbing before it could do anything useful: durability, credential brokering, approval gates, observability. None of it carried over from one agent to the next.
On June 17, 2026, Vercel shipped the answer to that problem. eve is an open-source, filesystem-first TypeScript framework for building, running, and scaling agents. The positioning is explicit: what Next.js did for the web, eve is doing for agents.
Guillermo Rauch, who built Next.js on the premise that pages/index.js is all you need, put it plainly: "Eve asks for even less. agent/instructions.md. Put some English in there and you're good to go."
An Agent Is a Directory
The core design decision in eve is that an agent is a directory of files, and each file's name and location defines what it is. No framework registration, no configuration maps, no boilerplate to wire things together.
agent/
  agent.ts                     # the model it runs on
  instructions.md              # who it is
  tools/
    run_sql.ts                 # what it can do
    post_chart.ts
  skills/
    revenue-definitions.md     # what it knows
  subagents/
    investigator/              # who it delegates to
  channels/
    slack.ts                   # where it lives
  schedules/
    monday-summary.ts          # when it acts on its own
At a glance, the tree tells you what an agent is, what it does, where it lives, and when it acts on its own. eve picks up each file at build time and wires it into the agent loop — tools become callable functions, skills become context loaded on demand, channels become live surfaces.
The analogy to Next.js is structural: just as Next.js turns a file at pages/about.js into a route by owning the routing layer, eve turns a file at agent/tools/run_sql.ts into an agent capability by owning the agent loop.
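To make the "location defines role" idea concrete, here is a minimal sketch of how a build step might classify files under agent/. It is an illustration only, not eve's implementation: the function classifyAgentFile, the AgentFileKind type, and the FOLDER_KINDS table are hypothetical names, and only the folder names are taken from the tree above.

```typescript
// Hypothetical sketch: map a path inside agent/ to the role it plays.
// Folder names come from the tree above; everything else is illustrative
// and is not part of eve's API.
type AgentFileKind =
  | "model"
  | "instructions"
  | "tool"
  | "skill"
  | "subagent"
  | "channel"
  | "schedule"
  | "unknown";

interface ClassifiedFile {
  kind: AgentFileKind;
  name: string; // e.g. the tool name derived from the filename
}

const FOLDER_KINDS: Record<string, AgentFileKind> = {
  tools: "tool",
  skills: "skill",
  subagents: "subagent",
  channels: "channel",
  schedules: "schedule",
};

function classifyAgentFile(path: string): ClassifiedFile {
  const parts = path.split("/").filter((p) => p.length > 0);
  const [root = "", second = "", third = ""] = parts;
  // Expect paths such as "agent/tools/run_sql.ts".
  if (root !== "agent" || second === "") return { kind: "unknown", name: "" };
  if (third === "") {
    if (second === "agent.ts") return { kind: "model", name: "agent" };
    if (second === "instructions.md") return { kind: "instructions", name: "instructions" };
    return { kind: "unknown", name: second };
  }
  const kind = FOLDER_KINDS[second] ?? "unknown";
  // Strip the extension so "run_sql.ts" becomes the name "run_sql".
  const name = third.replace(/\.[^.]+$/, "");
  return { kind, name };
}
```

Under these assumptions, "agent/tools/run_sql.ts" classifies as a tool named run_sql, and anything under agent/subagents/investigator/ classifies as the subagent investigator.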
What Ships With the Framework
Most agent frameworks give you an execution primitive and leave production concerns to you. eve ships with those production concerns already handled.
Durable Execution
Agents wait on people, call slow systems, and run for hours, days, or weeks. Every conversation in eve is a durable workflow built on the open-source Workflow SDK, with each step checkpointed. A session can pause, survive a crash or a deploy, and resume exactly where it stopped.
The practical implication: when you push a new deploy, sessions that are mid-task finish on the version they started on. There is no interruption, no lost state, no need to re-drive the conversation from the top.
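The checkpoint-and-resume behavior can be pictured with a small sketch. This is not the Workflow SDK's API: makeStepRunner, runSession, and the in-memory Map are hypothetical stand-ins, and the sketch is synchronous for brevity where a real durable workflow would be asynchronous and would persist checkpoints outside the process.

```typescript
// Hypothetical sketch of step checkpointing; not the Workflow SDK's API.
type Checkpoints = Map<string, unknown>;

function makeStepRunner(checkpoints: Checkpoints) {
  return function step<T>(id: string, work: () => T): T {
    if (checkpoints.has(id)) {
      // Replaying after a crash or deploy: reuse the recorded result.
      return checkpoints.get(id) as T;
    }
    const result = work();
    // A real system would write this to durable storage, not memory.
    checkpoints.set(id, result);
    return result;
  };
}

// A two-step session. `crashBeforeChart` simulates the process dying
// after the first step has been checkpointed.
function runSession(
  checkpoints: Checkpoints,
  log: string[],
  crashBeforeChart = false,
): string {
  const step = makeStepRunner(checkpoints);
  const rowCount = step("query", () => {
    log.push("query ran");
    return 42;
  });
  if (crashBeforeChart) throw new Error("simulated crash");
  return step("chart", () => {
    log.push("chart ran");
    return `chart of ${rowCount} rows`;
  });
}
```

Running runSession once with crashBeforeChart set, then again against the same checkpoint store, executes the query step only once: the second run reads the recorded result and continues from the chart step.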
Sandboxed Compute
Code your agents write should be treated as untrusted. eve keeps agent-generated code out of your application runtime entirely. Every agent gets its own sandbox — an isolated environment for shell commands, scripts, and file reads and writes — running in a separate security context from the harness that controls the agent.
Locally the sandbox runs on Docker, microsandbox, or just-bash. In production it runs on Vercel Sandbox. The backend is an adapter interface, so you can write an adapter for any other provider.
Human-in-the-Loop Approvals
Any tool in eve can be configured to require human approval before it executes. When the agent reaches that tool, it pauses and waits — indefinitely, without consuming compute — until someone approves or rejects. The approval resumes the session exactly where it left off.
export default defineTool({
  description: "Run a read-only SQL query against the warehouse.",
  inputSchema: z.object({ sql: z.string() }),
  // Pause for approval only when the estimated scan exceeds 50 GB.
  // estimateScanGb is an application helper that is not shown here.
  needsApproval: ({ toolInput }) => estimateScanGb(toolInput.sql) > 50,
  async execute({ sql }) {
    // unchanged from the non-approval version
  },
});
One field on the tool. The framework handles the pause, the approval interface (rendered as Slack buttons in the Slack channel, for example), and the resume.
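As a rough mental model of what the framework does with that field, the sketch below evaluates an approval rule before running a tool and reports a paused state when a decision is still missing. All names here (GatedTool, callTool, CallOutcome, the estimatedGb input field) are hypothetical; eve's real pause is durable and asynchronous, which this synchronous sketch does not attempt to reproduce.

```typescript
// Hypothetical sketch of an approval gate; names are illustrative only.
type ApprovalRule<I> = boolean | ((ctx: { toolInput: I }) => boolean);
type Decision = "approve" | "reject";

interface GatedTool<I, O> {
  needsApproval?: ApprovalRule<I>;
  execute(input: I): O;
}

type CallOutcome<O> =
  | { status: "completed"; output: O }
  | { status: "awaiting_approval" }
  | { status: "rejected" };

function callTool<I, O>(
  tool: GatedTool<I, O>,
  input: I,
  decision?: Decision,
): CallOutcome<O> {
  const rule: ApprovalRule<I> = tool.needsApproval ?? false;
  // The rule is either a constant or a predicate over the tool input.
  const gated = typeof rule === "function" ? rule({ toolInput: input }) : rule;
  if (gated && decision === undefined) return { status: "awaiting_approval" };
  if (gated && decision === "reject") return { status: "rejected" };
  return { status: "completed", output: tool.execute(input) };
}

// Example tool: the scan estimate is passed in directly so the sketch
// stays self-contained (the article's estimateScanGb is not reproduced).
const scanTool: GatedTool<{ sql: string; estimatedGb: number }, string> = {
  needsApproval: ({ toolInput }) => toolInput.estimatedGb > 50,
  execute: ({ sql }) => `ran: ${sql}`,
};
```

A small query completes immediately, a large one returns awaiting_approval until a decision is supplied, and the same call with "approve" then runs execute.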
Subagents
An eve agent can delegate work to other agents. A subagent is the same directory structure one level down, inside subagents/, with its own agent.ts, instructions.md, and tools. The parent calls it like any other tool. The child starts with a clean context window, does the work, and returns the result.
// agent/subagents/investigator/agent.ts
export default defineAgent({
  description: "Investigates anomalies before the analyst reports them.",
  model: "anthropic/claude-opus-4.8",
});
The parent agent does not need to know how the investigator works — it just calls it and gets the result. This composability is what makes eve's multi-agent patterns clean: each agent has bounded context and bounded responsibility.
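The "clean context window" point is easy to show in isolation. In this sketch, which assumes a model is just a function from messages to a reply, the child receives only its own instructions and the delegated task. The Message, ModelFn, and delegate names are hypothetical and do not come from eve.

```typescript
// Hypothetical sketch of delegation with a clean context window.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Stand-in for a model call: takes the context, returns a reply.
type ModelFn = (messages: Message[]) => string;

function delegate(childInstructions: string, task: string, model: ModelFn): string {
  // The child sees its own instructions and the task, and nothing from
  // the parent's conversation history.
  const childContext: Message[] = [
    { role: "system", content: childInstructions },
    { role: "user", content: task },
  ];
  // Only the final result flows back to the parent.
  return model(childContext);
}
```

Because delegate never accepts the parent's history as an argument, there is no path by which that history can leak into the child's context.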
Tracing and Evals
Every run produces a standard OpenTelemetry trace. Each model call and tool call appears in order with inputs and outputs, down to the commands run in the sandbox.
ai.eve.turn # one span per turn
├── ai.streamText # the model call
│ └── ai.streamText.doStream
└── ai.toolCall # run_sql, with inputs and outputs
Spans export to Braintrust, Honeycomb, Datadog, Jaeger, or any other OTel-compatible tracing service. On Vercel, they surface in an Agent Runs tab under Observability.
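Once traces have that shape, simple questions such as "which tools did this turn call, and in what order?" become a tree walk. The sketch below uses a simplified span type rather than the OpenTelemetry SDK; the TraceSpan interface and the "tool.name" attribute key are assumptions made for illustration, while the span names mirror the tree above.

```typescript
// Hypothetical, simplified span shape; not the OpenTelemetry SDK types.
interface TraceSpan {
  name: string;
  attributes: Record<string, string>;
  children: TraceSpan[];
}

// Depth-first walk that collects tool names in the order they appear.
// "tool.name" is an assumed attribute key.
function toolCallsInOrder(span: TraceSpan): string[] {
  const found: string[] = [];
  if (span.name === "ai.toolCall") {
    found.push(span.attributes["tool.name"] ?? "unknown");
  }
  for (const child of span.children) {
    found.push(...toolCallsInOrder(child));
  }
  return found;
}

const exampleTurn: TraceSpan = {
  name: "ai.eve.turn",
  attributes: {},
  children: [
    {
      name: "ai.streamText",
      attributes: {},
      children: [{ name: "ai.streamText.doStream", attributes: {}, children: [] }],
    },
    { name: "ai.toolCall", attributes: { "tool.name": "run_sql" }, children: [] },
  ],
};
```

Applied to exampleTurn, the walk yields a single entry, run_sql.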
Evals go further: scored test suites written as files, runnable locally or in CI.
export default defineEval({
  description: "The analyst answers revenue questions by the team's rules.",
  async test(t) {
    await t.send("What was revenue last week?");
    t.completed();
    t.calledTool("run_sql");
    t.check(t.reply, includes("net of refunds"));
  },
});
Wire eve eval into CI, and a prompt change or model swap shows you what it broke before your users find out. Every commit gets a preview deployment that carries the agent's channels with it, so the team can talk to the next version of the Slack bot before it replaces the one they use every day.
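The three assertions in the eval above (the run completed, a given tool was called, the reply contains a phrase) can be approximated against a recorded transcript. This is a sketch of the idea, not eve's eval runner: Transcript, scoreTranscript, and includesText are hypothetical names.

```typescript
// Hypothetical sketch of scoring a recorded run; not eve's eval API.
interface Transcript {
  completed: boolean;
  toolCalls: string[];
  reply: string;
}

type Matcher = (text: string) => boolean;

// Builds a matcher that passes when the reply contains the given phrase.
const includesText = (needle: string): Matcher => (text) => text.includes(needle);

interface EvalResult {
  passed: boolean;
  failures: string[];
}

function scoreTranscript(
  transcript: Transcript,
  requiredTool: string,
  replyMatcher: Matcher,
): EvalResult {
  const failures: string[] = [];
  if (!transcript.completed) failures.push("run did not complete");
  if (!transcript.toolCalls.includes(requiredTool)) {
    failures.push(`tool ${requiredTool} was not called`);
  }
  if (!replyMatcher(transcript.reply)) failures.push("reply did not match");
  return { passed: failures.length === 0, failures };
}
```

Collecting failures instead of throwing on the first one means a CI report can show every broken expectation for a prompt change in one pass.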
The Core Building Blocks
Tools
A tool is a single TypeScript file. The filename becomes the tool name. There is no registration step.
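// agent/tools/run_sql.ts
import { defineTool } from "eve/tools";
import { z } from "zod";
// runReadOnlySql is an application helper that is not shown here.

export default defineTool({
  description: "Run a read-only SQL query against the orders and customers tables.",
  inputSchema: z.object({
    sql: z.string().describe("A single read-only SELECT statement."),
  }),
  async execute({ sql }) {
    const { columns, rows } = await runReadOnlySql(sql);
    // Cap the result at 500 rows and tell the model when rows were dropped.
    return { columns, rows: rows.slice(0, 500), truncated: rows.length > 500 };
  },
});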
eve picks it up at build time, hands the model its description and schema, and brokers execution. You write what the tool does — the framework handles everything else.
Skills
A skill is a markdown file with a frontmatter description. It is loaded into the model's context only when the topic comes up, not on every call.
---
description: How this team defines revenue. Load before answering any revenue question.
---
Revenue is recognized net of refunds, over the subscription term.
Weeks are Monday-anchored, in UTC.
Exclude trial and internal accounts from every number.
Skills are the eve equivalent of retrieval-augmented context, without the retrieval pipeline. The model decides when to load them based on the description.
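One way to picture load-on-demand: the model always sees a short index of skill names and descriptions, and a skill's body enters the context only when it is requested. The sketch below parses the frontmatter format shown above with a deliberately simple regular expression. parseSkill, skillIndex, and loadSkill are hypothetical names, and a real implementation would more likely use a proper YAML parser.

```typescript
// Hypothetical sketch of skill indexing and on-demand loading.
interface Skill {
  name: string;
  description: string;
  body: string;
}

function parseSkill(name: string, source: string): Skill {
  // Matches a leading "---" block, then everything after the closing "---".
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(source);
  if (match === null) return { name, description: "", body: source.trim() };
  const [, header = "", rest = ""] = match;
  const prefix = "description:";
  const line = header.split("\n").find((l) => l.startsWith(prefix));
  const description = line === undefined ? "" : line.slice(prefix.length).trim();
  return { name, description, body: rest.trim() };
}

// What the model sees on every call: names and descriptions only.
function skillIndex(skills: Skill[]): string {
  return skills.map((s) => `${s.name}: ${s.description}`).join("\n");
}

// What is added to the context once a skill is requested by name.
function loadSkill(skills: Skill[], name: string): string | undefined {
  return skills.find((s) => s.name === name)?.body;
}
```

The cost of an unused skill in this model is one index line per call, while the full text is paid for only in conversations that need it.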
Connections
A connection points at an MCP server or any API with a compatible OpenAPI document. eve discovers the remote tools, hands them to the model, and brokers auth. The model never sees the connection's URL or credentials.
// agent/connections/linear.ts
export default defineMcpClientConnection({
  url: "https://mcp.linear.app/sse",
  description: "Linear workspace: issues, projects, cycles, and comments.",
  auth: {
    getToken: async () => ({ token: process.env.LINEAR_API_TOKEN! }),
  },
});
At launch, eve connects to Slack, GitHub, Snowflake, Salesforce, Notion, and Linear, plus anything reachable over OAuth, an API key, or an MCP server. Vercel Connect handles interactive OAuth with consent and token refresh built in.
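The claim that the model never sees the URL or credentials amounts to keeping two views of a connection: one for the model and one for the broker that makes the outbound call. The sketch below illustrates that split under stated assumptions. BrokeredConnection, toolsForModel, and buildRequest are hypothetical names, and the request body is a placeholder shape rather than the MCP wire format.

```typescript
// Hypothetical sketch of credential brokering; not eve's implementation.
interface RemoteTool {
  name: string;
  description: string;
}

interface BrokeredConnection {
  url: string;
  getToken: () => string;
  tools: RemoteTool[];
}

// The model-facing view: tool names and descriptions, nothing else.
function toolsForModel(connectionName: string, conn: BrokeredConnection): RemoteTool[] {
  return conn.tools.map((t) => ({
    name: `${connectionName}.${t.name}`,
    description: t.description,
  }));
}

interface OutboundRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// The broker-facing view: URL and token are attached here, outside the
// model's context. The body shape is a placeholder, not the MCP protocol.
function buildRequest(conn: BrokeredConnection, toolName: string, input: unknown): OutboundRequest {
  return {
    url: conn.url,
    headers: { Authorization: `Bearer ${conn.getToken()}` },
    body: JSON.stringify({ tool: toolName, input }),
  };
}
```

Since toolsForModel copies only name and description, a secret can reach the model only if someone puts it in a tool description, which is a much smaller surface to review.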
Channels
A channel is a small adapter file. The same agent serves every surface.
eve channels add slack
# writes channels/slack.ts and deploys as part of the next push
The Slack channel renders approvals as buttons, questions as select menus, and posts typing indicators while the agent works. Sessions move between channels — a question asked in Slack can continue on the web, and an incident webhook can open an investigation thread in Slack.
At launch: HTTP API (on by default), Slack, Discord, Teams, Telegram, Twilio, GitHub, Linear, plus defineChannel for custom surfaces.
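"The same agent serves every surface" suggests an adapter pattern: the agent emits channel-neutral events, and each channel decides how to present them. The sketch below shows that shape with two toy adapters. AgentEvent, ChannelAdapter, and deliver are hypothetical names, and the rendered strings merely stand in for real Slack blocks or HTTP payloads; this is not the signature of eve's defineChannel.

```typescript
// Hypothetical sketch of channel adapters; not eve's defineChannel API.
type AgentEvent =
  | { type: "reply"; text: string }
  | { type: "approval_request"; tool: string };

interface ChannelAdapter {
  name: string;
  render(event: AgentEvent): string;
}

// A chat-style surface: approvals become buttons (shown here as text).
const chatLike: ChannelAdapter = {
  name: "chat-like",
  render(event) {
    return event.type === "reply"
      ? event.text
      : `[Approve] [Reject] ${event.tool}`;
  },
};

// An API-style surface: every event is passed through as JSON.
const httpLike: ChannelAdapter = {
  name: "http-like",
  render(event) {
    return JSON.stringify(event);
  },
};

// The agent's output is identical; only the presentation differs.
function deliver(events: AgentEvent[], channel: ChannelAdapter): string[] {
  return events.map((e) => channel.render(e));
}
```

Keeping events channel-neutral is also what would let a session move between surfaces, since nothing in the event stream is tied to where it started.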
Schedules
A schedule is a cron expression and a handler. On Vercel it deploys as a Vercel Cron Job.
// agent/schedules/monday-summary.ts
// `slack` presumably refers to the channel defined in agent/channels/slack.ts;
// its import is not shown here.
export default defineSchedule({
  cron: "0 9 * * 1", // 09:00 every Monday
  async run({ receive, waitUntil, appAuth }) {
    waitUntil(
      receive(slack, {
        message: "Summarize last week's revenue and post it to the team channel.",
        target: { channelId: "C0123ABC" },
        auth: appAuth,
      }),
    );
  },
});
The agent acts on its own schedule without anyone asking it to.
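For readers less familiar with cron syntax, the expression "0 9 * * 1" reads as minute 0, hour 9, any day of the month, any month, day-of-week 1 (Monday). The sketch below checks a timestamp against such an expression. It is a teaching aid with a hypothetical matchesCron function: it supports only "*" and single numbers, not ranges, lists, or steps, and it evaluates in UTC, which is how Vercel Cron Jobs interpret their schedules.

```typescript
// Hypothetical, deliberately minimal cron matcher: supports "*" and
// single numeric values only (no ranges, lists, or step syntax).
function fieldMatches(field: string, value: number): boolean {
  return field === "*" || Number(field) === value;
}

function matchesCron(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${fields.length}`);
  }
  const [minute = "*", hour = "*", dayOfMonth = "*", month = "*", dayOfWeek = "*"] = fields;
  // All comparisons use UTC getters so the result does not depend on
  // the machine's local time zone.
  return (
    fieldMatches(minute, date.getUTCMinutes()) &&
    fieldMatches(hour, date.getUTCHours()) &&
    fieldMatches(dayOfMonth, date.getUTCDate()) &&
    fieldMatches(month, date.getUTCMonth() + 1) && // cron months are 1-12
    fieldMatches(dayOfWeek, date.getUTCDay()) // 0 = Sunday, 1 = Monday
  );
}
```

For instance, 2024-01-01 was a Monday, so matchesCron("0 9 * * 1", new Date(Date.UTC(2024, 0, 1, 9, 0))) is true, while the same time on the following day is not a match.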
How Vercel Runs 100+ Agents on eve Internally
Vercel runs more than a hundred agents in production on eve. Five of them illustrate the range of what the framework is being used for.
d0 (The Data Analyst): The most-used internal tool at Vercel. Anyone can ask d0 anything in Slack and get an answer from the warehouse. It handles more than 30,000 questions a month. Every query is scoped to the asker's own permissions, so d0 can never show you a table you could not already see.
Lead Agent (The Autonomous SDR): Runs the playbook of Vercel's best sales rep around the clock, working every new lead the moment it comes in and following up on its own. It costs about $5,000 a year to run and returns 32 times that (roughly $160,000); one engineer maintains it part-time.
Athena (The Sales Cockpit): Built by RevOps in six weeks without engineers. Answers pipeline and forecast questions from Snowflake and Salesforce in plain language. Pipeline coverage nearly doubled after it went live.
Vertex (The Support Engineer): Handles tickets across the help center, docs, and Slack around the clock. It reads the ticket, finds the answer, and responds, solving 92% of tickets without human help and escalating the rest so the support team can focus on problems that actually need them.
V (The Routing Agent): Receives everything in Slack first and routes each task to the right agent in the fleet. Instead of the team tracking which of the hundred-plus agents handles what, V figures it out. The whole fleet works like one agent instead of a hundred separate options.
All of these began as separate projects on separate stacks with their own state management, credential brokering, and logging. Today they live in one monorepo and are built, observed, and upgraded the same way, no matter which team owns them.
eve vs. AI SDK vs. LangChain
It is worth clarifying where eve sits relative to the AI SDK it builds on and to the other tools it might be compared with.
eve vs. AI SDK: The AI SDK is a lower-level foundation — provider abstraction, streaming, tool use, minimal dependencies, UI-framework-agnostic. It gives you full control, including provider-specific configuration options. eve builds on the AI SDK internally. Lars Grammel from the AI SDK team described it accurately: "AI SDK is lower level foundation... eve is higher level, opinionated framework." They are complementary, not competing.
eve vs. LangChain / LangGraph: LangChain gives you pre-built harness components — chains, agents, tool connectors — as a library you configure. LangGraph gives you a graph-based state machine for multi-agent workflows. Both are lower-level than eve and leave production concerns (durability, sandboxing, approval gates, channel integrations) to you. eve is the full stack: framework, runtime, and production infrastructure in one.
eve vs. Claude Code /loop: Claude Code's /loop runs a simple retry loop for coding tasks. eve is a general-purpose agent framework with durability, multi-channel deployment, and eval infrastructure. Different scope, different use case.
The Dev Loop
Starting an eve agent locally is one command:
eve dev
The terminal UI shows every step of every run as it happens — which skill was loaded, which tool was called, what the model said, which steps are checkpointed. The TUI is a client over HTTP, so curl, a test script, or CI can drive the same agent and inspect the same structured events.
Deploying is also one command:
vercel deploy
Nothing about the agent changes when it deploys. The sandbox swaps to Vercel Sandbox without a code change. The agent you were talking to in dev is reachable at a public URL. The deploy does not interrupt in-flight sessions — a session mid-task when you push finishes on the version it started on.
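The guarantee that an in-flight session finishes on the version it started on can be modeled as version pinning: each session records the deploy that was current when it began, and later messages are dispatched to that deploy's handler. The sketch below is a toy model of that idea with hypothetical names (VersionedRuntime, PinnedSession); it says nothing about how Vercel implements it.

```typescript
// Hypothetical sketch of pinning sessions to the deploy they started on.
type MessageHandler = (message: string) => string;

interface PinnedSession {
  id: string;
  version: string;
}

class VersionedRuntime {
  private handlers = new Map<string, MessageHandler>();
  private current = "";

  deploy(version: string, handler: MessageHandler): void {
    // Older handlers are kept so in-flight sessions can still reach them.
    this.handlers.set(version, handler);
    this.current = version; // new sessions start on the latest deploy
  }

  startSession(id: string): PinnedSession {
    return { id, version: this.current };
  }

  send(session: PinnedSession, message: string): string {
    const handler = this.handlers.get(session.version);
    if (handler === undefined) {
      throw new Error(`version ${session.version} is not available`);
    }
    return handler(message);
  }
}
```

A session started before a deploy keeps resolving to the old handler, while a session started afterwards resolves to the new one, so neither observes a mid-conversation behavior change.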
Getting Started
The public preview is available now. The CLI wizard walks from picking a model to a running dev server in under a minute:
npx eve@latest init my-agent
If you want a coding agent to set it up:
Set up an Eve agent for the user. Eve is a filesystem-first TypeScript
framework for durable agents, published as the npm package eve. Read its
docs: once eve is installed they are bundled at node_modules/eve/docs.
Scaffold with `npx eve@latest init <name>`. Make sure agent/agent.ts and
agent/instructions.md exist, then add a first typed tool at
agent/tools/get_weather.ts using defineTool from eve/tools with a Zod
inputSchema and an inline execute. Start the dev server, then exercise
the HTTP API: create a session with POST /eve/v1/session, attach to
GET /eve/v1/session/:id/stream, and send a follow-up with the returned
continuationToken.
Documentation is at eve.dev/docs. Development is open at github.com/vercel/eve.
Why This Matters
A year ago, agents triggered less than 3% of deployments on Vercel. Now they trigger around 29%. Vercel expects half of all deployments to come from agents soon.
The inflection point is the same one that happened with the web before Next.js: enough people had built the same thing the hard way that the abstractions had earned their existence. Every generation of software reaches this point — the moment where the plumbing is well understood enough to be hidden, so you can focus on what you are building rather than how it runs.
For agents, that moment is now. An agent is a directory. eve runs it.