PixelRAG is an open-source visual retrieval-augmented generation system from UC Berkeley's SkyLab, BAIR, and Berkeley NLP. Instead of parsing web pages and PDFs to text (which destroys tables, charts, and visual layout), PixelRAG renders them as screenshot tiles and retrieves over the images. A Qwen3-VL-Embedding model, LoRA-fine-tuned on screenshot data, embeds page images so visual content is searchable. It ships with a pre-built 8.28M-page Wikipedia index and a Claude Code plugin (pixelbrowse) that lets Claude screenshot any URL and read it as a human would.

How much better is PixelRAG vs text-based RAG?

The Berkeley team reports up to 18% accuracy improvement on SimpleQA benchmarks and a 3x reduction in tokens per query in agent runs compared to text-based RAG baselines. The gains are largest on documents with tables, charts, and structured layout — exactly the content that HTML parsers destroy.

How do I install PixelRAG?

pip install pixelrag for the core renderer and CLI. pip install 'pixelrag[embed]' adds the embedding pipeline. pip install 'pixelrag[index]' adds the full orchestrated pipeline. pip install 'pixelrag[serve]' adds the FastAPI search server. The Claude Code plugin installs with: claude plugin marketplace add StarTrail-org/PixelRAG && claude plugin install pixelbrowse@pixelrag-plugins. The hosted Wikipedia API at api.pixelrag.ai requires no local setup.

Can I use PixelRAG with my own documents?

Yes. Create a pixelrag.yaml pointing at your document directory, set embed.model to Qwen/Qwen3-VL-Embedding-2B and embed.device to cuda or cpu, then run pixelrag index build followed by pixelrag serve. The full pipeline — local docs to a searchable FAISS index — runs from that one YAML file. You can also run the stages independently: pixelrag chunk, pixelrag embed, pixelrag build-index.

What is the pixelbrowse Claude skill?

pixelbrowse is a Claude Code plugin (skill) from the PixelRAG project. Instead of fetching a page's raw HTML (which strips visual structure), it uses pixelshot to screenshot the page and passes the image to Claude. Claude sees charts, tables, diagrams, and layout the way a person does. Install with two lines: pip install pixelrag and claude plugin install pixelbrowse@pixelrag-plugins. Then ask Claude to "screenshot https://news.ycombinator.com and summarize the top stories" or use /screenshot in an interactive session.

PixelRAG: Visual RAG That Reads Pages as Screenshots | explainx.ai Blog

explainx.ainewsletter3.5k

workshops ↗

PixelRAG: Visual RAG That Reads Pages as Screenshots | explainx.ai Blog | explainx.ai

Text-based RAG has a structural problem that chunking strategies and rerankers cannot fix: HTML parsers throw away the page.

Tables become flat text with no column alignment. Charts become nothing. Side-by-side comparisons collapse into sequential sentences. The visual structure that makes the page human-readable vanishes before retrieval even starts.

PixelRAG is the UC Berkeley project that sidesteps this entirely. Instead of parsing pages to text, it renders them as screenshot tiles and retrieves over the images using a vision-language embedding model. The reader model — Claude, GPT, Qwen, whatever you use — reads the answer directly from what a human would see.

The project comes from Berkeley's SkyLab, BAIR, and Berkeley NLP groups (led by Yichuan Wang, Zhifei Li, Zirui Wang, Paul Teiletche, and Lesheng Jin, with Matei Zaharia, Joseph Gonzalez, and Sewon Min advising). It is Apache 2.0, ships with a pre-built 8.28M-page Wikipedia index, and adds a Claude Code plugin (pixelbrowse) that gives Claude visual page access in one command.

The Problem With Text-Based RAG

Every traditional RAG pipeline does something like this:

Fetch a web page
Parse HTML to text chunks
Embed the chunks
Retrieve the most relevant chunks
Pass chunks to a reader model

Step 2 is where information dies. Consider a Wikipedia table listing historical stock prices by year. As HTML: perfectly structured. As parsed text: Year Price 1990 12.4 1991 18.7 ... — the column headers may survive but the spatial relationship is gone. Now ask the reader "what was the highest price before 1995?" The table's answer is obvious visually. The text dump makes it a string parsing problem.

This gets worse with:

Charts and graphs — entirely missing from text output
Multi-column layouts — merged into single-stream text
Infographics — completely lost
Form layouts — field-value relationships scrambled

bash

# Ask Claude to screenshot a page and reason about it
claude -p "screenshot https://news.ycombinator.com and summarize the top stories"
claude -p "screenshot https://arxiv.org/abs/2404.12387 and explain the key findings"

bash

pip install 'pixelrag[serve]'

huggingface-cli download StarTrail-org/pixelrag-faiss-indexes \
  --repo-type dataset \
  --include "search_index_normed_v2/*" \
  --local-dir ./index

pixelrag serve --index-dir ./index/search_index_normed_v2 --port 30001

Metric	Text RAG	PixelRAG
SimpleQA accuracy	baseline	+18% higher
Tokens per query (agent runs)	baseline	3x fewer
Tables preserved	partial	complete (as image)
Charts preserved	no	yes
Visual layout preserved	no	yes
Setup for Wikipedia search	full pipeline	zero (hosted API)

python

from pixelrag_render import render_url

# Render a page to tiles
tiles = render_url("https://en.wikipedia.org/wiki/Python", "./tiles")

# Each tile is an image file path you can pass to a vision model
for tile in tiles:
    print(tile)

PixelRAG: Berkeley's Visual RAG That Reads Web Pages as Screenshots (Not HTML)

The Problem With Text-Based RAG

Related posts

Claude for Open Source Expanded: 6 Months of Claude Max 20x for Maintainers

Can Claude or LLMs Watch a Video? Here's How to Make It Work

video-use: Edit Videos With Claude Code — No Premiere Pro Needed

How PixelRAG Works

1. The Renderer (pixelshot)

2. The Embedding Model

Quick Start

Hosted Wikipedia API (no setup required)

Install PixelRAG

Give Claude Eyes: The pixelbrowse Plugin

Building Your Own Index

Downloading the Pre-Built Wikipedia Index

Performance Numbers

Using PixelRAG Programmatically

Fine-Tuning on Your Own Data

What This Changes About Web Search in Agent Pipelines

Project Links

The Problem With Text-Based RAG

Related posts

Claude for Open Source Expanded: 6 Months of Claude Max 20x for Maintainers

Can Claude or LLMs Watch a Video? Here's How to Make It Work

video-use: Edit Videos With Claude Code — No Premiere Pro Needed

How PixelRAG Works

1. The Renderer (pixelshot)

2. The Embedding Model

Quick Start

Hosted Wikipedia API (no setup required)

Install PixelRAG

Give Claude Eyes: The pixelbrowse Plugin

Building Your Own Index

Downloading the Pre-Built Wikipedia Index

Performance Numbers

Using PixelRAG Programmatically

Fine-Tuning on Your Own Data

What This Changes About Web Search in Agent Pipelines

Project Links

Related Reading