What are the three OKF sample bundles from Google?

Google ships three conformant Open Knowledge Format (OKF) bundles in the GoogleCloudPlatform/knowledge-catalog repo: GA4 e-commerce (Google Merchandise Store analytics), Stack Overflow public dataset, and Bitcoin blockchain public dataset. Each was produced by the reference BigQuery enrichment agent that drafts concept docs for tables, metrics, and joins.

What is the GA4 OKF sample bundle based on?

The GA4 bundle maps to bigquery-public-data.ga4_obfuscated_sample_ecommerce: three months of obfuscated BigQuery event export data (2020-11-01 to 2021-01-31) from the Google Merchandise Store. Because of the obfuscation, fields may contain placeholder values, NULL, or empty strings.

What is the Bitcoin OKF sample bundle based on?

The Bitcoin bundle maps to bigquery-public-data.bitcoin_blockchain—full historical Bitcoin blockchain data in blocks_raw and transactions tables, updated every 10 minutes. Google has offered this public dataset since 2018 for OLAP analytics on an immutable OLTP ledger.

Do I need billing to explore these BigQuery datasets?

You need a Google Cloud project with BigQuery API enabled. BigQuery Sandbox mode and the free usage tier are sufficient for sample queries on both datasets. Enable billing only if you exceed free-tier limits.

How do OKF bundles differ from querying BigQuery directly?

BigQuery holds raw rows; OKF bundles are pre-compiled markdown concept pages with YAML frontmatter (type, description, resource links, cross-references). Agents read linked concept docs instead of re-deriving schema and join paths on every question—the same LLM wiki compile-once pattern Karpathy describes.
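To make the frontmatter idea concrete, here is a minimal sketch of how an agent-side loader might split a concept page into metadata and body. The sample page, the field names beyond `type` and `description` (`resource`, `related`), and the helper `parse_concept_page` are illustrative assumptions, not taken from okf/SPEC.md. A real loader would likely use a full YAML parser.

```python
from typing import Dict, Tuple

# Hypothetical concept page; field names other than type/description are assumptions.
SAMPLE_PAGE = """---
type: table
description: GA4 event export, one shard per day
resource: bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*
related: dataset-overview, metric-purchases
---
# events_*

One row per event.
"""


def parse_concept_page(text: str) -> Tuple[Dict[str, str], str]:
    """Split a page into (frontmatter dict, markdown body).

    Handles only flat `key: value` lines, which is enough for this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the markdown body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("unterminated frontmatter block")


meta, body = parse_concept_page(SAMPLE_PAGE)
# Cross-references are stored here as a comma-separated list (an assumption).
related = [name.strip() for name in meta["related"].split(",")]
```

With the metadata parsed once, an agent can follow `related` to neighbouring pages instead of rediscovering the schema for each question.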

Where can I browse OKF sample bundles?

Clone GoogleCloudPlatform/knowledge-catalog on GitHub. Sample bundles live in the repo alongside okf/SPEC.md. Google also ships a static HTML visualizer that renders any OKF bundle as an interactive graph—no backend required.

OKF Sample Bundles: GA4 & Bitcoin BigQuery Guide | explainx.ai Blog


When Google launched Open Knowledge Format (OKF) v0.1 in June 2026, it did not ship an empty spec—it included three living sample bundles produced by a reference BigQuery enrichment agent:

OKF bundle	Underlying BigQuery public dataset
GA4 e-commerce	`bigquery-public-data.ga4_obfuscated_sample_ecommerce`
Stack Overflow	Stack Overflow public dataset
Bitcoin	`bigquery-public-data.bitcoin_blockchain`

This guide goes hands-on with two of those datasets, GA4 e-commerce and Bitcoin in BigQuery: what they contain, how to query them, and how OKF turns raw tables into agent-readable knowledge graphs.

Bundle 1: GA4 E-commerce (Google Merchandise Store)

Property	Value
Date range	2020-11-01 → 2021-01-31
Project path	`bigquery-public-data.ga4_obfuscated_sample_ecommerce`
Primary table	`events_*` (date-sharded)
Docs	GA4 BigQuery sample dataset

```sql
-- Dataset overview: total events, distinct users, and distinct days across all shards.
SELECT
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS user_count,
  COUNT(DISTINCT event_date) AS day_count
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
```
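Because `events_*` is date-sharded, the wildcard query above reads every daily shard. One common way to narrow the scan is BigQuery's `_TABLE_SUFFIX` pseudo-column. The sketch below only builds the SQL string. The helper name `build_overview_query` is hypothetical, and actually running the query would need a BigQuery client and project, which are left out here.

```python
from datetime import date

TABLE = "bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*"


def build_overview_query(start: date, end: date) -> str:
    """Return the overview query restricted to shards between start and end."""
    if start > end:
        raise ValueError("start must not be after end")
    # Shard suffixes are formatted YYYYMMDD, e.g. events_20201101.
    lo, hi = start.strftime("%Y%m%d"), end.strftime("%Y%m%d")
    return (
        "SELECT\n"
        "  COUNT(*) AS event_count,\n"
        "  COUNT(DISTINCT user_pseudo_id) AS user_count,\n"
        "  COUNT(DISTINCT event_date) AS day_count\n"
        f"FROM `{TABLE}`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{lo}' AND '{hi}'"
    )


# First week of the sample's date range.
query = build_overview_query(date(2020, 11, 1), date(2020, 11, 7))
print(query)
```

Restricting the suffix range keeps exploratory queries small, which helps when staying inside free-tier limits.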

OKF concept type	Examples
Dataset	`ga4_obfuscated_sample_ecommerce` overview
Table	`events_*` schema, sharding pattern
Metric	Sessions, purchases, `user_pseudo_id` uniqueness
Event	`purchase`, `add_to_cart`, `page_view` parameters

Bundle 2: Bitcoin Blockchain

Property	Value
Project path	`bigquery-public-data.bitcoin_blockchain`
Tables	`blocks_raw`, `transactions`
Update cadence	~every 10 minutes as new blocks broadcast
Also on	Kaggle (BigQuery Python client in Kernels)

Analysis	Insight
BTC transacted per day	Economic activity on-network
Recipient addresses per day	User growth proxy (Metcalfe's Law)
NVT Ratio	Network Value to Transactions—valuation metric
Mining difficulty vs search volume	Fundamental + attention correlation
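The NVT Ratio in the table above is commonly defined as network value (market capitalization) divided by the value transacted on-chain per day, both in the same currency. A small sketch of that arithmetic follows. The function name is hypothetical and the input numbers are made up for illustration.

```python
def nvt_ratio(network_value_usd: float, daily_tx_volume_usd: float) -> float:
    """Network Value to Transactions: market cap / daily on-chain volume."""
    if daily_tx_volume_usd <= 0:
        raise ValueError("daily transaction volume must be positive")
    return network_value_usd / daily_tx_volume_usd


# Illustrative numbers only, not real market data.
example = nvt_ratio(network_value_usd=100e9, daily_tx_volume_usd=2e9)
print(example)  # 50.0
```

In practice the daily volume would come from a per-day aggregate over the `transactions` table, and the network value from an external price source.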

```sql
#standardSQL
-- Find transaction_ids that appear more than once in the ledger.
SELECT *
FROM (
  SELECT
    transaction_id,
    COUNT(transaction_id) AS dup_transaction_count
  FROM `bigquery-public-data.bitcoin_blockchain.transactions`
  GROUP BY transaction_id
)
WHERE dup_transaction_count > 1
```
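The same group-and-filter logic can be checked on a small in-memory sample before spending query bytes. This sketch uses `collections.Counter`. The helper name and the transaction IDs are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_ids(transaction_ids: Iterable[str]) -> Dict[str, int]:
    """Return {transaction_id: count} for IDs seen more than once."""
    counts = Counter(transaction_ids)
    return {tx_id: n for tx_id, n in counts.items() if n > 1}


sample_ids = ["aa11", "bb22", "aa11", "cc33", "bb22", "aa11"]
duplicates = find_duplicate_ids(sample_ids)
print(duplicates)  # {'aa11': 3, 'bb22': 2}
```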

OKF concept type	Examples
Dataset	`bitcoin_blockchain` overview
Table	`transactions`, `blocks_raw` columns
Metric	NVT Ratio, daily transaction volume
Entity	Notable addresses, pizza purchase
Anomaly	Pre-BIP-0030 duplicate transaction_ids

```text
BigQuery dataset
    ↓  (walk tables/views)
Draft OKF concept .md per table/metric
    ↓  (second LLM pass)
Enrich with citations, joins, descriptions from authoritative docs
    ↓
Commit conformant OKF bundle → git
    ↓
Visualize with static HTML graph viewer
    ↓
Ingest into Cloud Knowledge Catalog for GCP agents
```
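The drafting step of the pipeline above, one concept page per table, could look roughly like this. Everything here is a hypothetical sketch: the `TableInfo` type, the `draft_concept_page` helper, and the frontmatter layout are assumptions rather than the reference agent's actual code, and the LLM enrichment pass is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableInfo:
    dataset: str
    name: str
    columns: List[str]


def draft_concept_page(table: TableInfo) -> str:
    """Render a draft markdown concept page with YAML-style frontmatter."""
    resource = f"bigquery-public-data.{table.dataset}.{table.name}"
    column_lines = "\n".join(f"- `{col}`" for col in table.columns)
    return (
        "---\n"
        "type: table\n"
        f"description: Draft page for {table.name} (to be enriched)\n"
        f"resource: {resource}\n"
        "---\n"
        f"# {table.name}\n\n"
        "## Columns\n"
        f"{column_lines}\n"
    )


# Only transaction_id is listed; a real walk would read the full table schema.
tables = [TableInfo("bitcoin_blockchain", "transactions", ["transaction_id"])]
pages = {t.name: draft_concept_page(t) for t in tables}
print(pages["transactions"])
```

Each draft would then go through the enrichment pass before being committed as part of the bundle.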

Approach	Agent behavior	Tradeoff
Raw BigQuery	Agent queries `INFORMATION_SCHEMA`, guesses joins	Flexible, slow, error-prone
RAG on docs	Retrieves doc chunks per question	No cross-link maintenance
OKF bundle	Reads pre-compiled concept graph	Upfront enrichment cost; reliable at moderate scale
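To illustrate what "reads pre-compiled concept graph" can mean in practice, here is a sketch of an agent collecting context by following cross-references breadth-first from a starting page. The page names, the `LINKS` mapping, and the `reachable_pages` helper are hypothetical. A real bundle would supply the links from each page's frontmatter.

```python
from collections import deque
from typing import Dict, List

# Hypothetical cross-reference graph: page name -> linked page names.
LINKS: Dict[str, List[str]] = {
    "dataset-ga4": ["table-events"],
    "table-events": ["metric-purchases", "event-purchase"],
    "metric-purchases": ["event-purchase"],
    "event-purchase": [],
}


def reachable_pages(start: str, links: Dict[str, List[str]], max_hops: int) -> List[str]:
    """Breadth-first walk over cross-references, up to max_hops from start."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand pages at the hop limit
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, hops + 1))
    return order


context = reachable_pages("dataset-ga4", LINKS, max_hops=2)
print(context)
```

The hop limit is one simple way to bound how much of the graph is loaded into context for a single question.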

Repo	GoogleCloudPlatform/knowledge-catalog
GA4 dataset	`ga4_obfuscated_sample_ecommerce` — Nov 2020–Jan 2021
Bitcoin dataset	`bitcoin_blockchain` — full history, updates ~every 10 min
GA4 tables	`events_*` (sharded by date)
Bitcoin tables	`blocks_raw`, `transactions`
Cost	BigQuery Sandbox / free tier sufficient for samples
OKF value	Pre-linked concept pages vs raw schema discovery

OKF Sample Bundles: GA4 E-commerce & Bitcoin BigQuery Datasets Guide

TL;DR

Why Public Datasets Make Good OKF Demos

Bundle 1: GA4 E-commerce (Google Merchandise Store)

What it is

Prerequisites

Limitations (read before trusting numbers)

Starter query: dataset overview

What agents learn from the OKF bundle

Next steps for GA4

Bundle 2: Bitcoin Blockchain

What it is

Why BigQuery for blockchain?

Network fundamentals queries

Famous transaction: Bitcoin pizza (May 17, 2010)

Anomaly detection: duplicate transactions

What agents learn from the OKF bundle

Bundle 3: Stack Overflow (brief)

Raw BigQuery → OKF Bundle: The Pipeline

Browse a bundle without BigQuery

Sample Queries Cheat Sheet

GA4: events by name

GA4: purchase events only

Bitcoin: daily transaction count

Bitcoin: largest outputs (recent)

OKF vs Raw SQL for Agents

Getting Started Checklist

Summary
