What is Google TabFM?

TabFM (Tabular Foundation Model) is a foundation model from Google Research for tabular classification and regression, announced June 30, 2026. It frames tabular prediction as in-context learning — the model takes training examples and test rows together as input and predicts in one forward pass without per-dataset training, hyperparameter search, or manual feature engineering.

How is TabFM different from XGBoost or random forests?

Tree-based models like XGBoost must be fit on each new dataset, usually with hyperparameter tuning and often extensive feature engineering. TabFM is pretrained on hundreds of millions of synthetic tabular datasets and generalizes zero-shot to unseen tables. In the scikit-learn-style API, .fit() only passes your training rows in as context; there is no training loop on your specific distribution, and prediction happens in a single forward pass.

Is TabFM open source?

Yes. TabFM is released under Apache 2.0 on GitHub (google-research/tabfm), with weights on Hugging Face, and is installable via pip install tabfm. Google notes that it is not an officially supported Google product: it is research code, with an enterprise path planned through BigQuery integration.

What is TabFM-Ensemble?

TabFM-Ensemble is a higher-accuracy configuration that adds cross features and SVD features, then combines 32 model outputs using weights fit by non-negative least squares (NNLS). For classification it also applies Platt scaling for calibration. It tops TabArena Elo scores but requires more compute than out-of-the-box TabFM.
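To make the output stage concrete, here is a minimal sketch of NNLS ensemble weighting and Platt scaling on simulated data. It is not TabFM's implementation: the function names, the simulated 32 members, and the gradient-descent fit for the Platt parameters are all assumptions made for illustration. The only library call relied on is `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls


def nnls_ensemble_weights(member_preds, y_true):
    """Solve min_w ||member_preds @ w - y_true||_2 subject to w >= 0."""
    weights, _residual_norm = nnls(member_preds, y_true)
    return weights


def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0  # start from the uncalibrated mapping
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        a -= lr * np.mean((p - labels) * scores)
        b -= lr * np.mean(p - labels)
    return a, b


def log_loss(probs, labels):
    """Mean binary cross-entropy, clipped for numerical safety."""
    probs = np.clip(probs, 1e-12, 1 - 1e-12)
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))


rng = np.random.default_rng(0)

# NNLS weighting: 32 simulated members = truth + member-specific noise
# (hypothetical data; real members would be TabFM forward passes).
y_val = rng.normal(size=200)
noise_levels = np.linspace(0.2, 2.0, 32)
member_preds = y_val[:, None] + rng.normal(size=(200, 32)) * noise_levels
weights = nnls_ensemble_weights(member_preds, y_val)
ensemble_mse = np.mean((member_preds @ weights - y_val) ** 2)
best_single_mse = np.min(np.mean((member_preds - y_val[:, None]) ** 2, axis=0))

# Platt scaling: simulated overconfident scores (3x the true logit).
true_logit = rng.normal(size=2000)
labels = (rng.random(2000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
raw_scores = 3.0 * true_logit
a, b = fit_platt(raw_scores, labels)
loss_before = log_loss(1.0 / (1.0 + np.exp(-raw_scores)), labels)
loss_after = log_loss(1.0 / (1.0 + np.exp(-(a * raw_scores + b))), labels)
```

Because putting all weight on one member is itself a valid non-negative solution, the NNLS blend can never fit the validation targets worse than the best single member. In practice both the weights and the Platt parameters should be fit on held-out data, not on the rows used as context.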

How does TabFM rank on benchmarks?

Google evaluated TabFM on TabArena — a living benchmark with Elo scores from head-to-head win rates across 38 classification and 13 regression datasets (700 to 150,000 samples). TabFM and TabFM-Ensemble rank at or near the top versus heavily tuned XGBoost, random forests, and other tabular baselines. Per-fold details are on the GitHub repo.
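For readers unfamiliar with Elo, the sketch below shows the classic rating update applied to one head-to-head result. This post does not describe TabArena's exact Elo procedure, so treat this as the textbook formula, not as the benchmark's scoring code; the model names and the K-factor of 32 are placeholders.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, model_a, model_b, score_a, k=32.0):
    """Update both ratings in place; score_a is 1 (win), 0.5 (tie) or 0 (loss)."""
    expected_a = expected_score(ratings[model_a], ratings[model_b])
    delta = k * (score_a - expected_a)
    ratings[model_a] += delta
    ratings[model_b] -= delta


# Hypothetical models that start level; model_a wins on one dataset.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
update_elo(ratings, "model_a", "model_b", score_a=1.0)
print(ratings)  # {'model_a': 1016.0, 'model_b': 984.0}
```

A 400-point gap corresponds to an expected win rate of 10/11, roughly 91%, which is a useful way to read the size of Elo differences on a leaderboard.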

Will TabFM work in BigQuery?

Google says BigQuery integration will arrive in the coming weeks, exposing regression and classification through an AI.PREDICT SQL command, similar to how TimesFM landed in BigQuery ML. The aim is that SQL-first teams can use it without ML expertise.

Google TabFM: Zero-Shot Tabular Foundation Model Guide (2026) | explainx.ai Blog

Tabular data still runs most of enterprise ML — churn prediction, fraud detection, credit scoring, ops forecasting on structured columns. For decades the workflow was the same: load a table, engineer features, tune XGBoost or a random forest, cross-validate, deploy.

Google Research's TabFM (announced June 30, 2026) asks a different question: what if tabular prediction worked like a language model — zero-shot, one forward pass, no per-dataset training?

It is the tabular sibling to TimesFM, which already shifted how teams handle time-series forecasting. TabFM targets classification and regression on mixed-type columns with a scikit-learn-compatible API, weights on Hugging Face, code on GitHub, and pip install tabfm (v1.0.0).

TL;DR


What it is	Foundation model for tabular classification + regression via in-context learning
Release	June 30, 2026 — TabFM v1.0.0

Scope	Coverage
Classification	38 datasets
Regression	13 datasets
Sample sizes	700 → 150,000 rows

Variant	What it does	Tuning required
TabFM	Single forward pass, out-of-the-box	None
TabFM-Ensemble	Cross features + SVD features, 32-way ensemble, NNLS optimal weights; Platt scaling on classification	Ensemble setup, not full HPO on base trees
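The input-side augmentations in the TabFM-Ensemble row can be sketched in a few lines of NumPy. The post does not say how TabFM-Ensemble defines its cross features or SVD features, so the choices below (pairwise products of numeric columns, and projection onto the top right-singular vectors of the centered training matrix) are assumptions, and the function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def add_cross_features(X):
    """Append the product of every pair of columns (assumed cross-feature form)."""
    pairs = list(combinations(range(X.shape[1]), 2))
    if not pairs:  # a single column has no pairs to cross
        return X.copy()
    crosses = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
    return np.hstack([X, crosses])


def add_svd_features(X_train, X_test, n_components=2):
    """Append projections onto the top singular directions of the training data."""
    mean = X_train.mean(axis=0)
    # Rows of vt are right-singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    basis = vt[:n_components].T
    train_aug = np.hstack([X_train, (X_train - mean) @ basis])
    test_aug = np.hstack([X_test, (X_test - mean) @ basis])
    return train_aug, test_aug


rng = np.random.default_rng(0)
X_train = rng.normal(size=(10, 4))
X_test = rng.normal(size=(5, 4))

X_cross = add_cross_features(X_train)                  # 4 original + 6 pairwise columns
train_aug, test_aug = add_svd_features(X_train, X_test)  # 4 original + 2 SVD columns
```

Note that the SVD basis and the centering mean are computed from the training rows only and then reused for the test rows, which avoids leaking test statistics into the features.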

```python
# Install first (shell): pip install tabfm
from tabfm import tabfm_v1_0_0

# Load pretrained TabFM v1.0.0 (JAX or PyTorch backend)
model = tabfm_v1_0_0.load()

# Standard sklearn-style API: ICL happens inside predict.
# X_train, y_train and X_test are your own arrays or DataFrames.
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
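If the idea of a fit call that does no training feels odd, the toy class below mimics the interface pattern only. It is a nearest-neighbour stand-in written for this post, not TabFM's architecture or API: `ToyContextClassifier` is a hypothetical name, `fit` just stores the labelled rows as context, and all the work happens at predict time.

```python
import numpy as np


class ToyContextClassifier:
    """Toy stand-in: fit stores context rows, predict does all the work."""

    def fit(self, X, y):
        # No parameters are learned here; the training rows become context.
        self.context_X_ = np.asarray(X, dtype=float)
        self.context_y_ = np.asarray(y)
        return self

    def predict(self, X, k=3):
        X = np.asarray(X, dtype=float)
        # Distance from every query row to every context row.
        dists = np.linalg.norm(X[:, None, :] - self.context_X_[None, :, :], axis=2)
        nearest = np.argsort(dists, axis=1)[:, :k]
        votes = self.context_y_[nearest]
        # Majority vote among the k nearest context labels (non-negative ints).
        return np.array([np.bincount(row).argmax() for row in votes])


X_train = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y_train = [0, 0, 0, 1, 1, 1]
toy = ToyContextClassifier().fit(X_train, y_train)
toy_predictions = toy.predict([[0.5, 0.5], [10.5, 10.5]])
print(toy_predictions)  # [0 1]
```

The difference with a foundation model is what happens inside predict: a pretrained network attends over the context rows where this toy takes a majority vote.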

Approach	Per-dataset training	Feature engineering	Typical strength
XGBoost / RF	Yes — full fit + HPO	Often extensive	Strong default on medium tables
TabPFN	No — ICL	Minimal	Small/medium tables, fast ICL
TabICL	No — compressed ICL	Minimal	Efficient ICL at scale
TabFM	No — hybrid architecture + massive synthetic pretrain	Eliminated for zero-shot	TabArena-leading Elo, sklearn API, BigQuery path
LLM on CSV text	Prompt-only	Fragile on wide/numeric tables	General reasoning, not tabular-native

Google TabFM: Zero-Shot Foundation Model for Tabular Classification and Regression

TL;DR

The Bottleneck TabFM Targets

How TabFM Works

1. Alternating row and column attention

2. Row compression

3. In-context learning on compressed rows

Training on Synthetic Data at Scale

TabArena Benchmarks

Quick Start

BigQuery: `AI.PREDICT` Coming Soon

TabFM vs the Tabular Landscape

Limitations and Honest Caveats

Who Should Care

Key Links

Related Reading
