
What are parameters in a large language model? Billions, MoE, and what 2026 model cards really say

Model parameters are the learned numbers inside a neural net—roughly, how big the model is. Here is a clear picture of total vs active parameters, why frontier APIs often hide counts, and a table of top models with public figures (Meta Llama 4) next to the undisclosed front tier.

4 min read · ExplainX Team

LLM basics · Model parameters · MoE · Meta Llama · AI research · Anthropic · OpenAI



Parameters (often billions of parameters, or B / bn) are the usual shorthand for the size of a neural language model in learned weights. They are not the same as tokens (text units) and not the same as context length (how much text fits in one request), but all three get compared when people discuss GPT-class, Claude-class, Gemini-class, and open-weight models like Llama.

This article hews to what vendors publish in 2026 and to one open line with full public tables: Meta’s Llama 4 on the official model card. Frontier APIs often list behavior and limits without a single headline parameter count; we cover why below.


What “parameters” means in one paragraph

A transformer-style LLM is a stack of layers that transform vectors representing tokens. Pretraining and fine-tuning adjust the entries of large weight matrices (and related biases) so the model improves at next-token prediction, tool use, or multimodal tasks—depending on the architecture.

Parameter count is how many of those scalar weights sit in the shipped checkpoint. Some model cards also break out separate totals for a tokenizer, vision tower, or audio encoder—read the specific card for the SKU you run.
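To make "count of scalar weights" concrete, here is a minimal sketch that tallies the weight matrices of a simplified transformer. The formulas (four attention projections, two feedforward matrices, one embedding table) are a standard back-of-envelope, and the dimensions below are illustrative, not taken from any vendor's card; biases, norms, and output heads are deliberately omitted.

```python
# Rough parameter count for a simplified transformer; dimensions are
# illustrative. Biases, layer norms, and the LM head are omitted.
def block_params(d_model: int, d_ff: int) -> int:
    attn = 4 * d_model * d_model  # Q, K, V, and output projections
    ffn = 2 * d_model * d_ff      # up- and down-projection matrices
    return attn + ffn

def model_params(n_layers: int, d_model: int, d_ff: int, vocab: int) -> int:
    embed = vocab * d_model       # token embedding table
    return embed + n_layers * block_params(d_model, d_ff)

# Toy 12-layer model with GPT-2-small-like dimensions:
print(model_params(12, 768, 3072, 50257) / 1e6)  # ≈ 123.5 (millions)
```

Even this crude tally lands near GPT-2 small's widely cited ~124M, which is why "multiply the matrix shapes" remains a useful sanity check when reading a model card.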


Why people still talk in “billions”

  1. Capacity — With similar data and training, a larger weight budget can represent richer patterns; in practice, data and recipe still dominate outcomes.
  2. Serving cost — More weights (especially active per forward pass) tend to mean more FLOPs and memory at inference, though quantization and hardware matter.
  3. MoE (mixture of experts) — A model can have a huge total while routing each token through only a subset of “expert” blocks, so active width is the better first-order handle on per-step compute.

Scaling laws in research usually relate loss to compute, data, and size together; a headline “B” count is one line in a larger system.
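One well-known form of such a law is the Chinchilla-style fit from Hoffmann et al., where loss depends jointly on parameters N and training tokens D. The sketch below uses the constants published in that paper as illustrative defaults; treat them as a shape for intuition, not a predictor for any current model.

```python
# Chinchilla-style scaling law: L(N, D) = E + A / N^alpha + B / D^beta.
# Defaults are the fitted constants from Hoffmann et al. (2022),
# used here purely for illustration.
def loss(N: float, D: float,
         E: float = 1.69, A: float = 406.4, B: float = 410.7,
         alpha: float = 0.34, beta: float = 0.28) -> float:
    return E + A / N**alpha + B / D**beta

# Doubling parameters alone moves loss less than scaling data with it:
print(loss(70e9, 1.4e12))   # a dense 70B model on 1.4T tokens
print(loss(140e9, 1.4e12))  # 2x the parameters, same data
print(loss(140e9, 2.8e12))  # 2x parameters and 2x data
```

The point of the exercise: in this functional form, N is only one of two levers, which is exactly why a headline "B" count underdetermines quality.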


Total vs “active” parameters: MoE in plain terms

In a dense model, “70B parameters” generally means on the order of 70B weights on the main path of each token (implementation details aside).

In an MoE design, many parallel feedforward experts exist, but a router sends each token to one or a few of them. Cards often list total parameters (all experts) and activated parameters (roughly what runs for a typical forward pass).
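The routing idea can be shown in a few lines. This is a toy top-k router over made-up scores, not any vendor's implementation: real routers are small learned layers trained jointly with the experts, and the 16-expert count below is only an echo of the Scout-style notation.

```python
import math
import random

# Toy MoE router: each token activates only the top-k experts by
# softmax score. Expert count and k are illustrative.
def route(token_logits: list[float], k: int = 2) -> list[int]:
    m = max(token_logits)
    exp = [math.exp(x - m) for x in token_logits]  # stable softmax
    total = sum(exp)
    probs = [e / total for e in exp]
    # indices of the k highest-probability experts
    return sorted(range(len(probs)), key=lambda i: -probs[i])[:k]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(16)]  # pretend: 16 experts
print(route(logits, k=1))  # only 1 of 16 expert FFNs runs for this token
```

With top-1 routing over 16 experts, roughly 1/16 of the expert weights participate per token, which is why "activated" can sit far below "total" on an MoE card.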

Meta Llama 4 (from the model card table, April 2025 release; confirm on GitHub for updates):

| Model | Activated (per card) | Total (per card) | Context length (per card) |
|---|---|---|---|
| Llama 4 Scout (17B × 16E) | 17B | 109B | 10M tokens |
| Llama 4 Maverick (17B × 128E) | 17B | 400B | 1M tokens |

E denotes expert count in Meta’s notation. Scout and Maverick share the same activated width in this table but differ in total size and in context length by design. Always re-read the model card for the exact checkpoint you deploy.
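The practical difference between total and activated shows up in memory. A back-of-envelope, using the card figures above and the standard bytes-per-weight for common precisions (2 for bf16/fp16, 1 for int8, 0.5 for 4-bit); actual serving memory also needs KV cache, activations, and framework overhead, which this deliberately ignores.

```python
# Weight-only memory estimate: parameters (in billions) times
# bytes per parameter. Ignores KV cache and runtime overhead.
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param  # 1e9 params * B bytes = B GB

# Llama 4 Scout per its card: 109B total, 17B activated.
print(weight_gb(109, 2.0))  # 218.0 GB to *store* all experts in bf16
print(weight_gb(17, 2.0))   # 34.0 GB of weights touched per token, roughly
print(weight_gb(109, 0.5))  # 54.5 GB on disk at 4-bit quantization
```

So the 109B total governs what you download and hold, while the 17B activated is the better handle on per-step compute, which is the asymmetry the "total vs activated" columns encode.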


Frontier API models: strong specs, often no public parameter line

  • OpenAI documents GPT-5.4 and lists model behavior, context, and API model pages—without a public total parameter count in the same way open releases do.
  • Anthropic publishes Claude Opus 4.7 and a models overview with context, pricing, and features—not a “N billion parameters” headline.
  • Google DeepMind lists Gemini 3.1 Pro capabilities, modalities, and context—again, typically without a full parameter count in the consumer-facing card.

If you see a billion-scale number for a closed model in a third-party post, treat it as analysis or speculation unless the vendor or a vetted system card states it.


How to use parameter counts in practice

  • Open-weight models (Llama, others): the model card, license, and memory notes tell you whether a run fits your GPUs. The total parameter count drives download and storage size, while the activated count and your quantization choice drive runtime memory and speed.
  • APIs: use vendor docs for latency, context window, tools, and $/M tokens (tokens explainer, Caveman economics).
  • Benchmarks: treat headline size as weak evidence without measurements on your task and data.
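For the API case, per-request cost is simple arithmetic on token counts, and worth sketching because it depends on usage, not on parameters at all. The prices below are placeholders, not any vendor's real rates.

```python
# Toy API cost estimate. Prices are hypothetical placeholders;
# check the vendor's pricing page for real $/M-token rates.
def request_cost(tokens_in: int, tokens_out: int,
                 usd_per_m_in: float, usd_per_m_out: float) -> float:
    return (tokens_in / 1e6) * usd_per_m_in + (tokens_out / 1e6) * usd_per_m_out

# A 100k-token prompt with a 2k-token answer at a made-up $3 in / $15 out:
print(round(request_cost(100_000, 2_000, 3.0, 15.0), 3))  # 0.33
```

Note that nothing in this calculation references parameter count: for closed APIs, context window, price, and latency are the numbers that actually constrain you.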


Parameter and architecture details change with each release. Prefer the model card and API docs of the specific checkpoint you use.
