nemo-mbridge
20 indexed skills · max 10 per page
nemo-mbridge-perf-moe-optimization-workflow
nvidia/skills · nemo-mbridge
Systematic workflow for MoE training optimization in Megatron Bridge, based on the Megatron-Core MoE paper. Covers the Three Walls framework, parallel folding, recompute strategy, dispatcher choice, and CUDA-graph bring-up.
nemo-mbridge-perf-megatron-fsdp
nvidia/skills · nemo-mbridge
Operational guide for enabling Megatron FSDP in Megatron Bridge, including config knobs, code anchors, pitfalls, and verification.
nemo-mbridge-perf-moe-vlm-training
nvidia/skills · nemo-mbridge
Practical guidance for training MoE VLMs in Megatron Bridge. Compares FSDP and 3D-parallel approaches, using rounded lessons from Qwen3-VL, Qwen3-Next, and other multimodal experiments.
nemo-mbridge-perf-memory-tuning
nvidia/skills · nemo-mbridge
Techniques for reducing peak GPU memory in Megatron Bridge: expandable segments, parallelism resizing, activation recompute, CPU offloading constraints, and common OOM fixes.
nemo-mbridge-perf-parallelism-strategies
nvidia/skills · nemo-mbridge
Operational guide for choosing and combining parallelism strategies in Megatron Bridge, including sizing rules, hardware topology mapping, and combined parallelism configuration.
nemo-mbridge-perf-cuda-graphs
nvidia/skills · nemo-mbridge
Validate and use CUDA graph capture in Megatron Bridge, including local full-iteration graphs and Transformer Engine scoped graphs for attention, MLP, and MoE modules.
nemo-mbridge-perf-cpu-offloading
nvidia/skills · nemo-mbridge
Validate and use CPU offloading in Megatron Bridge, including layer-level activation offloading and fractional optimizer state offloading with HybridDeviceOptimizer.
nemo-mbridge-perf-activation-recompute
nvidia/skills · nemo-mbridge
Validate and use selective and full activation recompute in Megatron Bridge to reduce GPU memory usage at the cost of extra compute.
nemo-mbridge-mlm-bridge-training
nvidia/skills · nemo-mbridge
Run Megatron-LM (MLM) and Megatron Bridge training with mock or real data. Covers correlation testing, available recipes, and multi-GPU examples.
nemo-mbridge-multi-node-slurm
nvidia/skills · nemo-mbridge
Convert single-node scripts to multi-node Slurm sbatch jobs and debug common multi-node failures. Covers srun-native vs uv run torch.distributed approaches, container setup, NCCL timeouts, OOM sizing for MoE models, and interactive allocation.