skill tag
training
12 indexed skills · max 10 per page
skills (12)
simpo-training
davila7/claude-code-templates · AI/ML
SimPO is a preference optimization method that needs no reference model and is reported to outperform DPO.
grpo-rl-training
davila7/claude-code-templates · AI/ML
Expert-level guidance for implementing Group Relative Policy Optimization (GRPO) using the Transformer Reinforcement Learning (TRL) library. This skill provides battle-tested patterns, critical insights, and production-ready workflows for fine-tuning language models with custom reward functions.