skill tag

training

12 indexed skills · max 10 per page

skills (12)

simpo-training

davila7/claude-code-templates · AI/ML

SimPO (Simple Preference Optimization) is a reference-free preference optimization method, reported by its authors to outperform DPO.

grpo-rl-training

davila7/claude-code-templates · AI/ML

Expert-level guidance for implementing Group Relative Policy Optimization (GRPO) using the Transformer Reinforcement Learning (TRL) library. This skill provides battle-tested patterns, critical insights, and production-ready workflows for fine-tuning language models with custom reward functions.

page 2 / 2