tag: evals
2 indexed skills
skills (2)
phoenix-evals
arize-ai/phoenix · Productivity
Build evaluators for AI/LLM applications: code-first checks where possible, an LLM judge for nuance, validated against human labels.
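The "code first, LLM for nuance" loop this entry describes can be sketched in plain Python. The snippet below is a generic illustration of the pattern, not the phoenix-evals API: code_eval, evaluate, and the SQL-shaped rule are hypothetical names invented for this example.

```python
from typing import Callable, Optional

def code_eval(output: str) -> Optional[bool]:
    """Deterministic checks first: cheap, fast, and reproducible.
    Returns True/False when a rule is decisive, None when nuance is needed."""
    if not output.strip():
        return False                    # empty output always fails
    if "select" in output.lower():      # hypothetical domain rule: answer must contain SQL
        return True
    return None                         # ambiguous -> escalate to the LLM judge

def evaluate(output: str, llm_judge: Callable[[str], bool]) -> bool:
    """Code-first evaluator: the (slow, costly) LLM judge only sees nuanced cases."""
    verdict = code_eval(output)
    return verdict if verdict is not None else llm_judge(output)

# Usage with a stub judge; a real judge would prompt a model and parse its label.
stub_judge = lambda text: True  # stand-in: pretend the judge passes everything
print(evaluate("SELECT * FROM orders", stub_judge))    # True from the code rule, no LLM call
print(evaluate("It depends on context.", stub_judge))  # escalated to the judge
```

The point of the ordering is cost and reproducibility: deterministic rules settle the easy cases for free, and the LLM judge only sees the residue that actually needs it.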
ai-evals
refoundai/lenny-skills · AI/ML
Systematic evaluation framework for AI products using practitioner-driven methodologies.
- Guides users through understanding what "good" looks like, designing rubrics and test cases, and implementing scoring criteria aligned with actual user needs.
- Emphasizes manual review and error analysis as prerequisites to building meaningful evals, with structured workflows for clustering failure patterns.
- Flags common pitfalls, including vague criteria, LLM-as-judge without validation, and Likert scales (see the validation sketch below).
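The "LLM-as-judge without validation" pitfall has a concrete remedy: score a labeled sample with both humans and the judge, and measure chance-corrected agreement before trusting the judge at scale. Below is a minimal sketch using scikit-learn's cohen_kappa_score; the sample labels and the 0.6 threshold are illustrative assumptions, not prescriptions from the skill.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical sample: human labels vs. LLM-judge labels on the same outputs.
human = ["pass", "fail", "pass", "pass", "fail", "fail", "pass", "fail"]
judge = ["pass", "fail", "pass", "fail", "fail", "fail", "pass", "pass"]

# Raw agreement is easy to game on skewed data; kappa corrects for chance.
agreement = sum(h == j for h, j in zip(human, judge)) / len(human)
kappa = cohen_kappa_score(human, judge)
print(f"raw agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")

# Illustrative gate: only promote the judge once it tracks humans well.
if kappa < 0.6:  # 0.6 as a "substantial agreement" floor is an assumption here
    print("Judge not yet trustworthy: refine the rubric/prompt and re-validate.")
```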