langsmith-evaluator
Build evaluation pipelines for LangSmith with LLM-as-Judge and custom code evaluators.
What it does
Three core components: creating evaluators (LLM-as-Judge or custom code), defining run functions to capture agent outputs and trajectories, and running evaluations locally or auto-running via uploaded evaluators
Supports both offline evaluators (comparing run outputs to dataset examples) and online evaluators (real-time quality checks on production runs)
Requires a LangSmith API key and project configuration
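The three components above can be sketched in plain Python without calling LangSmith at all. This is a minimal illustration, not the skill's own code: the dataset, the `run_agent` function, and the `evaluate_locally` loop are all hypothetical stand-ins. The evaluators take `outputs` and `reference_outputs` and return a dict with `key` and `score`, which is assumed here to match the shape that recent LangSmith Python SDK versions accept for custom code evaluators; check the SDK documentation before wiring these into a real evaluation run.

```python
from typing import Callable, Dict, List

# Hypothetical dataset examples: inputs plus the reference (expected) outputs.
EXAMPLES: List[Dict] = [
    {
        "inputs": {"question": "2 + 2"},
        "reference_outputs": {"answer": "4", "trajectory": ["calculator"]},
    },
    {
        "inputs": {"question": "capital of France"},
        "reference_outputs": {"answer": "Paris", "trajectory": ["search"]},
    },
]


def run_agent(inputs: Dict) -> Dict:
    """Hypothetical run function: returns the final answer and the tool trajectory."""
    if inputs["question"] == "2 + 2":
        return {"answer": "4", "trajectory": ["calculator"]}
    # Deliberately imperfect so the trajectory evaluator has something to flag.
    return {"answer": "paris", "trajectory": ["search", "search"]}


def exact_match(outputs: Dict, reference_outputs: Dict) -> Dict:
    """Custom code evaluator: case-insensitive comparison of the final answer."""
    got = outputs["answer"].strip().lower()
    want = reference_outputs["answer"].strip().lower()
    return {"key": "exact_match", "score": int(got == want)}


def trajectory_match(outputs: Dict, reference_outputs: Dict) -> Dict:
    """Custom code evaluator: did the agent call the expected tools in order?"""
    same = outputs["trajectory"] == reference_outputs["trajectory"]
    return {"key": "trajectory_match", "score": int(same)}


def evaluate_locally(
    target: Callable[[Dict], Dict],
    examples: List[Dict],
    evaluators: List[Callable[[Dict, Dict], Dict]],
) -> List[Dict]:
    """Hypothetical local loop: run the target on each example and score it."""
    results = []
    for example in examples:
        outputs = target(example["inputs"])
        scores = {}
        for evaluator in evaluators:
            result = evaluator(outputs, example["reference_outputs"])
            scores[result["key"]] = result["score"]
        results.append(
            {"inputs": example["inputs"], "outputs": outputs, "scores": scores}
        )
    return results


results = evaluate_locally(run_agent, EXAMPLES, [exact_match, trajectory_match])
for row in results:
    print(row["inputs"]["question"], row["scores"])
```

This covers only the offline case, where each run is compared against a dataset example. An online evaluator has no reference outputs to compare against, so it would score the run's outputs on their own. An LLM-as-Judge evaluator would keep the same function shape but replace the comparison with a model call.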
Installation Guide
How to use langsmith-evaluator on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your machine
- Node.js 16+ with npm (verify with node --version)
- Active project directory where you want to add langsmith-evaluator
Run the install command
Execute the skills CLI command in your project's root directory to begin installation:
Fetches langsmith-evaluator from langchain-ai/langsmith-skills and configures it for Cursor.
Select Cursor when prompted
The CLI shows a list of agents. Use arrow keys and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Restart Cursor to activate langsmith-evaluator. Access via /langsmith-evaluator in your agent's command palette.
Security Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Skills execute code in your environment, so always review the skill's source code, verify the publisher's reputation, and test in isolation before production use.