benchmark-models

garrytan/gstack · updated Apr 22, 2026

$ npx skills add https://github.com/garrytan/gstack --skill gstack
Summary

Cross-model benchmark skill that runs the same prompt against Claude, GPT/Codex, and Gemini, with an optional LLM-judge quality pass.

skill.md

Cross-model benchmark skill that runs the same prompt against Claude, GPT/Codex, and Gemini, with an optional LLM-judge quality pass. Imported from benchmark-models/SKILL.md in garrytan/gstack.
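The skill's actual runner is not reproduced on this page, but the compare-then-judge flow it describes could be sketched as below. Note the assumptions: `run_model` and `judge` are hypothetical stubs standing in for real provider calls, and the model names are illustrative, not the skill's API.

```python
import time

# Hypothetical model runner — the real skill would wire this to each
# provider's SDK or CLI; stubbed here so the harness is self-contained.
def run_model(model: str, prompt: str) -> str:
    return f"[{model}] response to: {prompt}"

# Stand-in for the optional LLM-judge pass: assign each response a score.
# A real judge would call another model; this stub scores by length.
def judge(responses: dict[str, str]) -> dict[str, float]:
    return {model: float(len(text)) for model, text in responses.items()}

def benchmark(prompt: str, models: list[str]) -> dict[str, dict]:
    """Run the same prompt across all models, recording output and latency,
    then attach a judge score to each result."""
    results: dict[str, dict] = {}
    for model in models:
        start = time.perf_counter()
        output = run_model(model, prompt)
        results[model] = {
            "output": output,
            "latency_s": time.perf_counter() - start,
        }
    scores = judge({m: r["output"] for m, r in results.items()})
    for model, score in scores.items():
        results[model]["judge_score"] = score
    return results

if __name__ == "__main__":
    report = benchmark("Summarize RFC 2119.", ["claude", "gpt-codex", "gemini"])
    for model, r in report.items():
        print(model, round(r["latency_s"], 4), r["judge_score"])
```

The point of the shape: every model sees the identical prompt, per-model latency is measured around the call, and the judge runs once over the full response set so scores are comparable.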

Discussion

Product Hunt–style comments (not star reviews)
  • No comments yet — start the thread.
General reviews

Ratings

4.5 · 65 reviews
  • Chaitanya Patil · Dec 20, 2024

    Solid pick for teams standardizing on skills: benchmark-models is focused, and the summary matches what you get after install.

  • Advait Singh · Dec 20, 2024

    Useful defaults in benchmark-models — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Charlotte Gupta · Dec 16, 2024

    benchmark-models has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Dev Jain · Dec 8, 2024

    benchmark-models is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Arya Rahman · Dec 8, 2024

    Solid pick for teams standardizing on skills: benchmark-models is focused, and the summary matches what you get after install.

  • Layla Choi · Nov 27, 2024

    benchmark-models fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • James Abbas · Nov 27, 2024

    I recommend benchmark-models for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Dev Reddy · Nov 19, 2024

    benchmark-models reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Rahul Santra · Nov 11, 2024

    I recommend benchmark-models for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Layla Kim · Nov 11, 2024

    benchmark-models has been reliable in day-to-day use. Documentation quality is above average for community skills.

Showing 1–10 of 65 (page 1 of 7)