computer-vision-expert
sickn33/antigravity-awesome-skills · updated Apr 8, 2026
Role: Advanced Vision Systems Architect & Spatial Intelligence Expert
Computer Vision Expert (SOTA 2026)
Purpose
To provide expert guidance on designing, implementing, and optimizing state-of-the-art computer vision pipelines, ranging from real-time object detection with YOLO26 to foundation-model-based segmentation with SAM 3 and visual reasoning with VLMs.
When to Use
- Designing high-performance real-time detection systems (YOLO26).
- Implementing zero-shot or text-guided segmentation tasks (SAM 3).
- Building spatial awareness, depth estimation, or 3D reconstruction systems.
- Optimizing vision models for edge device deployment (ONNX, TensorRT, NPU).
- Bridging classical geometry (calibration) with modern deep learning.
Capabilities
1. Unified Real-Time Detection (YOLO26)
- NMS-Free Architecture: Mastery of end-to-end inference without Non-Maximum Suppression (reducing latency and complexity).
- Edge Deployment: Optimization for low-power hardware, aided by the removal of the Distribution Focal Loss (DFL) module (simpler exports) and by the MuSGD optimizer (more stable training).
- Improved Small-Object Recognition: Expertise in using ProgLoss and STAL assignment for high precision in IoT and industrial settings.
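For contrast, here is the classical greedy NMS step that conventional one-to-many detectors need after inference, and that an end-to-end model like YOLO26 is designed to skip. This is a plain-Python illustration, not Ultralytics code; production pipelines would use a vectorized version such as `torchvision.ops.nms`.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thr: float = 0.5) -> List[int]:
    """Keep the highest-scoring box, drop boxes overlapping it above iou_thr, repeat.

    Returns indices of kept boxes, highest score first. This serial loop is the
    post-processing that NMS-free (end-to-end) detectors remove from deployment.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU ~0.68
```

The data-dependent loop is why NMS is awkward on NPUs and in exported graphs; removing it leaves only a score threshold on the host.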
2. Promptable Segmentation (SAM 3)
- Text-to-Mask: Ability to segment objects using natural language descriptions (e.g., "the blue container on the right").
- SAM 3D: Reconstructing objects, scenes, and human bodies in 3D from single/multi-view images.
- Unified Logic: One model for detection, segmentation, and tracking, with a reported ~2x accuracy gain over SAM 2.
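When checking accuracy claims such as SAM 3 vs. SAM 2 on your own data, per-object mask IoU is the usual metric. The following is a minimal pure-Python sketch that assumes masks arrive as equal-sized 2-D boolean grids; real pipelines would use NumPy arrays. Returning 1.0 when both masks are empty is a convention chosen here, not a standard.

```python
from typing import List

Mask = List[List[bool]]  # H x W binary mask, True = object pixel


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Pixel-level intersection-over-union between two same-sized binary masks."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have identical shapes")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += p and g  # pixel in both masks
            union += p or g   # pixel in either mask
    return inter / union if union else 1.0  # both empty: treat as perfect agreement


pred = [[True, True, False],
        [False, True, False]]
gt = [[True, False, False],
      [False, True, True]]
print(mask_iou(pred, gt))  # 0.5 (2 shared pixels / 4 pixels in the union)
```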
3. Vision Language Models (VLMs)
- Visual Grounding: Leveraging Florence-2, PaliGemma 2, or Qwen2-VL for semantic scene understanding.
- Visual Question Answering (VQA): Extracting structured data from visual inputs through conversational reasoning.
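Structured extraction tends to be more reliable when the prompt pins down an output schema and the caller validates the reply instead of trusting it. The sketch below omits the model call itself, so it applies to any of the VLMs above. The prompt wording, the `containers` schema, and `parse_vqa_answer` are hypothetical illustrations, not part of any model's API.

```python
import json
import re
from dataclasses import dataclass
from typing import List

# Hypothetical prompt: the schema is illustrative, not defined by any VLM.
PROMPT = (
    "List every container visible in the image. Respond ONLY with JSON of the form "
    '{"containers": [{"color": str, "position": "left"|"center"|"right"}]}'
)

ALLOWED_POSITIONS = {"left", "center", "right"}


@dataclass
class Container:
    color: str
    position: str


def parse_vqa_answer(raw: str) -> List[Container]:
    """Extract and validate the JSON object from a free-form VLM reply."""
    # Models often wrap JSON in prose or code fences; grab first '{' to last '}'.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    items: List[Container] = []
    for entry in data.get("containers", []):
        position = entry.get("position")
        if position not in ALLOWED_POSITIONS:
            raise ValueError(f"unexpected position: {position!r}")
        items.append(Container(color=str(entry["color"]), position=position))
    return items


reply = 'Sure! {"containers": [{"color": "blue", "position": "right"}]}'
print(parse_vqa_answer(reply))  # [Container(color='blue', position='right')]
```

Rejecting out-of-schema values (rather than silently coercing them) surfaces hallucinated or ambiguous answers early, which matters when VQA output feeds downstream automation.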
4. Geometry & Reconstruction
- Depth Anything V2: State-of-the-art monocular depth estimation for spatial awareness.
- Sub-pixel Calibration: Chessboard/ChArUco pipelines for high-precision stereo and multi-camera rigs.
- Visual SLAM: Real-time localization and mapping for autonomous systems.
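The base Depth Anything V2 checkpoints predict relative (affine-invariant, disparity-like) depth, so spatial pipelines often align predictions to a few known metric depths before using them (metric fine-tuned variants also exist). This sketch uses the standard closed-form least-squares fit of a scale `s` and shift `t` in inverse-depth space, minimizing $\sum_i (s\,r_i + t - 1/d_i)^2$. The sparse reference depths (for example from LiDAR or a calibrated stereo pair) and the function names are assumptions.

```python
from typing import List, Tuple


def fit_scale_shift(pred: List[float], target: List[float]) -> Tuple[float, float]:
    """Least-squares s, t minimising sum((s * p + t - y)^2) over paired samples."""
    n = len(pred)
    if n < 2 or n != len(target):
        raise ValueError("need at least two paired samples of equal length")
    sp, sy = sum(pred), sum(target)
    spp = sum(p * p for p in pred)
    spy = sum(p * y for p, y in zip(pred, target))
    denom = n * spp - sp * sp
    if denom == 0:
        raise ValueError("predictions are constant; scale is undetermined")
    s = (n * spy - sp * sy) / denom
    t = (sy - s * sp) / n
    return s, t


def to_metric_depth(rel_disp: List[float], s: float, t: float) -> List[float]:
    """Map relative disparity to metric depth via aligned inverse depth 1 / (s*r + t)."""
    depths = []
    for r in rel_disp:
        inv = s * r + t
        if inv <= 0:
            raise ValueError("aligned inverse depth must be positive")
        depths.append(1.0 / inv)
    return depths


rel = [0.2, 1.0, 1.8]            # model output at pixels with known depth
metric = [5.0, 1.0 / 0.6, 1.0]   # e.g. sparse LiDAR depths in metres
s, t = fit_scale_shift(rel, [1.0 / d for d in metric])
print(round(s, 3), round(t, 3))  # 0.5 0.1
```

Fitting in inverse-depth space matches the disparity-like output; fitting directly on metric depth would assume a linear relation that does not hold for these predictions.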
Patterns
1. Text-Guided Vision Pipelines
- Use SAM 3's text-to-mask capability to isolate specific parts during inspection without needing custom detectors for every variation.
- Combine YOLO26 for fast "candidate proposal" and SAM 3 for "precise mask refinement".
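The proposal-then-refine pattern can be sketched without tying it to a framework: the detector and segmenter are passed in as plain callables. Here, `detect` and `segment` are hypothetical stand-ins, not Ultralytics or SAM 3 APIs. Slightly padding each box before prompting is a heuristic assumed here so that masks are less likely to be clipped at the box edge.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def pad_and_clamp(box: Box, pad: float, width: int, height: int) -> Box:
    """Grow a box by `pad` (fraction of its size) on each side and clip to the image."""
    x1, y1, x2, y2 = box
    dx, dy = (x2 - x1) * pad, (y2 - y1) * pad
    return (max(0.0, x1 - dx), max(0.0, y1 - dy),
            min(float(width), x2 + dx), min(float(height), y2 + dy))


def propose_then_refine(
    detect: Callable[[object], List[Tuple[Box, float]]],  # hypothetical fast detector
    segment: Callable[[object, Box], object],             # hypothetical box-prompted segmenter
    image: object, width: int, height: int,
    conf_thr: float = 0.25, pad: float = 0.1,
) -> List[Tuple[Box, object]]:
    """Run the cheap detector on the full frame, then the heavy segmenter per confident box."""
    results = []
    for box, score in detect(image):
        if score < conf_thr:
            continue  # skip weak proposals so the expensive model runs less often
        prompt_box = pad_and_clamp(box, pad, width, height)
        results.append((prompt_box, segment(image, prompt_box)))
    return results


def fake_detect(img):
    return [((10, 10, 50, 50), 0.9), ((0, 0, 5, 5), 0.1)]


def fake_segment(img, box):
    return f"mask for {box}"


print(propose_then_refine(fake_detect, fake_segment, image=None, width=100, height=100))
# [((6.0, 6.0, 54.0, 54.0), 'mask for (6.0, 6.0, 54.0, 54.0)')]
```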
2. Deployment-First Design
- Leverage YOLO26's simplified ONNX/TensorRT exports (NMS-free).
- Use MuSGD for faster, more stable training convergence on custom datasets.
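With an NMS-free export, host-side post-processing reduces to a confidence threshold plus undoing the resize. This sketch makes two assumptions. First, it assumes the exported model emits a fixed number of rows laid out as `[x1, y1, x2, y2, score, class_id]` in letterboxed input coordinates (the layout used by YOLOv10-style end-to-end exports; confirm against your own model's output shape before relying on it). Second, it assumes the image was letterboxed with centered padding. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[float, float, float, float, float, int]  # x1, y1, x2, y2, score, class_id


def letterbox_params(orig_w: int, orig_h: int, in_w: int, in_h: int) -> Tuple[float, float, float]:
    """Gain and symmetric padding used when fitting the image into the model input."""
    gain = min(in_w / orig_w, in_h / orig_h)
    pad_x = (in_w - orig_w * gain) / 2
    pad_y = (in_h - orig_h * gain) / 2
    return gain, pad_x, pad_y


def _clip(v: float, hi: int) -> float:
    return min(max(v, 0.0), float(hi))


def decode_end_to_end(rows: Sequence[Sequence[float]], orig_w: int, orig_h: int,
                      in_w: int = 640, in_h: int = 640,
                      conf_thr: float = 0.25) -> List[Detection]:
    """Threshold and rescale fixed-size NMS-free output; no suppression step is run."""
    gain, pad_x, pad_y = letterbox_params(orig_w, orig_h, in_w, in_h)
    dets: List[Detection] = []
    for x1, y1, x2, y2, score, cls in rows:
        if score < conf_thr:
            continue  # drop low-confidence slots of the fixed-size output
        # Undo padding, then scaling, then clip to the original image bounds.
        dets.append((
            _clip((x1 - pad_x) / gain, orig_w), _clip((y1 - pad_y) / gain, orig_h),
            _clip((x2 - pad_x) / gain, orig_w), _clip((y2 - pad_y) / gain, orig_h),
            score, int(cls),
        ))
    return dets


rows = [[100, 240, 200, 340, 0.9, 2], [0, 0, 1, 1, 0.05, 0]]
print(decode_end_to_end(rows, orig_w=1280, orig_h=720))
# [(200.0, 200.0, 400.0, 400.0, 0.9, 2)]
```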
3. Progressive 3D Scene Reconstruction
- Integrate monocular depth maps with geometric homographies to build accurate 2.5D/3D representations of scenes.
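Once depth is metric and the camera is calibrated, each pixel can be lifted into 3D with the pinhole model: $X = (u - c_x)Z/f_x$, $Y = (v - c_y)Z/f_y$. This is the step that turns a 2.5D depth map into a point cloud that can then be fused or registered across views. The sketch assumes an undistorted image (lens distortion already removed using the calibration) and intrinsics `fx, fy, cx, cy` in pixels. The function name is hypothetical.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point3]:
    """Lift a metric depth map (indexed depth[v][u]) to camera-frame 3D points.

    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point3] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # missing or invalid depth
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


depth = [[2.0, 0.0],
         [2.0, 4.0]]  # metres; 0.0 marks a hole
for p in backproject(depth, fx=100.0, fy=100.0, cx=0.5, cy=0.5):
    print(p)
```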
Anti-Patterns
- Manual NMS Post-processing: Prefer NMS-free end-to-end architectures (e.g., YOLOv10, YOLO26) to cut post-processing overhead.
- Click-Only Segmentation: Relying solely on manual point prompts when SAM 3's text grounding can replace them in many scenarios.
- Legacy DFL Exports: Using outdated export pipelines that don't take advantage of YOLO26's simplified module structure.
Sharp Edges (2026)
| Issue | Severity | Solution |
|---|---|---|
| SAM 3 VRAM Usage | Medium | Use quantized/distilled versions for local GPU inference. |
| Text Ambiguity | Low | Use descriptive prompts ("the 5mm bolt" instead of just "bolt"). |
| Motion Blur | Medium | Optimize shutter speed or use SAM 3's temporal tracking consistency. |
| Hardware Compatibility | Low | YOLO26's simplified architecture (NMS-free, DFL-free) is highly compatible with NPUs and TPUs. |
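For the motion-blur row, "optimize shutter speed" can be made concrete. Blur length in pixels is approximately image-plane speed times exposure time, where image-plane speed is $f_{px} \cdot v / Z$ for motion parallel to the image plane. This is a back-of-envelope sketch under a pinhole, fronto-parallel-motion assumption. The function name and the 1 px default blur budget are illustrative choices, not a standard.

```python
import math


def max_exposure_s(speed_m_s: float, distance_m: float, focal_px: float,
                   blur_budget_px: float = 1.0) -> float:
    """Longest exposure (seconds) keeping motion blur within `blur_budget_px`.

    Assumes motion parallel to the image plane, so the object's image moves at
    focal_px * speed / distance pixels per second.
    """
    if distance_m <= 0 or focal_px <= 0:
        raise ValueError("distance and focal length must be positive")
    image_speed_px_s = focal_px * abs(speed_m_s) / distance_m
    if image_speed_px_s == 0:
        return math.inf  # static scene: exposure is not limited by blur
    return blur_budget_px / image_speed_px_s


# Conveyor moving at 0.5 m/s, 1 m from a camera with a 1000 px focal length:
print(max_exposure_s(0.5, 1.0, 1000.0))  # 0.002, i.e. 1/500 s for <= 1 px blur
```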
Related Skills
ai-engineer, robotics-expert, research-engineer, embedded-systems