Covers 13+ Vision APIs across subject lifting, hand/body pose, person segmentation, text OCR, barcode detection, and document scanning with decision trees for choosing the right tool
Includes 15 production patterns: combining APIs to exclude hands from objects, real-time gesture recognition, multi-person segmentation, fitness action classif
Confirm successful installation by checking the skill directory location:
.cursor/skills/axiom-vision
Restart Cursor to activate axiom-vision. Access via /axiom-vision in your agent's command palette.
Security Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Guides you through implementing computer vision: subject segmentation, hand/body pose detection, person detection, text recognition, barcode detection, document scanning, and combining Vision APIs to solve complex problems.
When to Use This Skill
Use when you need to:
- Isolate subjects from backgrounds (subject lifting)
- Detect and track hand poses for gestures
- Detect and track body poses for fitness/action classification
- Segment multiple people separately
- Exclude hands from object bounding boxes (combining APIs)
- Choose between VisionKit and Vision framework
- Combine Vision with CoreImage for compositing
- Decide which Vision API solves your problem
- Recognize text in images (OCR)
- Detect barcodes and QR codes
- Scan documents with perspective correction
- Extract structured data from documents (iOS 26+)
- Build live scanning experiences (DataScannerViewController)
Example Prompts
"How do I isolate a subject from the background?"
"I need to detect hand gestures like pinch"
"How can I get a bounding box around an object without including the hand holding it?"
"Should I use VisionKit or Vision framework for subject lifting?"
"How do I segment multiple people separately?"
"I need to detect body poses for a fitness app"
"How do I preserve HDR when compositing subjects on new backgrounds?"
"How do I recognize text in an image?"
"I need to scan QR codes from camera"
"How do I extract data from a receipt?"
"Should I use DataScannerViewController or Vision directly?"
"How do I scan documents and correct perspective?"
"I need to extract table data from a document"
Red Flags
Signs you're making this harder than it needs to be:
- Manually implementing subject segmentation with CoreML models
- Using ARKit just for body pose (Vision works offline)
- Writing gesture recognition from scratch (use hand pose + simple distance checks)
- Processing on main thread (blocks UI - Vision is resource intensive)
- Training custom models when Vision APIs already exist
- Not checking confidence scores (low confidence = unreliable landmarks)
- Forgetting to convert coordinates (lower-left origin vs UIKit top-left)
- Building custom text recognizer when VNRecognizeTextRequest exists
- Using AVFoundation + Vision when DataScannerViewController suffices
- Processing every camera frame for scanning (skip frames, use region of interest)
- Enabling all barcode symbologies when you only need one (performance hit)
- Ignoring RecognizeDocumentsRequest when you need table/list structure (iOS 26+)
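Three of the red flags above (gesture recognition, confidence checks, and coordinate conversion) come down to a few lines of arithmetic. The following is a minimal sketch, not the skill's own implementation: `Landmark`, `isPinching`, `viewPoint`, and the 0.3 confidence and 0.05 distance thresholds are hypothetical names and values chosen for illustration. It assumes landmarks arrive in Vision's normalized coordinate space (0...1, origin at the lower left).

```swift
import Foundation

/// Hypothetical stand-in for a Vision landmark: a normalized location
/// (0...1, lower-left origin) plus a confidence score.
struct Landmark {
    let location: CGPoint
    let confidence: Float
}

/// Returns true when both landmarks are trustworthy and closer together
/// than `maxDistance` (measured in normalized units).
/// The default thresholds are illustrative assumptions; tune them per app.
func isPinching(thumbTip: Landmark,
                indexTip: Landmark,
                minConfidence: Float = 0.3,
                maxDistance: CGFloat = 0.05) -> Bool {
    // Low-confidence landmarks are unreliable, so refuse to classify them.
    guard thumbTip.confidence >= minConfidence,
          indexTip.confidence >= minConfidence else { return false }
    let dx = thumbTip.location.x - indexTip.location.x
    let dy = thumbTip.location.y - indexTip.location.y
    return (dx * dx + dy * dy).squareRoot() < maxDistance
}

/// Converts a normalized, lower-left-origin point into a top-left-origin
/// point in a view of the given size (the UIKit convention).
func viewPoint(fromNormalized point: CGPoint, in viewSize: CGSize) -> CGPoint {
    CGPoint(x: point.x * viewSize.width,
            y: (1 - point.y) * viewSize.height)
}
```

With these helpers, a normalized point (0.25, 0.75) in a 400 x 800 view maps to (100, 200). In a real pipeline the `Landmark` values would be filled from the thumb-tip and index-tip points of a hand pose observation.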
Mandatory First Steps
Before implementing any Vision feature:
1. Choose the Right API (Decision Tree)
What do you need to do?
├─ Isolate subject(s) from background?
│  ├─ Need system UI + out-of-process → VisionKit
│  │  ├─ ImageAnalysisInteraction (iOS/iPadOS)
│  │  └─ ImageAnalysisOverlayView (macOS)
│  ├─ Need custom pipeline / HDR / large images → Vision
│  │  └─ VNGenerateForegroundInstanceMaskRequest
│  └─ Need to EXCLUDE hands from object → Combine APIs
│     └─ Subject mask + Hand pose + custom masking (see Pattern 1)
│
├─ Segment people?
│  ├─ All people in one mask → VNGeneratePersonSegmentationRequest
│  └─ Separate mask per person (up to 4) → VNGeneratePersonInstanceMaskRequest
│
├─ Detect hand pose/gestures?
│  ├─ Just hand location → VNDetectHumanHandPoseRequest (bounding box from landmarks)
│  ├─ 21 hand landmarks → VNDetectHumanHandPoseRequest
│  └─ Gesture recognition → Hand pose + distance checks
│
├─ Detect body pose?
│  ├─ 2D normalized landmarks → VNDetectHumanBodyPoseRequest
│  ├─ 3D real-world coordinates → VNDetectHumanBodyPose3DRequest
│  └─ Action classification → Body pose + CreateML model
│
├─ Face detection?
│  ├─ Just bounding boxes → VNDetectFaceRectanglesRequest
│  └─ Detailed landmarks → VNDetectFaceLandmarksRequest
│
├─ Person detection (location only)?
│  └─ VNDetectHumanRectanglesRequest
│
├─ Recognize text in images?
│  ├─ Real-time from camera + need UI → DataScannerViewController (iOS 16+)
│  ├─ Processing captured image → VNRecognizeTextRequest
│  │  ├─ Need speed (real-time camera) → recognitionLevel = .fast
│  │  └─ Need accuracy (documents) → recognitionLevel = .accurate
│  └─ Need structured documents (iOS 26+) → RecognizeDocumentsRequest
│
├─ Detect barcodes/QR codes?
│  ├─ Real-time camera + need UI → DataScannerViewController (iOS 16+)
│  └─ Processing image → VNDetectBarcodesRequest
│
└─ Scan documents?
   ├─ Need built-in UI + perspective correction → VNDocumentCameraViewController
   ├─ Need structured data (tables, lists) → RecognizeDocumentsRequest (iOS 26+)
   └─ Custom pipeline → VNDetectDocumentSegmentationRequest + perspective correction
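As one concrete branch of the tree, the sketch below handles the "processing captured image" case for text recognition. It is an illustration under assumptions rather than a reference implementation: `recognitionLevel(forLiveCamera:)` and `recognizeText(in:liveCamera:)` are hypothetical helper names, and the input is assumed to be a `CGImage`. The function blocks while Vision works, so it should be called from a background queue.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper mirroring the tree: .fast for live camera frames,
/// .accurate for captured documents.
func recognitionLevel(forLiveCamera liveCamera: Bool) -> VNRequestTextRecognitionLevel {
    liveCamera ? .fast : .accurate
}

/// Runs text recognition on one image and returns the best candidate
/// string for each detected text region.
func recognizeText(in image: CGImage, liveCamera: Bool = false) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = recognitionLevel(forLiveCamera: liveCamera)

    let handler = VNImageRequestHandler(cgImage: image)
    try handler.perform([request])

    // Each observation carries ranked candidates; keep the top one.
    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Keeping the level choice in a separate function makes the speed versus accuracy decision explicit and easy to change per call site.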
2. Set Up Background Processing
Never run Vision requests on the main thread:
import Vision

// Dedicated queue so Vision work never blocks the UI.
let processingQueue = DispatchQueue(label: "com.yourapp.vision", qos: .userInitiated)

processingQueue.async {
    do {
        let request = VNGenerateForegroundInstanceMaskRequest()
        // `image` is a CGImage supplied by the caller.
        let handler = VNImageRequestHandler(cgImage: image)
        try handler.perform([request])
        // Process observations...
        DispatchQueue.main.async {
            // Update UI
        }
    } catch {
        // Handle error
    }
}
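The "Process observations..." placeholder above could be filled in along the following lines. This is a sketch under assumptions: `liftSubject(from:completion:)`, `SubjectLiftError`, and `visionQueue` are hypothetical names, the queue label is a placeholder, and keeping every detected instance at full image size is an illustrative choice. It relies on `VNGenerateForegroundInstanceMaskRequest`, which requires iOS 17 or macOS 14.

```swift
import Vision
import CoreVideo

/// Hypothetical error for the "no subject found" case.
enum SubjectLiftError: Error {
    case noSubjectFound
}

let visionQueue = DispatchQueue(label: "com.yourapp.vision", qos: .userInitiated)

/// Lifts all foreground subjects out of `image` on a background queue and
/// delivers the masked result on the main queue.
@available(iOS 17.0, macOS 14.0, *)
func liftSubject(from image: CGImage,
                 completion: @escaping (Result<CVPixelBuffer, Error>) -> Void) {
    visionQueue.async {
        let result: Result<CVPixelBuffer, Error> = Result(catching: {
            let request = VNGenerateForegroundInstanceMaskRequest()
            let handler = VNImageRequestHandler(cgImage: image)
            try handler.perform([request])

            guard let observation = request.results?.first else {
                throw SubjectLiftError.noSubjectFound
            }
            // Keep every detected instance; pass a subset of indices to
            // isolate individual subjects instead.
            return try observation.generateMaskedImage(
                ofInstances: observation.allInstances,
                from: handler,
                croppedToInstancesExtent: false)
        })
        // UI work belongs on the main thread.
        DispatchQueue.main.async { completion(result) }
    }
}
```

Wrapping the work in a `Result` keeps the error path and the success path on the same hop back to the main queue, so the caller handles both in one place.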
3. Choose the Right Request Handler
Processing video frames? Use VNSequenceRequestHandler (maintains inter-frame state for temporal smoothing). For single images, use VNImageRequestHandler. Creating a new VNImageRequestHandler per frame discards temporal context and causes jittery results. See axiom-vision-ref for full comparison and code examples.
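A minimal sketch of the per-frame pattern, assuming frames arrive as `CVPixelBuffer` values (for example from a camera capture callback). `FramePoseTracker` and its `frameStride` parameter are hypothetical names, and body pose is used only as an example request. The sketch shows handler reuse plus simple frame skipping; how much a given request benefits from inter-frame state is not something this example demonstrates.

```swift
import Vision
import CoreVideo

/// Hypothetical wrapper that reuses one VNSequenceRequestHandler across
/// frames instead of creating a new handler per frame.
final class FramePoseTracker {
    private let sequenceHandler = VNSequenceRequestHandler()
    private let request = VNDetectHumanBodyPoseRequest()
    private let frameStride: Int
    private var frameIndex = 0

    /// - Parameter frameStride: process every Nth frame (1 = every frame).
    init(frameStride: Int = 2) {
        self.frameStride = max(1, frameStride)
    }

    /// Returns true for frames that should be analyzed, false for skipped ones.
    func shouldProcessNextFrame() -> Bool {
        defer { frameIndex += 1 }
        return frameIndex % frameStride == 0
    }

    /// Call from a background queue, once per captured frame.
    /// Returns nil for skipped frames, otherwise the detected poses.
    func process(_ pixelBuffer: CVPixelBuffer) throws -> [VNHumanBodyPoseObservation]? {
        guard shouldProcessNextFrame() else { return nil }
        try sequenceHandler.perform([request], on: pixelBuffer)
        return request.results ?? []
    }
}
```

A single tracker instance would typically live as long as the capture session, so the same handler sees every analyzed frame.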