This skill guides the implementation of vision chat functionality using the z-ai-web-dev-sdk package, enabling AI models to understand and respond to images combined with text prompts.
Before installing skills in Cursor, ensure your development environment meets these requirements:
node --version
Execute the skills CLI command in your project's root directory to begin installation:
Fetches vlm from answerzhao/agent-skills and configures it for Cursor.
The CLI shows a list of agents. Use arrow keys and space to select Cursor:
Confirm successful installation by checking the skill directory location:
Restart Cursor to activate vlm. Access via /vlm in your agent's command palette.
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skill Location: {project_path}/skills/VLM
This skill is located at the path above in your project.
Reference Scripts: Example test scripts are available in the {Skill Location}/scripts/ directory for quick testing and reference. See {Skill Location}/scripts/vlm.ts for a working example.
Vision Chat allows you to build applications that can analyze images, extract information from visual content, and answer questions about images through natural language conversation.
IMPORTANT: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code.
The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below.
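One way to honor the backend-only rule is to keep every SDK call inside a server-side handler and let the browser send only plain data. The sketch below is illustrative, not part of the skill: `handleVisionRequest` and its `{ status, json }` return shape are hypothetical, and the `zai` argument is assumed to be a client created on the server with `ZAI.create()`, as in the examples further down.

```javascript
// Hypothetical server-side handler. The client sends only { imageUrl, question };
// the SDK client (`zai`) is created and used on the server and never shipped to the browser.
async function handleVisionRequest(zai, body) {
  const { imageUrl, question } = body ?? {};

  // Reject malformed input before spending an API call.
  if (typeof imageUrl !== 'string' || typeof question !== 'string') {
    return { status: 400, json: { error: 'imageUrl and question must be strings' } };
  }

  // Same request shape as the analyzeImage example in this guide.
  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: question },
          { type: 'image_url', image_url: { url: imageUrl } }
        ]
      }
    ],
    thinking: { type: 'disabled' }
  });

  return {
    status: 200,
    json: { answer: response.choices[0]?.message?.content ?? null }
  };
}
```

Passing the client in as a parameter keeps the handler independent of any particular web framework and makes it easy to exercise with a stub.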
For simple image analysis tasks, you can use the z-ai CLI instead of writing code. This is ideal for quick image descriptions, testing vision capabilities, or simple automation.
# Describe an image from URL
z-ai vision --prompt "What's in this image?" --image "https://example.com/photo.jpg"
# Using short options
z-ai vision -p "Describe this image" -i "https://example.com/image.png"
# Analyze a local image file
z-ai vision -p "What objects are in this photo?" -i "./photo.jpg"
# Save response to file
z-ai vision -p "Describe the scene" -i "./landscape.png" -o description.json
# Analyze multiple images at once
z-ai vision \
-p "Compare these two images" \
-i "./photo1.jpg" \
-i "./photo2.jpg" \
-o comparison.json
# Multiple images with detailed analysis
z-ai vision \
--prompt "What are the differences between these images?" \
--image "https://example.com/before.jpg" \
--image "https://example.com/after.jpg"
# Enable thinking for complex visual reasoning
z-ai vision \
-p "Count the number of people in this image and describe their activities" \
-i "./crowd.jpg" \
--thinking \
-o analysis.json
# Stream the vision analysis
z-ai vision -p "Describe this image in detail" -i "./photo.jpg" --stream
--prompt, -p <text>: Required - Question or instruction about the image(s)
--image, -i <URL or path>: Optional - Image URL or local file path (can be used multiple times)
--thinking, -t: Optional - Enable chain-of-thought reasoning (default: disabled)
--output, -o <path>: Optional - Output file path (JSON format)
--stream: Optional - Stream the response in real-time
Use CLI for:
Use SDK for:
For better performance and reliability, use base64 encoding to pass images to the model instead of image URLs.
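A minimal sketch of the base64 approach, assuming the API accepts a `data:` URL in `image_url.url` (a common convention for vision APIs, but not confirmed by this guide). The helper names `toDataUrl` and `imagePart` and the extension table are hypothetical.

```javascript
// Illustrative extension-to-MIME table; extend it as needed.
const MIME_TYPES = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  gif: 'image/gif',
  webp: 'image/webp'
};

// Hypothetical helper: turn raw image bytes into a base64 data URL.
function toDataUrl(bytes, fileName) {
  const ext = fileName.split('.').pop().toLowerCase();
  const mime = MIME_TYPES[ext];
  if (!mime) {
    throw new Error(`Unsupported image extension: ${ext}`);
  }
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mime};base64,${base64}`;
}

// Hypothetical helper: wrap the data URL in the image_url content part used in this guide.
// Assumption: the API accepts a data URL wherever it accepts an http(s) URL.
function imagePart(bytes, fileName) {
  return { type: 'image_url', image_url: { url: toDataUrl(bytes, fileName) } };
}
```

In backend code the bytes would typically come from `fs.readFileSync('./photo.jpg')` or an upload buffer, and the resulting part goes into the `content` array next to the text part.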
The Vision Chat API supports three types of media content:
Use this type for static images (PNG, JPEG, GIF, WebP, etc.)
{
  role: 'user',
  content: [
    { type: 'text', text: prompt },
    { type: 'image_url', image_url: { url: imageUrl } }
  ]
}
Use this type for video content (MP4, AVI, MOV, etc.)
{
  role: 'user',
  content: [
    { type: 'text', text: prompt },
    { type: 'video_url', video_url: { url: videoUrl } }
  ]
}
Use this type for document files (PDF, DOCX, TXT, etc.)
{
  role: 'user',
  content: [
    { type: 'text', text: prompt },
    { type: 'file_url', file_url: { url: fileUrl } }
  ]
}
Note: You can combine multiple content types in a single message. For example, you can include both text and multiple images, or text with both an image and a document.
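Combining content types amounts to placing several parts in one `content` array. The sketch below shows one way to assemble such a message from the three part shapes above; `buildUserMessage` and its options object are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: build one user message from a prompt plus any mix of media URLs.
// Part shapes (image_url, video_url, file_url) are the ones documented above.
function buildUserMessage(prompt, { images = [], videos = [], files = [] } = {}) {
  const content = [
    { type: 'text', text: prompt },
    ...images.map(url => ({ type: 'image_url', image_url: { url } })),
    ...videos.map(url => ({ type: 'video_url', video_url: { url } })),
    ...files.map(url => ({ type: 'file_url', file_url: { url } }))
  ];
  return { role: 'user', content };
}

// Example: text with one image and one document in a single message.
const message = buildUserMessage('Does the photo match the spec sheet?', {
  images: ['https://example.com/product.jpg'],
  files: ['https://example.com/spec.pdf']
});
```

The resulting object can be passed as an entry in the `messages` array of a `createVision` call.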
import ZAI from 'z-ai-web-dev-sdk';

async function analyzeImage(imageUrl, question) {
  const zai = await ZAI.create();

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: question
          },
          {
            type: 'image_url',
            image_url: {
              url: imageUrl
            }
          }
        ]
      }
    ],
    thinking: { type: 'disabled' }
  });

  return response.choices[0]?.message?.content;
}

// Usage
const result = await analyzeImage(
  'https://example.com/product.jpg',
  'Describe this product in detail'
);
console.log('Analysis:', result);
import ZAI from 'z-ai-web-dev-sdk';

async function compareImages(imageUrls, question) {
  const zai = await ZAI.create();

  // One text part followed by one image_url part per image
  const content = [
    {
      type: 'text',
      text: question
    },
    ...imageUrls.map(url => ({
      type: 'image_url',
      image_url: { url }
    }))
  ];

  const response = await zai.chat.completions.createVision({
    messages: [
      {
        role: 'user',
        content: content
      }
    ],
    thinking: { type: 'disabled' }
  });

  return response.choices[0]?.message?.content;
}