AI Vision

by tan-yong-sheng

Integrates with Google's Gemini and Vertex AI models to analyze images, compare multiple images, and process video content with intelligent file handling that automatically optimizes upload strategies for different file sizes.

GitHub stars: 42

  • Dual provider support (Gemini + Vertex AI)
  • Handles both images and videos
  • Intelligent file upload optimization

best for

  • Content creators analyzing visual media
  • Developers building vision-enabled applications
  • Researchers processing image/video datasets
  • Teams needing automated visual content analysis

capabilities

  • Analyze images with AI-powered vision models
  • Process video content for insights and analysis
  • Compare multiple images side-by-side
  • Upload files via URLs, local paths, or base64
  • Store and manage media files in Google Cloud Storage
  • Switch between Gemini API and Vertex AI providers

what it does

Analyzes images and videos using Google's Gemini or Vertex AI models, with intelligent file handling for different content types and sizes.

about

AI Vision is a community-built MCP server published by tan-yong-sheng that provides AI assistants with tools and capabilities via the Model Context Protocol. It analyzes images and videos with Google's Gemini and Vertex AI models, using intelligent file handling that optimizes upload strategies for different file sizes. It is categorized under ai-ml.

how to install

You can install AI Vision in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.

license

MIT

AI Vision is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

readme

AI Vision MCP Server

A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.

Features

  • Dual Provider Support: Choose between Google Gemini API and Vertex AI
  • Multimodal Analysis: Support for both image and video content analysis
  • Flexible File Handling: Upload via multiple methods (URLs, local files, base64)
  • Storage Integration: Built-in Google Cloud Storage support
  • Comprehensive Validation: Zod-based data validation throughout
  • Error Handling: Robust error handling with retry logic and circuit breakers
  • TypeScript: Full TypeScript support with strict type checking

Quick Start

Pre-requisites

You can use either the google provider or the vertex_ai provider. For simplicity, the google provider is recommended.

Set the environment variables below for your selected provider. (Note: It’s recommended to set your MCP client's timeout to more than 5 minutes.)

(i) Using Google AI Studio Provider

export IMAGE_PROVIDER="google" # or vertex_ai
export VIDEO_PROVIDER="google" # or vertex_ai
export GEMINI_API_KEY="your-gemini-api-key"

Get your API key from Google AI Studio.

(ii) Using Vertex AI Provider

export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----
"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"

Refer to the guideline here on how to set this up.
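The provider setup above can be checked before launching the server. The following is a minimal sketch, not part of ai-vision-mcp: the helper name `missingProviderVars` is hypothetical, and it assumes every variable listed above for a provider is required, which the README does not state explicitly.

```typescript
// Hypothetical pre-flight check (not shipped with ai-vision-mcp).
// Variable names come from the README; treating all of them as
// required is an assumption made for illustration.
type Env = Record<string, string | undefined>;

const REQUIRED_VARS: Record<string, string[]> = {
  google: ["GEMINI_API_KEY"],
  vertex_ai: [
    "VERTEX_CLIENT_EMAIL",
    "VERTEX_PRIVATE_KEY",
    "VERTEX_PROJECT_ID",
    "GCS_BUCKET_NAME",
  ],
};

function missingProviderVars(env: Env): string[] {
  const missing: string[] = [];
  for (const selector of ["IMAGE_PROVIDER", "VIDEO_PROVIDER"]) {
    const provider = env[selector];
    const known =
      provider !== undefined &&
      Object.prototype.hasOwnProperty.call(REQUIRED_VARS, provider);
    if (provider === undefined || !known) {
      // Selector is unset or is neither "google" nor "vertex_ai".
      if (missing.indexOf(selector) === -1) missing.push(selector);
      continue;
    }
    for (const name of REQUIRED_VARS[provider]) {
      if (!env[name] && missing.indexOf(name) === -1) missing.push(name);
    }
  }
  return missing;
}
```

In a Node.js script this could be called as `missingProviderVars(process.env)`; an empty array means nothing in this sketch's list is missing.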

Installation

Below are installation guides for this MCP server on different MCP clients, such as Claude Desktop, Claude Code, Cursor, and Cline.

<details> <summary>Claude Desktop</summary>

Add to your Claude Desktop configuration:

(i) Using Google AI Studio Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

(ii) Using Vertex AI Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
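JSON strings cannot contain raw line breaks, so a multi-line private key has to be written with `\n` escapes. One way to avoid hand-escaping is to build the configuration as an object and serialize it. This is a sketch under assumptions: `buildVertexConfig` is a hypothetical helper, and all credential values are placeholders.

```typescript
// Hypothetical generator for the Vertex AI "mcpServers" entry.
// JSON.stringify escapes the PEM line breaks as \n automatically.
interface ServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildVertexConfig(
  clientEmail: string,
  privateKeyPem: string,
  projectId: string,
  bucketName: string
): { mcpServers: Record<string, ServerEntry> } {
  return {
    mcpServers: {
      "ai-vision-mcp": {
        command: "npx",
        args: ["ai-vision-mcp"],
        env: {
          IMAGE_PROVIDER: "vertex_ai",
          VIDEO_PROVIDER: "vertex_ai",
          VERTEX_CLIENT_EMAIL: clientEmail,
          VERTEX_PRIVATE_KEY: privateKeyPem,
          VERTEX_PROJECT_ID: projectId,
          GCS_BUCKET_NAME: bucketName,
        },
      },
    },
  };
}

// Placeholder key with real line breaks, as it appears in a key file.
const placeholderPem =
  "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n";

const vertexConfigText = JSON.stringify(
  buildVertexConfig(
    "your-service-account@project.iam.gserviceaccount.com",
    placeholderPem,
    "your-gcp-project-id",
    "your-gcs-bucket"
  ),
  null,
  2
);
```

The resulting `vertexConfigText` can be pasted into the client configuration file; parsing it back yields the original key with its line breaks intact.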
</details> <details> <summary>Claude Code</summary>

(i) Using Google AI Studio Provider

claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=google \
  -e VIDEO_PROVIDER=google \
  -e GEMINI_API_KEY=your-gemini-api-key \
  -- npx ai-vision-mcp

(ii) Using Vertex AI Provider

claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=vertex_ai \
  -e VIDEO_PROVIDER=vertex_ai \
  -e VERTEX_CLIENT_EMAIL=your-service-account@project.iam.gserviceaccount.com \
  -e VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----
" \
  -e VERTEX_PROJECT_ID=your-gcp-project-id \
  -e GCS_BUCKET_NAME=ai-vision-mcp-{VERTEX_PROJECT_ID} \
  -- npx ai-vision-mcp

Note: Increase the MCP startup timeout to 1 minute and the MCP tool execution timeout to about 5 minutes by updating ~/.claude/settings.json as follows:

{
  "env": {
    "MCP_TIMEOUT": "60000",
    "MCP_TOOL_TIMEOUT": "300000"
  }
}
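Both timeout values are given in milliseconds and stored as strings. The arithmetic is small but easy to get wrong by a factor of 1000, so here is a sketch of the conversion; `minutesToMsString` is a hypothetical helper name.

```typescript
// Convert minutes to the millisecond string form used in the settings above.
function minutesToMsString(minutes: number): string {
  return String(Math.round(minutes * 60 * 1000));
}

const claudeCodeTimeouts = {
  env: {
    MCP_TIMEOUT: minutesToMsString(1), // startup timeout: 1 minute
    MCP_TOOL_TIMEOUT: minutesToMsString(5), // tool execution timeout: 5 minutes
  },
};
```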
</details> <details> <summary>Cursor</summary>

Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server

Pasting the following configuration into your Cursor ~/.cursor/mcp.json file is the recommended approach. You may also install in a specific project by creating .cursor/mcp.json in your project folder. See Cursor MCP docs for more info.

(i) Using Google AI Studio Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

(ii) Using Vertex AI Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
</details> <details> <summary>Cline</summary>

Cline uses a JSON configuration file to manage MCP servers. To integrate the provided MCP server configuration:

  1. Open Cline and click on the MCP Servers icon in the top navigation bar.
  2. Select the Installed tab, then click Advanced MCP Settings.
  3. In the cline_mcp_settings.json file, add the following configuration:

(i) Using Google AI Studio Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "google",
        "VIDEO_PROVIDER": "google",
        "GEMINI_API_KEY": "your-gemini-api-key"
      }
    }
  }
}

(ii) Using Vertex AI Provider

{
  "mcpServers": {
    "ai-vision-mcp": {
      "timeout": 300,
      "type": "stdio",
      "command": "npx",
      "args": ["ai-vision-mcp"],
      "env": {
        "IMAGE_PROVIDER": "vertex_ai",
        "VIDEO_PROVIDER": "vertex_ai",
        "VERTEX_CLIENT_EMAIL": "your-service-account@project.iam.gserviceaccount.com",
        "VERTEX_PRIVATE_KEY": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "VERTEX_PROJECT_ID": "your-gcp-project-id",
        "GCS_BUCKET_NAME": "ai-vision-mcp-{VERTEX_PROJECT_ID}"
      }
    }
  }
}
</details> <details> <summary>Other MCP clients</summary>

The server uses stdio transport and follows the standard MCP protocol. It can be integrated with any MCP-compatible client by running:

npx ai-vision-mcp
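For a custom client, each request on the stdio transport is a JSON-RPC 2.0 message written as a single line to the server's standard input. The sketch below frames a `tools/call` request; `frameToolCall` is a hypothetical helper, and a real client must complete the MCP `initialize` handshake before sending tool calls.

```typescript
// Shape of an MCP tool invocation as a JSON-RPC 2.0 request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: serialize one request as a newline-terminated line.
// JSON.stringify without indentation never emits raw newlines, so the
// message stays on one line as the stdio transport expects.
function frameToolCall(
  id: number,
  toolName: string,
  toolArgs: Record<string, unknown>
): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
  return JSON.stringify(request) + "\n";
}

const exampleLine = frameToolCall(1, "analyze_image", {
  imageSource: "https://example.com/image1.jpg",
  prompt: "Describe what you see in detail.",
});
```

Writing `exampleLine` to the stdin of a spawned `npx ai-vision-mcp` process, after initialization, would request one image analysis; the response arrives as a JSON line on stdout with the same `id`.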
</details>

MCP Tools

The server provides four main MCP tools:

1) analyze_image

Analyzes an image using AI and returns a detailed description.

Parameters:

  • imageSource (string): URL, base64 data, or file path to the image
  • prompt (string): Question or instruction for the AI
  • options (object, optional): Analysis options including temperature and max tokens

Examples:

  1. Analyze image from URL:
{
  "imageSource": "https://plus.unsplash.com/premium_photo-1710965560034-778eedc929ff",
  "prompt": "What is this image about? Describe what you see in detail."
}
  2. Analyze local image file:
{
  "imageSource": "C:\\Users\\username\\Downloads\\image.jpg",
  "prompt": "What is this image about? Describe what you see in detail."
}
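Because `imageSource` accepts three kinds of input, a caller may want to know which kind a given string is. The README does not show how the server tells them apart, so the following is only a sketch: `classifyImageSource` is hypothetical, and it assumes base64 input arrives as a `data:image/...;base64,` URI.

```typescript
type SourceKind = "url" | "base64" | "file";

// Hypothetical classifier; the server's real detection logic may differ.
function classifyImageSource(source: string): SourceKind {
  if (/^https?:\/\//i.test(source)) return "url";
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(source)) return "base64";
  // Anything else is treated as a local file path.
  return "file";
}
```

With this rule, a Windows path such as `C:\Users\username\Downloads\image.jpg` falls through to `"file"`, while an `https://` address is classified as `"url"`.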

2) compare_images

Compares multiple images using AI and returns a detailed comparison analysis.

Parameters:

  • imageSources (array): Array of image sources (URLs, base64 data, or file paths) - minimum 2, maximum 4 images
  • prompt (string): Question or instruction for comparing the images
  • options (object, optional): Analysis options including temperature and max tokens

Examples:

  1. Compare images from URLs:
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg"
  ],
  "prompt": "Compare these two images and tell me the differences"
}
  2. Compare mixed sources:
{
  "imageSources": [
    "https://example.com/image1.jpg",
    "C:\\Users\\username\\Downloads\\image2.jpg",
    "data:image/jpeg;base64,/9j/4AAQSkZJRgAB..."
  ],
  "prompt": "Which image has the best lighting quality?"
}
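The 2 to 4 image limit can be checked on the client side before calling the tool. This is a minimal sketch with a hypothetical helper name, `checkImageSources`; the error wording is illustrative and is not the server's own.

```typescript
// Returns null when the array satisfies the documented limits,
// otherwise a short description of the problem.
function checkImageSources(sources: string[]): string | null {
  if (sources.length < 2) return "at least 2 images are required";
  if (sources.length > 4) return "at most 4 images are allowed";
  for (let i = 0; i < sources.length; i++) {
    if (sources[i].trim() === "") return "image source " + i + " is empty";
  }
  return null;
}
```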

3) detect_objects_in_image

Detects objects in an image using AI vision models and generates annotated images with bounding boxes. Returns detected objects with coordinates and saves the annotated image either to a specified file path or to a temporary directory.

Parameters:

  • imageSource (string): URL, base64 data, or file path to the image
  • prompt (string): Custom detection prompt describing what to detect or recognize in the image
  • outputFilePath (string, optional): Explicit output path for the annotated image

Configuration: This function uses optimized default parameters for object detection and does not accept a runtime options parameter. To customize the AI parameters (temperature, topP, topK, maxTokens), use environment variables:

# Recommended environment variable settings for object detection (these are now the defaults)
TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.0     # Deterministic responses
TOP_P_FOR_DETECT_OBJECTS_IN_IMAGE=0.95          # Nucleus sampling
TOP_K_FOR_DETECT_OBJECTS_IN_IMAGE=30            # Vocabulary selection
MAX_TOKENS_FOR_DETECT_OBJECTS_IN_IMAGE=8192     # High token limit for JSON
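To illustrate how such variables could be combined with the defaults, here is a sketch of reading them with a numeric fallback. The variable names and default values are taken from the listing above; `readDetectionParams` and its fallback behavior are assumptions, not the server's actual code.

```typescript
interface DetectionParams {
  temperature: number;
  topP: number;
  topK: number;
  maxTokens: number;
}

const DETECTION_DEFAULTS: DetectionParams = {
  temperature: 0.0,
  topP: 0.95,
  topK: 30,
  maxTokens: 8192,
};

// Parse one variable, falling back when it is unset, blank, or not a number.
function numberFromEnv(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return isFinite(parsed) ? parsed : fallback;
}

function readDetectionParams(
  env: Record<string, string | undefined>
): DetectionParams {
  const suffix = "_FOR_DETECT_OBJECTS_IN_IMAGE";
  return {
    temperature: numberFromEnv(env, "TEMPERATURE" + suffix, DETECTION_DEFAULTS.temperature),
    topP: numberFromEnv(env, "TOP_P" + suffix, DETECTION_DEFAULTS.topP),
    topK: numberFromEnv(env, "TOP_K" + suffix, DETECTION_DEFAULTS.topK),
    maxTokens: numberFromEnv(env, "MAX_TOKENS" + suffix, DETECTION_DEFAULTS.maxTokens),
  };
}
```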

File Handling Logic:

  1. Explicit outputFilePath provided → Saves to the exact path specified
  2. No outputFilePath provided → Automatically saves to a temporary directory
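The two-branch rule above can be sketched as a small function. Everything here is illustrative: `resolveOutputTarget`, the `tempDir` and `tempName` parameters, and the path joining are assumptions, while the `file` and `tempFile` labels follow the response types this README describes.

```typescript
interface OutputTarget {
  kind: "file" | "tempFile";
  path: string;
}

// Hypothetical resolver: an explicit path wins, otherwise use the temp dir.
function resolveOutputTarget(
  outputFilePath: string | undefined,
  tempDir: string,
  tempName: string
): OutputTarget {
  if (outputFilePath !== undefined && outputFilePath.trim() !== "") {
    return { kind: "file", path: outputFilePath };
  }
  // Drop trailing separators so the joined path has exactly one slash.
  const base = tempDir.replace(/[\\\/]+$/, "");
  return { kind: "tempFile", path: base + "/" + tempName };
}
```

In Node.js, `tempDir` would typically come from `os.tmpdir()`; it is passed in here so the sketch stays free of platform dependencies.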

Response Types:

  • Returns a file object when an explicit outputFilePath is provided
  • Returns a tempFile object when no outputFilePath is provided and the annotated image is auto-saved to a temporary folder
  • A

FAQ

What is the AI Vision MCP server?
AI Vision is a Model Context Protocol (MCP) server profile on explainx.ai. MCP lets AI hosts (e.g. Claude Desktop, Cursor) call tools and resources through a standard interface; this page summarizes categories, install hints, and community ratings.
How do MCP servers relate to agent skills?
Skills are reusable instruction packages (often SKILL.md); MCP servers expose live capabilities. Teams frequently combine both—skills for workflows, MCP for APIs and data. See explainx.ai/skills and explainx.ai/mcp-servers for parallel directories.
How are reviews shown for AI Vision?
This profile displays 68 aggregated ratings (sample rows for discoverability plus signed-in user reviews). Average score is about 4.8 out of 5—verify behavior in your own environment before production use.

Use Cases

Extended AI Capabilities

Add new capabilities to Claude beyond text generation

Example

Access external data sources, execute code, interact with tools and services

Transform Claude from chatbot to action-taking agent

Context Enhancement

Provide Claude with access to relevant context and data

Example

Load project documentation, access knowledge bases, query databases

Get more accurate, context-aware responses

Workflow Automation

Automate multi-step workflows combining AI and external tools

Example

Research → Summarize → Create document → Send notification

Complete complex tasks end-to-end without manual steps

Implementation Guide

Prerequisites

  • Claude Desktop 0.7.0+ or Cursor IDE with MCP support
  • Basic understanding of MCP architecture and capabilities
  • Access credentials for integrated services (if required)
  • Willingness to experiment and iterate on configuration

Time Estimate

15-60 minutes depending on server complexity

Installation Steps

  1. Install MCP server: npm install -g [package-name] or via GitHub
  2. Add the server configuration to your client's MCP configuration file (claude_desktop_config.json for Claude Desktop)
  3. Provide required credentials and configuration
  4. Restart Claude Desktop to load new server
  5. Test basic functionality with simple prompts
  6. Explore capabilities and experiment with use cases
  7. Document successful patterns for reuse

Troubleshooting

  • MCP server not loading: Check config syntax, verify installation
  • Connection errors: Check network, firewall, credentials
  • Feature not working: Read server docs, check required parameters
  • Performance issues: Monitor resource usage, check for network latency
  • Conflicts with other servers: Check port assignments, namespace collisions

Best Practices

✓ Do

  • Read server documentation thoroughly before setup
  • Start with simple use cases to validate functionality
  • Test in non-production environment first
  • Monitor resource usage and performance
  • Keep servers updated for bug fixes and new features
  • Document configuration for team members
  • Use environment variables for sensitive configuration

✗ Don't

  • Don't grant overly permissive access to MCP servers
  • Don't skip reading security considerations in docs
  • Don't expose sensitive data without proper controls
  • Don't run untrusted MCP servers without code review
  • Don't ignore error messages—investigate root cause

💡 Pro Tips

  • Combine multiple MCP servers for powerful workflows
  • Create custom MCP servers for your specific needs
  • Share successful configurations with team
  • Use MCP inspector for debugging
  • Join MCP community for tips and troubleshooting

Technical Details

Architecture

Model Context Protocol standardizes how AI hosts (Claude, Cursor) communicate with external tools and data sources through server implementations.

Protocols

  • Model Context Protocol (MCP)
  • JSON-RPC 2.0
  • stdio or HTTP transport

Compatibility

  • Claude Desktop
  • Cursor IDE
  • Custom MCP clients

When to Use This

✓ Use When

Use when you need Claude to access external data, execute actions, or integrate with tools. Best for extending AI capabilities beyond conversation.

✗ Avoid When

Avoid when native integrations exist (use official APIs directly), for real-time critical systems, or when security/compliance requires zero external dependencies.

Integration

  • Tool composition: Chain multiple MCP tools in workflows
  • Context augmentation: Provide AI with relevant external data
  • Action delegation: Let AI execute tasks on external systems
  • Bidirectional sync: Keep AI context and external systems in sync

MCP server reviews

Ratings

4.8 average · 68 reviews
  • Daniel Iyer · Dec 28, 2024

    We evaluated AI Vision against two servers with overlapping tools; this profile had the clearer scope statement.

  • Meera Khan · Dec 28, 2024

    Useful MCP listing: AI Vision is the kind of server we cite when onboarding engineers to host + tool permissions.

  • Valentina Tandon · Dec 24, 2024

    AI Vision is among the better-indexed MCP projects we tried; the explainx.ai summary tracks the official description.

  • Ishan Patel · Dec 20, 2024

    AI Vision has been reliable for tool-calling workflows; the MCP profile page is a good permalink for internal docs.

  • Chaitanya Patil · Dec 16, 2024

    We wired AI Vision into a staging workspace; the listing’s GitHub and npm pointers saved time versus hunting across READMEs.

  • Ama Anderson · Dec 12, 2024

    AI Vision is a well-scoped MCP server in the explainx.ai directory — install snippets and categories matched our Claude Code setup.

  • Ishan Rao · Nov 27, 2024

    Strong directory entry: AI Vision surfaces stars and publisher context so we could sanity-check maintenance before adopting.

  • Ama Rao · Nov 19, 2024

    I recommend AI Vision for teams standardizing on MCP; the explainx.ai page compares cleanly with sibling servers.

  • Fatima Ndlovu · Nov 19, 2024

    AI Vision reduced integration guesswork — categories and install configs on the listing matched the upstream repo.

  • Mateo Wang · Nov 15, 2024

    According to our notes, AI Vision benefits from clear Model Context Protocol framing — fewer ambiguous “AI plugin” claims.
