
MCP-Upstage-Server

by UpstageAI

MCP-Upstage-Server: AI document extraction with Upstage AI, covering automatic data extraction from documents and custom schemas.

Enables document processing through Upstage AI services including parsing various document formats, extracting structured information with custom schemas, auto-generating extraction schemas, and classifying documents into categories.

github stars

2

  • Supports multiple document formats
  • TypeScript implementation with error handling
  • Dual transport support (stdio and HTTP)

best for

  • Developers building document processing workflows
  • Teams needing automated data extraction from business documents
  • AI applications requiring structured document analysis

capabilities

  • Parse PDFs, images, and Office documents
  • Extract structured information using custom schemas
  • Auto-generate extraction schemas from documents
  • Classify documents into categories like invoice, receipt, contract
  • Generate structured data from unstructured documents

what it does

Processes documents through Upstage AI services to parse various formats, extract structured data with custom schemas, and classify document types. Supports PDFs, images, and Office files with automated schema generation.

about

MCP-Upstage-Server is an official MCP server published by UpstageAI that provides AI assistants with tools and capabilities via the Model Context Protocol. It is categorized under AI/ML and developer tools.

how to install

You can install MCP-Upstage-Server in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. By default, this server runs locally on your machine via the stdio transport.

license

MIT

MCP-Upstage-Server is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

readme

MCP-Upstage-Server

Node.js/TypeScript implementation of the MCP server for Upstage AI services.

Features

  • Document Parsing: Extract structure and content from various document types (PDF, images, Office files)
  • Information Extraction: Extract structured information using custom or auto-generated schemas
  • Schema Generation: Automatically generate extraction schemas from document analysis
  • Document Classification: Classify documents into predefined categories (invoice, receipt, contract, etc.)
  • Built with TypeScript for type safety
  • Dual transport support: stdio (default) and HTTP Streamable
  • Async/await pattern throughout
  • Comprehensive error handling and retry logic
  • Progress reporting support

Installation

Prerequisites

Install from npm

# Install globally
npm install -g mcp-upstage-server

# Or use with npx (no installation required)
npx mcp-upstage-server

Install from source

# Clone the repository
git clone https://github.com/UpstageAI/mcp-upstage.git
cd mcp-upstage/mcp-upstage-node

# Install dependencies
npm install

# Build the project
npm run build

# Set up environment variables
cp .env.example .env
# Edit .env and add your UPSTAGE_API_KEY

Usage

Running the server

# With stdio transport (default)
UPSTAGE_API_KEY=your-api-key npx mcp-upstage-server

# With HTTP Streamable transport
UPSTAGE_API_KEY=your-api-key npx mcp-upstage-server --http

# With HTTP transport on custom port
UPSTAGE_API_KEY=your-api-key npx mcp-upstage-server --http --port 8080

# Show help
npx mcp-upstage-server --help

# Development mode (from source)
npm run dev

# Production mode (from source)
npm start

Integration with Claude Desktop

Option 1: stdio transport (default)

{
  "mcpServers": {
    "upstage": {
      "command": "npx",
      "args": ["mcp-upstage-server"],
      "env": {
        "UPSTAGE_API_KEY": "your-api-key-here"
      }
    }
  }
}

Option 2: HTTP Streamable transport

{
  "mcpServers": {
    "upstage-http": {
      "command": "npx",
      "args": ["mcp-upstage-server", "--http", "--port", "3000"],
      "env": {
        "UPSTAGE_API_KEY": "your-api-key-here"
      }
    }
  }
}

Transport Options

stdio Transport (Default)

  • Pros: Simple setup, direct process communication
  • Cons: Single client connection only
  • Usage: Default mode, no additional configuration needed

HTTP Streamable Transport

  • Pros: Multiple client support, network accessible, RESTful API
  • Cons: Requires port management, network configuration
  • Endpoints:
    • POST /mcp - Main MCP communication endpoint
    • GET /mcp - Server-Sent Events stream
    • GET /health - Health check endpoint
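When the server runs in HTTP mode, a quick way to confirm it is reachable is to call the health endpoint listed above. The following is a minimal sketch, not part of the server itself: the helper names are hypothetical, the host and port are whatever you passed via --port, and it assumes only that GET /health answers with a 2xx status when the server is up (the response body is not documented here, so it is ignored). It relies on the global fetch available in Node.js 18 and later.

```javascript
// Build the endpoint URLs for a server started with --http --port <port>.
// buildEndpoints and isServerHealthy are hypothetical helper names.
function buildEndpoints(host, port) {
  const base = `http://${host}:${port}`;
  return { mcp: `${base}/mcp`, health: `${base}/health` };
}

// Resolve to true when GET /health returns a 2xx status, false otherwise.
// fetchImpl is injectable so the function can be exercised without a network.
async function isServerHealthy(host, port, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(buildEndpoints(host, port).health);
    return response.ok;
  } catch {
    // Connection refused, DNS failure, and similar errors all count as unhealthy.
    return false;
  }
}

// Example: isServerHealthy("localhost", 8080).then((ok) => console.log(ok));
```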

Available Tools

parse_document

Parse a document using Upstage AI's document digitization API.

Parameters:

  • file_path (required): Path to the document file
  • output_formats (optional): Array of output formats (e.g., ['html', 'text', 'markdown'])

Supported formats: PDF, JPEG, PNG, TIFF, BMP, GIF, WEBP
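MCP clients invoke tools with a JSON-RPC 2.0 request whose method is tools/call. As an illustration, the sketch below builds such a request for parse_document using the two parameters listed above. The helper name, request id, and file path are placeholders chosen for this example; in practice your MCP client library normally builds this message for you.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request for the parse_document tool.
// buildParseDocumentRequest is a hypothetical helper, not part of the server.
function buildParseDocumentRequest(id, filePath, outputFormats) {
  const args = { file_path: filePath };
  // output_formats is optional, so include it only when the caller provides it.
  if (Array.isArray(outputFormats) && outputFormats.length > 0) {
    args.output_formats = outputFormats;
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "parse_document", arguments: args },
  };
}

// Placeholder path; replace with a real document on your machine.
const parseRequest = buildParseDocumentRequest(1, "/path/to/report.pdf", ["markdown"]);
console.log(JSON.stringify(parseRequest, null, 2));
```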

extract_information

Extract structured information from documents using Upstage Universal Information Extraction.

Parameters:

  • file_path (required): Path to the document file
  • schema_path (optional): Path to JSON schema file
  • schema_json (optional): JSON schema as string
  • auto_generate_schema (optional, default: true): Auto-generate schema if none provided

Supported formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX
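The three schema-related parameters interact: with no schema supplied, auto_generate_schema keeps its default of true, while a custom schema is used when it is set to false. The sketch below shows one way a caller might assemble the arguments. The helper name and its options object are hypothetical, and rejecting a call that supplies both schema_path and schema_json is an assumption of this sketch, not documented server behavior.

```javascript
// Assemble the arguments object for extract_information.
// buildExtractArguments and its options shape are hypothetical.
function buildExtractArguments(filePath, options = {}) {
  const { schemaPath, schemaJson } = options;
  if (schemaPath && schemaJson) {
    // Assumption: supplying both is ambiguous, so this sketch refuses it.
    throw new Error("Provide schema_path or schema_json, not both");
  }
  const args = { file_path: filePath };
  if (schemaPath) {
    args.schema_path = schemaPath;
    args.auto_generate_schema = false;
  } else if (schemaJson) {
    // schema_json is documented as a string, so serialize objects first.
    args.schema_json =
      typeof schemaJson === "string" ? schemaJson : JSON.stringify(schemaJson);
    args.auto_generate_schema = false;
  }
  // With no schema supplied, auto_generate_schema is left at its default (true).
  return args;
}

console.log(buildExtractArguments("/path/to/invoice.pdf"));
console.log(buildExtractArguments("/path/to/invoice.pdf", { schemaPath: "./invoice-schema.json" }));
```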

generate_schema

Generate an extraction schema for a document using Upstage AI's schema generation API.

Parameters:

  • file_path (required): Path to the document file to analyze

Supported formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX

This tool analyzes a document and automatically generates a JSON schema that defines the structure and fields that can be extracted from similar documents. The generated schema can then be used with the extract_information tool when auto_generate_schema is set to false.

Use cases:

  • Create reusable schemas for multiple similar documents
  • Have more control over extraction fields
  • Ensure consistent field naming across extractions

The tool returns both a readable schema object and a schema_json string that can be directly copied and used with the extract_information tool.
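To illustrate the reuse case, the sketch below takes a schema_json string (for example, one copied from a generate_schema result) and builds one extract_information request per document, each with auto_generate_schema set to false. The helper name, file paths, and request ids are hypothetical, and how you obtain the string from the tool result depends on your MCP client.

```javascript
// Build one "tools/call" request per file, all sharing the same schema_json.
// buildBatchExtractRequests is a hypothetical helper for illustration.
function buildBatchExtractRequests(filePaths, schemaJson) {
  return filePaths.map((filePath, index) => ({
    jsonrpc: "2.0",
    id: index + 1,
    method: "tools/call",
    params: {
      name: "extract_information",
      arguments: {
        file_path: filePath,
        schema_json: schemaJson,
        // Use the supplied schema instead of generating a new one per document.
        auto_generate_schema: false,
      },
    },
  }));
}

// Placeholder schema string and paths.
const sharedSchemaJson = JSON.stringify({
  type: "json_schema",
  json_schema: {
    name: "document_schema",
    schema: {
      type: "object",
      properties: { invoice_number: { type: "string", description: "Invoice number" } },
    },
  },
});
const batch = buildBatchExtractRequests(["/docs/inv-001.pdf", "/docs/inv-002.pdf"], sharedSchemaJson);
console.log(batch.length);
```

Sharing one schema across a batch keeps field names consistent between extractions, which is the main benefit named in the use cases above.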

classify_document

Classify a document into predefined categories using Upstage AI's document classification API.

Parameters:

  • file_path (required): Path to the document file to classify
  • schema_path (optional): Path to JSON file containing custom classification schema
  • schema_json (optional): JSON string containing custom classification schema

Supported formats: JPEG, PNG, BMP, PDF, TIFF, HEIC, DOCX, PPTX, XLSX

This tool analyzes a document and classifies it into categories. By default, it uses a comprehensive set of document types, but you can provide custom classification categories.

Default categories:

  • invoice, receipt, contract, cv, bank_statement, tax_document, insurance, business_card, letter, form, certificate, report, others

Use cases:

  • Automatically sort and organize documents by type
  • Filter documents for specific processing workflows
  • Build document management systems with automatic categorization

Schema Guide for Information Extraction

When auto_generate_schema is false, you need to provide a custom schema. Here's how to format it correctly:

📋 Basic Schema Structure

The schema must follow this exact structure:

{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema",
    "schema": {
      "type": "object",
      "properties": {
        "field_name": {
          "type": "string|number|array|object",
          "description": "Description of what to extract"
        }
      }
    }
  }
}

❌ Common Mistakes

Wrong: Missing nested structure

{
  "company_name": {
    "type": "string"
  }
}

Wrong: Incorrect response_format

{
  "schema": {
    "company_name": "string"
  }
}

Wrong: Missing properties wrapper

{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema", 
    "schema": {
      "type": "object",
      "company_name": {
        "type": "string"
      }
    }
  }
}
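The three mistakes above can be caught before a schema is sent to the tool. The following is a minimal sketch of such a check: the function name is hypothetical, and it verifies only the envelope described in this guide (the json_schema type, the name, the object type, and the properties wrapper), not full JSON Schema validity.

```javascript
// Return a list of structural problems; an empty list means the envelope looks right.
// findSchemaProblems is a hypothetical helper, not part of the server.
function findSchemaProblems(schema) {
  if (schema === null || typeof schema !== "object" || Array.isArray(schema)) {
    return ["schema must be a JSON object"];
  }
  const problems = [];
  if (schema.type !== "json_schema") {
    problems.push('top-level "type" must be "json_schema"');
  }
  const wrapper = schema.json_schema;
  if (wrapper === null || typeof wrapper !== "object") {
    problems.push('missing "json_schema" object');
    return problems;
  }
  if (typeof wrapper.name !== "string") {
    problems.push('"json_schema.name" must be a string');
  }
  const inner = wrapper.schema;
  if (inner === null || typeof inner !== "object") {
    problems.push('missing "json_schema.schema" object');
    return problems;
  }
  if (inner.type !== "object") {
    problems.push('"json_schema.schema.type" must be "object"');
  }
  const props = inner.properties;
  if (props === null || typeof props !== "object" || Array.isArray(props)) {
    problems.push('missing "properties" wrapper');
  }
  return problems;
}

// The "missing properties wrapper" mistake from above.
const missingWrapper = {
  type: "json_schema",
  json_schema: {
    name: "document_schema",
    schema: { type: "object", company_name: { type: "string" } },
  },
};
console.log(findSchemaProblems(missingWrapper));
```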

✅ Correct Examples

Simple schema:

{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema",
    "schema": {
      "type": "object",
      "properties": {
        "company_name": {
          "type": "string",
          "description": "Name of the company"
        },
        "invoice_number": {
          "type": "string",
          "description": "Invoice number"
        },
        "total_amount": {
          "type": "number",
          "description": "Total invoice amount"
        }
      }
    }
  }
}

Complex schema with arrays and objects:

{
  "type": "json_schema",
  "json_schema": {
    "name": "document_schema",
    "schema": {
      "type": "object",
      "properties": {
        "company_info": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "address": {"type": "string"},
            "phone": {"type": "string"}
          },
          "description": "Company information"
        },
        "items": {
          "type": "array",
          "items": {
            "type": "object", 
            "properties": {
              "item_name": {"type": "string"},
              "quantity": {"type": "number"},
              "price": {"type": "number"}
            }
          },
          "description": "List of invoice items"
        },
        "invoice_date": {
          "type": "string",
          "description": "Invoice date in YYYY-MM-DD format"
        }
      }
    }
  }
}

🛠️ Schema Creation Helper

You can create schemas programmatically:

// Wrap a map of field definitions in the envelope shown in "Basic Schema Structure".
// Returns a JSON string, so the result can be passed as the schema_json parameter.
function createSchema(fields) {
  return JSON.stringify({
    "type": "json_schema",
    "json_schema": {
      "name": "document_schema",
      "schema": {
        "type": "object",
        "properties": fields
      }
    }
  });
}

// Usage example:
const schema = createSchema({
  "company_name": {
    "type": "string",
    "description": "Company name"
  },
  "total": {
    "type": "number",
    "description": "Total amount"
  }
});

💡 Data Types

  • "string": Text data (names, addresses, etc.)
  • "number": Numeric data (amounts, quantities, etc.)
  • "boolean": True/false values
  • "array": Lists of items
  • "object": Nested structures
  • "null": Null values

📝 Best Practices

  1. Always include descriptions: They help the AI understand what to extract
  2. Use specific field names: invoice_date instead of date
  3. Nest related fields: Group related information in objects
  4. Validate your JSON: Use a JSON validator before using the schema
  5. Test with simple schemas first: Start with basic fields before adding complexity
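The first practice is easy to check mechanically. As a rough sketch, the hypothetical helper below lists top-level fields that lack a non-empty description. It inspects only the first level of properties and does not descend into nested objects or array items.

```javascript
// List top-level fields that have no usable description.
// findFieldsMissingDescriptions is a hypothetical helper for illustration.
function findFieldsMissingDescriptions(schema) {
  const properties = schema?.json_schema?.schema?.properties ?? {};
  return Object.entries(properties)
    .filter(([, field]) => {
      const description = field?.description;
      return typeof description !== "string" || description.trim() === "";
    })
    .map(([name]) => name);
}

const draftSchema = {
  type: "json_schema",
  json_schema: {
    name: "document_schema",
    schema: {
      type: "object",
      properties: {
        invoice_date: { type: "string", description: "Invoice date in YYYY-MM-DD format" },
        total_amount: { type: "number" },
      },
    },
  },
};
console.log(findFieldsMissingDescriptions(draftSchema)); // prints [ 'total_amount' ]
```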

Classification Schema Guide

The classify_document tool uses a different schema format optimized for classification tasks. Here's how to create custom classification schemas:

📋 Simple Classification Categories

For custom categories, just provide an array of category objects:

[
  {"const": "category1", "description": "Description of category 1"},
  {"const": "category2", "description": "Description of category 2"},
  {"const": "others", "description": "Fallback category"}
]

The tool automatically wraps this in the proper schema structure for the API.
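Because classify_document accepts the categories as a schema_json string, a small helper can check the array and serialize it in one step. This is a sketch under stated assumptions: the function name is hypothetical, and the uniqueness and non-empty checks are this example's own sanity rules, not documented requirements of the tool.

```javascript
// Validate an array of { const, description } categories and serialize it
// for the schema_json parameter of classify_document.
// toClassificationSchemaJson is a hypothetical helper.
function toClassificationSchemaJson(categories) {
  if (!Array.isArray(categories) || categories.length === 0) {
    throw new Error("categories must be a non-empty array");
  }
  const seen = new Set();
  for (const category of categories) {
    if (!category || typeof category.const !== "string" || category.const === "") {
      throw new Error('each category needs a non-empty string "const"');
    }
    if (seen.has(category.const)) {
      throw new Error(`duplicate category: ${category.const}`);
    }
    seen.add(category.const);
  }
  return JSON.stringify(categories);
}

const categoriesJson = toClassificationSchemaJson([
  { const: "category1", description: "Description of category 1" },
  { const: "category2", description: "Description of category 2" },
  { const: "others", description: "Fallback category" },
]);
console.log(categoriesJson);
```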

✅ Correct Classification Examples

Medical document classifier:

[
  {"const": "prescription", "description": "Medical prescription document"},
  {"const": "lab_result", "description": "Laboratory test results"},
  {"const": "medical_recor

---

FAQ

What is the MCP-Upstage-Server MCP server?

MCP-Upstage-Server is a Model Context Protocol (MCP) server profile on explainx.ai. MCP lets AI hosts (e.g. Claude Desktop, Cursor) call tools and resources through a standard interface; this page summarizes categories, install hints, and community ratings.

How do MCP servers relate to agent skills?

Skills are reusable instruction packages (often SKILL.md); MCP servers expose live capabilities. Teams frequently combine both: skills for workflows, MCP for APIs and data. See explainx.ai/skills and explainx.ai/mcp-servers for parallel directories.

How are reviews shown for MCP-Upstage-Server?

This profile displays 10 aggregated ratings (sample rows for discoverability plus signed-in user reviews). Average score is about 4.5 out of 5; verify behavior in your own environment before production use.

MCP server reviews

Ratings

4.5 (10 reviews)
  • Shikha Mishra· Oct 10, 2024

    MCP-Upstage-Server is among the better-indexed MCP projects we tried; the explainx.ai summary tracks the official description.

  • Piyush G· Sep 9, 2024

    We evaluated MCP-Upstage-Server against two servers with overlapping tools; this profile had the clearer scope statement.

  • Chaitanya Patil· Aug 8, 2024

    Useful MCP listing: MCP-Upstage-Server is the kind of server we cite when onboarding engineers to host + tool permissions.

  • Sakshi Patil· Jul 7, 2024

    MCP-Upstage-Server reduced integration guesswork — categories and install configs on the listing matched the upstream repo.

  • Ganesh Mohane· Jun 6, 2024

    I recommend MCP-Upstage-Server for teams standardizing on MCP; the explainx.ai page compares cleanly with sibling servers.

  • Oshnikdeep· May 5, 2024

    Strong directory entry: MCP-Upstage-Server surfaces stars and publisher context so we could sanity-check maintenance before adopting.

  • Dhruvi Jain· Apr 4, 2024

    MCP-Upstage-Server has been reliable for tool-calling workflows; the MCP profile page is a good permalink for internal docs.

  • Rahul Santra· Mar 3, 2024

    According to our notes, MCP-Upstage-Server benefits from clear Model Context Protocol framing — fewer ambiguous “AI plugin” claims.

  • Pratham Ware· Feb 2, 2024

    We wired MCP-Upstage-Server into a staging workspace; the listing’s GitHub and npm pointers saved time versus hunting across READMEs.

  • Yash Thakker· Jan 1, 2024

    MCP-Upstage-Server is a well-scoped MCP server in the explainx.ai directory — install snippets and categories matched our Claude Code setup.