interpro-database

google-deepmind/science-skills · updated Jun 4, 2026

$ npx skills add https://github.com/google-deepmind/science-skills --skill interpro-database
name
interpro-database
description
> Identify domains, families, and sites in proteins; find all proteins in a family or sharing a domain; explore species distribution for a domain; annotate genomes with protein families and GO terms. InterPro combines 14 databases (e.g., Pfam, CDD) into one searchable resource. InterPro-N significantly expands annotation and sequence coverage with deep learning. Includes domain architecture (IDA) search.

InterPro Database Access

Prerequisites

  1. uv: Read the uv skill and follow its Setup instructions to ensure uv is installed and on PATH.
  2. User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://www.ebi.ac.uk/interpro/ and https://www.ebi.ac.uk/about/terms-of-use/, then (2) create the file recording the notification text and timestamp.
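
The notification step can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the function name ensure_license_notification and the exact wording of NOTICE are assumptions, while the file name and the two URLs come from the prerequisite above.

```python
from datetime import datetime, timezone
from pathlib import Path

# Wording is an assumption; the two URLs are the ones named in the prerequisite.
NOTICE = (
    "Please check the InterPro and EMBL-EBI terms of use: "
    "https://www.ebi.ac.uk/interpro/ and "
    "https://www.ebi.ac.uk/about/terms-of-use/"
)


def ensure_license_notification(skill_dir: Path) -> bool:
    """Notify once per skill directory. Returns True if a notice was shown now."""
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # already notified on an earlier run
    print(NOTICE)  # (1) notify the user
    stamp = datetime.now(timezone.utc).isoformat()
    # (2) record the notification text and timestamp
    marker.write_text(f"{stamp}\n{NOTICE}\n", encoding="utf-8")
    return True
```

Calling the function a second time on the same directory is a no-op, which matches the "if it does not already exist" condition.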

Overview

InterPro combines signatures from multiple, diverse databases into a single searchable resource, reducing redundancy and helping users interpret their sequence analysis results. By uniting these member databases (e.g., Pfam, CDD, SMART), InterPro capitalises on their individual strengths to produce a powerful diagnostic tool and integrated resource.

Use interpro-database to:

  • Identify what domains, families, and sites are found in a particular protein.
  • Identify all proteins that belong to a protein family or contain a particular domain, even when the names and activities of the proteins are highly variable.
  • Examine the species in which a particular protein family or domain is found.
  • Annotate genomes with protein family information and Gene Ontology (GO) terms.

This skill ships a helper script, interpro_client.py, that wraps the InterPro API. It handles rate limiting (HTTP 429), waiting on queries that are still running in the background (HTTP 408), terminal errors (HTTP 404/410), and lazy pagination.
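
To make the status-code handling concrete, here is a small sketch of how a client might decide what to do with each response. It is not the code in interpro_client.py: the names Action, classify, and backoff_seconds are hypothetical, and honoring a Retry-After header and the exponential backoff schedule are both assumptions.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RETURN = "return"      # 2xx: hand the payload back to the caller
    WAIT_RETRY = "wait"    # 429 / 408: sleep, then retry the same URL
    GIVE_UP = "give_up"    # 404 / 410: terminal, retrying will not help


def classify(status: int) -> Action:
    """Map an HTTP status code to the action described in the overview."""
    if status // 100 == 2:
        return Action.RETURN
    if status in (408, 429):
        return Action.WAIT_RETRY
    if status in (404, 410):
        return Action.GIVE_UP
    raise ValueError(f"unhandled HTTP status: {status}")


def backoff_seconds(attempt: int, retry_after: Optional[str] = None,
                    cap: float = 60.0) -> float:
    """Seconds to sleep before the next attempt (assumed policy)."""
    if retry_after is not None and retry_after.isdigit():
        return min(float(retry_after), cap)  # server-provided hint
    return min(2.0 ** attempt, cap)          # exponential backoff, capped
```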

Core Rules

  • Use the Wrapper: ALWAYS run the scripts/interpro_client.py helper script to query the database rather than accessing the database directly. The script automatically enforces fair use and implements retry logic.
  • For exploratory queries: ALWAYS use the CLI with a strict --limit. This allows you to rapidly understand the data schema without polluting your context window or fetching millions of results.
  • Output to file: Use the CLI with --output to output to a file rather than attempting to print it all to the console. Process the output using jq or code.
  • For more complex pipelines: import the module directly into your Python scripts and consume the generator, which avoids deserializing CLI output in large workflows.
  • Notification: If this skill is used, ensure this is mentioned in the output.

Examples:

Via CLI:

uv run ./scripts/interpro_client.py fetch protein --source_db reviewed --limit 2 --query_params tax_id=9606 --output exploratory_results.jsonl

Via Python Pipeline:

import itertools
import sys

sys.path.append('scripts')
from interpro_client import fetch_interpro_data

# fetch_interpro_data lazily yields results page-by-page
results = fetch_interpro_data(
    endpoint="entry",
    source_db="pfam",
    query_params={"page_size": 10}
)
for match in itertools.islice(results, 10):
    print(match["metadata"]["accession"])
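
Once an exploratory run has written its --output file, the results can be processed in code rather than printed. The sketch below assumes that the file holds one JSON object per line (suggested by the .jsonl extension) and that each record has the metadata.accession layout used in the Python example above; iter_accessions is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_accessions(jsonl_path: Path) -> Iterator[str]:
    """Yield metadata.accession from each record in a JSON Lines file."""
    with jsonl_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            accession = record.get("metadata", {}).get("accession")
            if accession is not None:
                yield accession
```

Usage would look like `list(iter_accessions(Path("exploratory_results.jsonl")))`. Because the function is a generator, large output files are read one line at a time.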

4 Ways to Construct Endpoints:

The arguments map strictly onto the four common API path constructions. Do not build your own /-separated path strings:

  1. /{endpoint} (e.g. /entry):
     uv run ./scripts/interpro_client.py fetch entry --limit 10 --output entries.jsonl
  2. /{endpoint}/{sourceDB} (e.g. /entry/pfam):
     uv run ./scripts/interpro_client.py fetch entry --source_db pfam --limit 10 --output pfam_entries.jsonl
  3. /{endpoint}/{sourceDB}/{accession} (e.g. /entry/pfam/PF00001):
     uv run ./scripts/interpro_client.py fetch entry --source_db pfam --accession PF00001 --limit 10 --output pf00001_entry.jsonl
  4. /{endpoint}/{sourceDB}/{linked_endpoint}/{sourceDB}/{accession} (e.g. /entry/interpro/protein/uniprot/P04637):
     uv run ./scripts/interpro_client.py fetch entry --source_db interpro --linked_endpoint protein --linked_source_db uniprot --linked_accession P04637 --limit 10 --output p04637_entries.jsonl
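
The mapping from arguments to the four path shapes above can be sketched as a small function. This only illustrates the mapping; how interpro_client.py builds its paths internally is not shown in this document, and build_path is a hypothetical name. The parameter names mirror the CLI flags.

```python
from typing import Optional


def build_path(endpoint: str,
               source_db: Optional[str] = None,
               accession: Optional[str] = None,
               linked_endpoint: Optional[str] = None,
               linked_source_db: Optional[str] = None,
               linked_accession: Optional[str] = None) -> str:
    """Return one of the four documented path shapes, or raise ValueError."""
    if linked_endpoint is not None:
        # Shape 4: /{endpoint}/{sourceDB}/{linked_endpoint}/{sourceDB}/{accession}
        if not (source_db and linked_source_db and linked_accession):
            raise ValueError("linked queries need source_db, linked_source_db "
                             "and linked_accession")
        if accession is not None:
            raise ValueError("shape 4 takes no primary accession")
        parts = [endpoint, source_db, linked_endpoint,
                 linked_source_db, linked_accession]
    elif accession is not None:
        # Shape 3: /{endpoint}/{sourceDB}/{accession}
        if not source_db:
            raise ValueError("an accession requires source_db")
        parts = [endpoint, source_db, accession]
    elif source_db is not None:
        parts = [endpoint, source_db]   # Shape 2
    else:
        parts = [endpoint]              # Shape 1
    return "/" + "/".join(parts)
```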

Valid Source Databases (--source_db)

Each endpoint only accepts specific source_db values. Using an invalid value returns a 404 error.

  • /entry (15 values listed): interpro, pfam, cathgene3d, ssf, panther, cdd, profile, smart, ncbifam, prosite, prints, hamap, pirsf, sfld, antifam.
  • /protein (3 values): uniprot (all), reviewed (SwissProt), unreviewed (TrEMBL).
  • /structure (1 value): pdb.
  • /taxonomy (1 value): uniprot.
  • /proteome (1 value): uniprot.
  • /set (2 values): pfam, cdd.
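
Because an invalid source_db only surfaces as a 404 after a request has been made, it can be cheaper to check the value locally first. The table below copies the values listed above. The helper name check_source_db is hypothetical, and the skill's own script may or may not perform a similar check.

```python
# Values copied from the list above; keep this in sync with the skill's docs.
VALID_SOURCE_DBS = {
    "entry": {"interpro", "pfam", "cathgene3d", "ssf", "panther", "cdd",
              "profile", "smart", "ncbifam", "prosite", "prints", "hamap",
              "pirsf", "sfld", "antifam"},
    "protein": {"uniprot", "reviewed", "unreviewed"},
    "structure": {"pdb"},
    "taxonomy": {"uniprot"},
    "proteome": {"uniprot"},
    "set": {"pfam", "cdd"},
}


def check_source_db(endpoint: str, source_db: str) -> None:
    """Raise ValueError before any request is sent if the pair is invalid."""
    allowed = VALID_SOURCE_DBS.get(endpoint)
    if allowed is None:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    if source_db not in allowed:
        raise ValueError(
            f"{source_db!r} is not valid for /{endpoint}; "
            f"expected one of {sorted(allowed)}"
        )
```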

Quick Reference / Core Endpoints & Parameters

For a complete, exhaustive list of all query parameters, see the Full API Reference.

The API is fully open and supports 6 core endpoints. You can combine them using the linked parameters described above. Below is a nested list of the specific query parameters available for each endpoint:

  • /entry (Domain, family, active site, repeat, or homologous superfamily entries)

    • integrated: Filter by integrated status (e.g., pfam).
    • type: Filter by type (e.g., family, domain, homologous_superfamily).
    • go_term / go_category: Filter by Gene Ontology.
    • ida_search / ida_ignore / exact / ordered: Filter by domain architecture (see IDA Search section).
    • extra_fields: Request additional data (e.g., counters for match coordinates).
    • group_by / sort_by: Aggregate or sort results (valid values depend on context, see Full API Reference).
    • Example: uv run ./scripts/interpro_client.py count entry --source_db pfam --query_params type=domain --output count.jsonl
  • /protein (Protein records matching entries or domains)

    • tax_id: Filter by taxonomy ID (does not search lineage).
    • match_presence: Filter by proteins having InterPro matches (true/false).
    • is_fragment: Filter complete vs. fragment sequences.
    • group_by: Aggregate results (e.g., taxonomy).
    • extra_fields: Request sequence or match details.
    • isoforms / residues / structureinfo: Include specific sub-features.
    • conservation / extra_features: Append residue conservation flags or Mobidb/coil features (only valid for /protein/{source_db}/{accession}).
    • Example: uv run ./scripts/interpro_client.py fetch protein --source_db uniprot --limit 20 --query_params tax_id=9606 --output human_proteins.jsonl
  • /structure (PDB structures linked to InterPro entries)

    • experiment_type: Filter by experimental method (e.g., X-RAY DIFFRACTION).
    • resolution: Filter by resolution limit.
    • extra_fields: Include additional structural metadata.
    • group_by: Aggregate results.
    • Example: uv run ./scripts/interpro_client.py fetch structure --source_db pdb --accession 1ATP --limit 10 --output 1atp_structures.jsonl
  • /taxonomy (Taxonomy distribution nodes)

    • key_species: Filter to limit to key species.
    • with_names: Include scientific names.
    • filter_by_entry / filter_by_entry_db: Filter intersection with specific entries.
    • extra_fields: Additional taxonomic metadata.
    • Example: uv run ./scripts/interpro_client.py fetch taxonomy --source_db uniprot --accession 9606 --limit 10 --output human_taxonomy.jsonl
  • /proteome (Complete proteomes linked to InterPro)

    • extra_fields: General query expansion.
    • Example: uv run ./scripts/interpro_client.py fetch proteome --source_db uniprot --accession UP000005640 --limit 10 --output proteome.jsonl
  • /set (Curated sets of related entries, e.g., Pfam clans)

    • extra_fields: Additional metadata (only valid for /set/{sourceDB}).
    • Example: uv run ./scripts/interpro_client.py fetch set --source_db pfam --accession CL0001 --limit 10 --output pfam_clan.jsonl
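
The per-endpoint examples above share one shape, so a pipeline that shells out to the CLI can assemble the argument list programmatically. The sketch below uses only flags that appear in this document and always requires a limit and an output file, in line with the Core Rules. The helper build_fetch_argv is hypothetical, and it deliberately accepts a single key=value query parameter because the syntax for passing several is not shown here.

```python
import shlex
from typing import List, Optional


def build_fetch_argv(endpoint: str, output: str, limit: int,
                     source_db: Optional[str] = None,
                     accession: Optional[str] = None,
                     query_param: Optional[str] = None) -> List[str]:
    """Assemble a `fetch` command line with a mandatory limit and output file."""
    if limit < 1:
        raise ValueError("exploratory queries need a positive --limit")
    argv = ["uv", "run", "./scripts/interpro_client.py", "fetch", endpoint]
    if source_db is not None:
        argv += ["--source_db", source_db]
    if accession is not None:
        argv += ["--accession", accession]
    argv += ["--limit", str(limit)]
    if query_param is not None:
        argv += ["--query_params", query_param]  # single key=value pair
    argv += ["--output", output]
    return argv


# Render the human-proteins example from the /protein section.
command = shlex.join(build_fetch_argv(
    "protein", "human_proteins.jsonl", 20,
    source_db="uniprot", query_param="tax_id=9606"))
print(command)
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting problems.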

InterPro Domain Architecture (IDA) Search

InterPro provides powerful tools for searching proteins by their domain architecture (the exact combination and order of domains). Because the API does not allow querying proteins directly by multiple domains at once (e.g., "give me proteins with PF00069 AND PF00017"), finding proteins with specific domain combinations requires a two-step process.

Step 1: Find matching architectures (ida_search)

The ida_search parameter is used on the root /entry endpoint to find all Domain Architectures (IDAs) containing the domains you specify.

  • Constraints:
    • Valid ONLY on the root /entry endpoint.
    • Cannot be combined with non-IDA parameters.
  • Modifiers (Only valid with ida_search):
    • ida_ignore: Ignores the given domains in the search (query param).
    • ordered: Ensures domains appear in the exact specified order (flag).
    • exact: Ensures the architecture matches exactly (no additional domains) (flag). Requires ordered flag to be present.

Example: Find architectures consisting of exactly a kinase domain (PF00069) followed by an SH2 domain (PF00017), in that order and with no additional domains:

uv run scripts/interpro_client.py fetch entry \
  --query_params ida_search=PF00069,PF00017 \
  --flags ordered exact \
  --output architectures.jsonl

Note: This returns the architectures and their unique ida_ids, not all individual proteins.

Step 2: Fetch proteins for those architectures (ida)

Once you have the ida_ids (e.g., 619edbb...) from Step 1, you can fetch all the actual proteins that share that precise layout by filtering the /protein endpoint.

Constraints:

  • Valid on /protein and /entry/{sourceDB}/{accession} endpoints.

Example: Fetch proteins matching one of the architecture IDs from Step 1:

uv run scripts/interpro_client.py fetch protein \
  --source_db uniprot \
  --query_params ida=619edbb2b445bfa3ad51bd894e3c115b025a5f25 \
  --output matching_proteins.jsonl

(When building pipelines or querying comprehensively, you would loop through all the ida_ids from Step 1 and run Step 2 for each one).
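
The two-step loop described above can be sketched in Python. The fetch function is passed in as a parameter so the logic can be exercised without network access; in a real pipeline it would be fetch_interpro_data. The keyword arguments follow the calls shown elsewhere in this document. The record key "ida_id" is an assumption about the Step 1 response layout and should be checked against an actual response before use.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, Any]
Fetch = Callable[..., Iterable[Record]]


def proteins_for_architectures(fetch: Fetch, domains: List[str],
                               ida_key: str = "ida_id"
                               ) -> Iterator[Tuple[str, Record]]:
    """Yield (ida_id, protein_record) for every architecture matching `domains`."""
    # Step 1: find architectures on the root /entry endpoint.
    architectures = fetch(
        endpoint="entry",
        query_params={"ida_search": ",".join(domains)},
        flags=["ordered", "exact"],
    )
    for architecture in architectures:
        ida_id = architecture[ida_key]  # assumed key name
        # Step 2: fetch the proteins sharing that architecture.
        for protein in fetch(endpoint="protein", source_db="uniprot",
                             query_params={"ida": ida_id}):
            yield ida_id, protein
```

Since the function is a generator, wrapping it in itertools.islice keeps exploratory runs bounded.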

InterPro Entry Types

Each InterPro entry is assigned a type indicating what you can infer when a protein matches the entry:

  • Domain: Distinct functional, structural or sequence units that may exist in a variety of biological contexts. Example: PH domain or classical C2H2 zinc finger.
  • Family: A group of proteins sharing a common evolutionary origin reflected by related functions, sequence similarities, or primary/secondary/tertiary structures.
  • Homologous Superfamily: Proteins sharing an evolutionary origin reflected by structural similarity but often displaying very low sequence similarity. Usually comprises signatures from the SUPERFAMILY and CATH-Gene3D databases.
  • Repeat: A short sequence that is typically repeated within a protein, often <50 amino acids long. Example: Leucine Rich Repeats or WD40 repeats.
  • Site: Includes Active site (sequence containing conserved residues for catalytic activity) and Binding site (sequence containing conserved residues forming a protein interaction site).

InterPro-N Predictions

InterPro-N is a deep-learning-based extension of the standard InterPro database. It utilizes an AI architecture inspired by computer vision to treat protein sequence annotation as a "panoptic segmentation" task, labeling residues and distinguishing between domains.

When to use InterPro-N

Standard InterPro signatures are the "gold standard" and should not be discarded in favor of InterPro-N predictions. Use InterPro-N primarily to fill in gaps or refine results.

In addition to standard InterPro:

  • Analyzing "Dark Matter" (Uncharacterised Proteins): Use when a protein returns no hits in standard InterPro. InterPro-N excels at identifying remote homologs.
  • Resolving Complex Repeats: Use for proteins with multiple tandem repeats (e.g., TPR or WD40) where standard HMMs might merge or miss them.
  • Predicting Discontinuous Domains: Use when a domain sequence is interrupted by a completely different inserted sequence.

Instead of standard InterPro (specific scenarios):

  • Precise Boundary Delineation: When you need more accurate start-and-stop coordinates for a domain than fuzzy standard hits provide.
  • Large-Scale Metagenomic Screening: For initial high-recovery screening of fragmented or highly divergent sequences.

Fallback Strategy: Checking InterPro-N

When you are asked to find annotations for a protein and standard InterPro queries return no results or no annotations, you MUST check InterPro-N as a fallback.

Example Scenario: If a user asks to "List the SSF annotations for protein X" and the standard query returns no hits, you should retry the query with the interpro_n flag.

This fallback is crucial because InterPro-N can identify remote homologs and domains in "dark matter" proteins that standard methods miss.

If found, ALWAYS report to the user that these annotations are deep learning predictions from InterPro-N.

How to Use

InterPro-N predictions are accessed by passing the interpro_n flag to the protein endpoint with uniprot as the source database.

Via CLI:

uv run ./scripts/interpro_client.py fetch protein \
    --source_db uniprot \
    --accession A0A096LNN2 \
    --flags interpro_n \
    --output A0A096LNN2_interpro_n.jsonl

Via Python Pipeline:

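import sys

sys.path.append('scripts')
from interpro_client import fetch_interpro_data

# Request InterPro-N deep learning predictions for a single UniProt accession
results = fetch_interpro_data(
    endpoint="protein",
    source_db="uniprot",
    accession="A0A096LNN2",
    flags=["interpro_n"],
)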
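
The fallback strategy can be combined with this call roughly as follows. This is a sketch under stated assumptions: the function name and the provenance labels are hypothetical, "no results" is taken to mean an empty result stream, and fetch stands in for fetch_interpro_data so the logic can be tested offline.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]
Fetch = Callable[..., Iterable[Record]]


def fetch_with_interpro_n_fallback(fetch: Fetch, accession: str,
                                   standard_kwargs: Dict[str, Any]
                                   ) -> Tuple[str, List[Record]]:
    """Try the standard query first; fall back to InterPro-N if it is empty.

    Returns (provenance_label, records) so the caller can tell the user
    whether the annotations are deep learning predictions.
    """
    standard = list(fetch(**standard_kwargs))
    if standard:
        return "InterPro", standard
    predicted = list(fetch(
        endpoint="protein",
        source_db="uniprot",
        accession=accession,
        flags=["interpro_n"],
    ))
    return "InterPro-N (deep learning prediction)", predicted
```

Returning the label together with the records makes it hard to forget the reporting requirement stated in the fallback strategy.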

Strict Lookup Rules

  1. Always Use UniProt Accessions, NEVER Gene Names: When looking up proteins in InterPro, you MUST use their UniProt Accessions (e.g. P04637). InterPro does not natively support or reliably map gene names (e.g. TP53). If the user provides a gene name, you must use a database like Ensembl or UniProt first to resolve it to an accession.

  2. NEVER Iterate to Count: When asked for an aggregate count (e.g., "How many domains are there?"), you MUST read the count field from the initial API JSON response using the get_interpro_count() helper. NEVER iterate over the fetch_interpro_data generator to tally elements. Iterating over an endpoint with 50,000+ entries just to count them silently hangs the agent and abuses the API. Every time. No exceptions.

    Correct:

    Via CLI:

    uv run ./scripts/interpro_client.py count entry \
        --source_db interpro \
        --query_params type=domain \
        --output count.json
    

    Via Python Pipeline:

    from interpro_client import get_interpro_count
    cnt = get_interpro_count(
        endpoint="entry",
        source_db="interpro",
        query_params={"type": "domain"},
    )
    

    Wrong (Iterating over fetch):

    # NEVER DO THIS:
    uv run ./scripts/interpro_client.py fetch entry \
        --source_db interpro \
        --query_params type=domain \
        --output output.jsonl \
        && wc -l output.jsonl
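
For rule 1, a cheap guard is to check that an identifier has the shape of a UniProt accession before querying, so that gene names such as TP53 are caught early and resolved elsewhere first. The pattern below follows the accession format published by UniProt; the helper name looks_like_uniprot_accession is hypothetical. A match only confirms the format, not that the accession exists.

```python
import re

# UniProt accession format: 6 or 10 characters, as documented by UniProt.
_UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_accession(identifier: str) -> bool:
    """True if `identifier` has the shape of a UniProt accession."""
    return _UNIPROT_ACCESSION.fullmatch(identifier) is not None


for candidate in ("P04637", "A0A096LNN2", "TP53"):
    print(candidate, looks_like_uniprot_accession(candidate))
```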
    

Quick examples

For detailed examples of the invocations and JSON output schemas returned by various endpoints, see the Example Responses Reference. This TSV contains command-line calls, Python equivalents, and the corresponding JSON payload structures.

1. Determining all protein domains

# Fetches InterPro Entries within UniProt protein P04637
# URL equivalent: /entry/interpro/protein/uniprot/P04637
uv run ./scripts/interpro_client.py fetch entry \
    --source_db interpro \
    --linked_endpoint protein \
    --linked_source_db uniprot \
    --linked_accession P04637 \
    --output p04637_domains.jsonl

2. Fetching all PDB structures for an Entry

# URL equivalent: /structure/pdb/entry/interpro/IPR011615
# Only fetch the first 5 structures
uv run ./scripts/interpro_client.py fetch structure \
    --source_db pdb \
    --linked_endpoint entry \
    --linked_source_db interpro \
    --linked_accession IPR011615 \
    --limit 5 \
    --output ipr011615_structures.jsonl

How to use interpro-database on Cursor

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add interpro-database

2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/google-deepmind/science-skills --skill interpro-database

The skills CLI fetches interpro-database from GitHub repository google-deepmind/science-skills and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/interpro-database

Reload or restart Cursor to activate interpro-database. Access the skill through slash commands (e.g., /interpro-database) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.
