interpro-database
google-deepmind/science-skills · updated Jun 4, 2026
### InterPro Database
| Field | Value |
|---|---|
| name | interpro-database |
| description | Identify domains, families, and sites in proteins; find all proteins in a family or sharing a domain; explore species distribution for a domain; annotate genomes with protein families and GO terms. InterPro combines 14 databases (e.g., Pfam, CDD) into one searchable resource. InterPro-N significantly expands annotation and sequence coverage with deep learning. Includes domain architecture (IDA) search. |
InterPro Database Access
Prerequisites
- `uv`: Read the `uv` skill and follow its Setup instructions to ensure `uv` is installed and on `PATH`.
- User Notification: If `LICENSE_NOTIFICATION.txt` does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://www.ebi.ac.uk/interpro/ and https://www.ebi.ac.uk/about/terms-of-use/, then (2) create the file, recording the notification text and timestamp.
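The notification rule above can be sketched as follows. This is a hypothetical helper, not part of the skill's scripts; the notice wording and the `notify` callback are assumptions, and only the file name and the two URLs come from the rule itself.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

# Hypothetical notice text; adapt the wording as needed.
NOTICE = (
    "Please check the InterPro terms at https://www.ebi.ac.uk/interpro/ "
    "and https://www.ebi.ac.uk/about/terms-of-use/"
)


def notify_once(skill_dir: str, notify: Callable[[str], None] = print) -> bool:
    """Notify the user and record it, unless LICENSE_NOTIFICATION.txt already exists.

    Returns True if a notification was issued on this call.
    """
    marker = Path(skill_dir) / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False  # the user was already notified on an earlier run
    notify(NOTICE)  # step 1: prominently notify the user
    stamp = datetime.now(timezone.utc).isoformat()
    # step 2: record the notification text and a UTC timestamp
    marker.write_text(f"{NOTICE}\nNotified at: {stamp}\n", encoding="utf-8")
    return True
```

Notifying before writing the file keeps the order the rule asks for: if notification fails, no marker is left behind.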
Overview
InterPro combines signatures from multiple, diverse databases into a single searchable resource, reducing redundancy and helping users interpret their sequence analysis results. By uniting these member databases (e.g., Pfam, CDD, SMART), InterPro capitalises on their individual strengths to produce a powerful diagnostic tool and integrated resource.
Use interpro-database to:
- Identify what domains, families, and sites are found in a particular protein.
- Identify all proteins that belong to a protein family or contain a particular domain, even when the names and activities of the proteins are highly variable.
- Examine the species in which a particular protein family or domain is found.
- Annotate genomes with protein family information and Gene Ontology (GO) terms.
This skill provides a utility, `interpro_client.py`, for interacting with the
InterPro API. It handles rate limiting (HTTP 429), waits and retries while a
query is still being processed in the background (HTTP 408), stops on terminal
errors (HTTP 404/410), and paginates lazily.
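The retry and pagination behaviour described above can be sketched roughly as below. This is not the code of `interpro_client.py`; it is a minimal, hypothetical illustration that assumes each page is a JSON object with `results` and `next` keys (the shape of InterPro API list responses) and that the caller supplies a `fetch_json(url)` function returning `(status, payload)`. The exponential backoff schedule is an arbitrary choice.

```python
import time
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical transport: returns (HTTP status, parsed JSON body or None).
FetchJson = Callable[[str], Tuple[int, Optional[dict]]]


def paginate(fetch_json: FetchJson, url: str, max_retries: int = 5,
             base_delay: float = 1.0,
             sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Lazily yield records, following each page's 'next' link."""
    next_url: Optional[str] = url
    while next_url:
        for attempt in range(max_retries):
            status, payload = fetch_json(next_url)
            if status in (408, 429):
                # 408: query still running server-side; 429: rate limited.
                sleep(base_delay * 2 ** attempt)  # exponential backoff
                continue
            break
        else:
            raise RuntimeError(f"gave up after {max_retries} attempts: {next_url}")
        if status in (404, 410):
            return  # terminal: nothing (more) to fetch, do not retry
        if status != 200 or payload is None:
            raise RuntimeError(f"unexpected HTTP {status} for {next_url}")
        yield from payload.get("results", [])
        next_url = payload.get("next")
```

Because `paginate` is a generator, a page is only requested when the consumer asks for more records, which is what makes `itertools.islice` cheap on large endpoints.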
Core Rules
- Use the Wrapper: ALWAYS execute the `scripts/interpro_client.py` helper script to query the database rather than accessing the database directly. The script automatically enforces fair use and implements retry logic.
- For exploratory queries: ALWAYS use the CLI with a strict `--limit`. This lets you quickly understand the data schema without polluting your context window or fetching millions of results.
- Output to file: Use the CLI with `--output` to write results to a file rather than printing them all to the console. Process the output using `jq` or code.
- For more complex pipelines, import the module directly into your Python scripts and consume the generator, so you don't need to deserialize CLI output in large workflows.
- Notification: If this skill is used, ensure this is mentioned in the output.
Examples:
```bash
uv run ./scripts/interpro_client.py fetch protein --source_db reviewed --limit 2 --query_params tax_id=9606 --output exploratory_results.jsonl
```

```python
import itertools
import sys

sys.path.append('scripts')
from interpro_client import fetch_interpro_data

# fetch_interpro_data lazily yields results page-by-page
results = fetch_interpro_data(
    endpoint="entry",
    source_db="pfam",
    query_params={"page_size": 10}
)
for match in itertools.islice(results, 10):
    print(match["metadata"]["accession"])
```
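For the "Output to file" rule, a file written with `--output` can be processed line by line. The sketch below assumes the CLI writes one JSON record per line with the same `metadata.accession` layout as the records printed in the Python example above; that layout is an assumption, so check a `--limit`ed file first.

```python
import json
from typing import Iterator, List


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def accessions(path: str) -> List[str]:
    # ASSUMPTION: each record has a metadata.accession field, as in the
    # objects yielded by fetch_interpro_data.
    return [rec["metadata"]["accession"] for rec in iter_jsonl(path)]
```

Under the same assumption, the `jq` equivalent is `jq -r '.metadata.accession' exploratory_results.jsonl`.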
4 Ways to Construct Endpoints:
The CLI arguments map directly onto the four common API path constructions. Do not
build your own `/`-separated path strings:
| Path pattern | Example | CLI invocation |
|---|---|---|
| `/{endpoint}` | `/entry` | `uv run ./scripts/interpro_client.py fetch entry --limit 10 --output entries.jsonl` |
| `/{endpoint}/{sourceDB}` | `/entry/pfam` | `uv run ./scripts/interpro_client.py fetch entry --source_db pfam --limit 10 --output pfam_entries.jsonl` |
| `/{endpoint}/{sourceDB}/{accession}` | `/entry/pfam/PF00001` | `uv run ./scripts/interpro_client.py fetch entry --source_db pfam --accession PF00001 --limit 10 --output pf00001_entry.jsonl` |
| `/{endpoint}/{sourceDB}/{linked_endpoint}/{sourceDB}/{accession}` | `/entry/interpro/protein/uniprot/P04637` | `uv run ./scripts/interpro_client.py fetch entry --source_db interpro --linked_endpoint protein --linked_source_db uniprot --linked_accession P04637 --limit 10 --output p04637_entries.jsonl` |
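Purely to show how the CLI arguments line up with URL segments (in practice, keep passing arguments to the wrapper), here is a hypothetical `build_path` helper. The dependency checks between arguments are assumptions inferred from the four patterns, not documented behaviour of `interpro_client.py`.

```python
from typing import Optional


def build_path(endpoint: str, source_db: Optional[str] = None,
               accession: Optional[str] = None,
               linked_endpoint: Optional[str] = None,
               linked_source_db: Optional[str] = None,
               linked_accession: Optional[str] = None) -> str:
    """Illustrative only: mirrors how the four patterns compose a path."""
    if accession and not source_db:
        raise ValueError("--accession requires --source_db")
    if linked_source_db and not linked_endpoint:
        raise ValueError("--linked_source_db requires --linked_endpoint")
    if linked_accession and not linked_source_db:
        raise ValueError("--linked_accession requires --linked_source_db")
    segments = [endpoint, source_db, accession,
                linked_endpoint, linked_source_db, linked_accession]
    # Skip arguments that were not supplied; the order mirrors the patterns.
    return "/" + "/".join(s for s in segments if s)
```

For example, `build_path("entry", "interpro", linked_endpoint="protein", linked_source_db="uniprot", linked_accession="P04637")` yields the fourth pattern's example path.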
Valid Source Databases (--source_db)
Each endpoint only accepts specific source_db values. Using an invalid value
returns a 404 error.
- `/entry` (16 values): `interpro`, `pfam`, `cathgene3d`, `ssf`, `panther`, `cdd`, `profile`, `smart`, `ncbifam`, `prosite`, `prints`, `hamap`, `pirsf`, `sfld`, `antifam`.
- `/protein` (3 values): `uniprot` (all), `reviewed` (SwissProt), `unreviewed` (TrEMBL).
- `/structure` (1 value): `pdb`.
- `/taxonomy` (1 value): `uniprot`.
- `/proteome` (1 value): `uniprot`.
- `/set` (2 values): `pfam`, `cdd`.
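Since an invalid value only surfaces as a 404 from the server, a pipeline can reject it up front. A minimal sketch, with the allowed values copied from the list above; `check_source_db` is a hypothetical helper, not part of the wrapper.

```python
from typing import Dict, Set

# Allowed source_db values per endpoint, copied from the list above.
VALID_SOURCE_DBS: Dict[str, Set[str]] = {
    "entry": {"interpro", "pfam", "cathgene3d", "ssf", "panther", "cdd",
              "profile", "smart", "ncbifam", "prosite", "prints", "hamap",
              "pirsf", "sfld", "antifam"},
    "protein": {"uniprot", "reviewed", "unreviewed"},
    "structure": {"pdb"},
    "taxonomy": {"uniprot"},
    "proteome": {"uniprot"},
    "set": {"pfam", "cdd"},
}


def check_source_db(endpoint: str, source_db: str) -> None:
    """Raise ValueError locally instead of waiting for a 404 from the API."""
    allowed = VALID_SOURCE_DBS.get(endpoint)
    if allowed is None:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    if source_db not in allowed:
        raise ValueError(f"{source_db!r} is not valid for /{endpoint}; "
                         f"expected one of {sorted(allowed)}")
```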
Quick Reference / Core Endpoints & Parameters
For a complete, exhaustive list of all query parameters, see the Full API Reference.
The API is fully open and supports 6 core endpoints. You can combine them using the linked parameters described above. Below is a nested list of the specific query parameters available for each endpoint:
- `/entry` (Domain, family, active site, repeat, or homologous superfamily entries)
  - `integrated`: Filter by integrated status (e.g., `pfam`).
  - `type`: Filter by type (e.g., `family`, `domain`, `homologous_superfamily`).
  - `go_term` / `go_category`: Filter by Gene Ontology.
  - `ida_search` / `ida_ignore` / `exact` / `ordered`: Filter by domain architecture (see the IDA Search section).
  - `extra_fields`: Request additional data (e.g., `counters` for match coordinates).
  - `group_by` / `sort_by`: Aggregate or sort results (valid values depend on context; see the Full API Reference).
  - Example:
    ```bash
    uv run ./scripts/interpro_client.py count entry --source_db pfam --query_params type=domain --output count.jsonl
    ```
- `/protein` (Protein records matching entries or domains)
  - `tax_id`: Filter by taxonomy ID (does not search lineage).
  - `match_presence`: Filter by proteins having InterPro matches (`true`/`false`).
  - `is_fragment`: Filter complete vs. fragment sequences.
  - `group_by`: Aggregate results (e.g., `taxonomy`).
  - `extra_fields`: Request sequence or match details.
  - `isoforms` / `residues` / `structureinfo`: Include specific sub-features.
  - `conservation` / `extra_features`: Append residue conservation flags or MobiDB/coil features (only valid for `/protein/{source_db}/{accession}`).
  - Example:
    ```bash
    uv run ./scripts/interpro_client.py fetch protein --source_db uniprot --limit 20 --query_params tax_id=9606 --output human_proteins.jsonl
    ```
- `/structure` (PDB structures linked to InterPro entries)
  - `experiment_type`: Filter by experimental method (e.g., `X-RAY DIFFRACTION`).
  - `resolution`: Filter by resolution limit.
  - `extra_fields`: Include additional structural metadata.
  - `group_by`: Aggregate results.
  - Example:
    ```bash
    uv run ./scripts/interpro_client.py fetch structure --source_db pdb --accession 1ATP --limit 10 --output 1atp_structures.jsonl
    ```
- `/taxonomy` (Taxonomy distribution nodes)
  - `key_species`: Limit results to key species.
  - `with_names`: Include scientific names.
  - `filter_by_entry` / `filter_by_entry_db`: Restrict to the intersection with specific entries.
  - `extra_fields`: Additional taxonomic metadata.
  - Example:
    ```bash
    uv run ./scripts/interpro_client.py fetch taxonomy --source_db uniprot --accession 9606 --limit 10 --output human_taxonomy.jsonl
    ```
- `/proteome` (Complete proteomes linked to InterPro)
  - `extra_fields`: General query expansion.
  - Example:
    ```bash
    uv run ./scripts/interpro_client.py fetch proteome --source_db uniprot --accession UP000005640 --limit 10 --output proteome.jsonl
    ```
- `/set` (Curated sets of related entries, e.g., Pfam clans)
  - `extra_fields`: Additional metadata (only valid for `/set/{sourceDB}`).
  - Example:
    ```bash
    uv run ./scripts/interpro_client.py fetch set --source_db pfam --accession CL0001 --limit 10 --output pfam_clan.jsonl
    ```
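To see what the `--query_params key=value` and `--flags` options plausibly turn into on the wire, here is a hypothetical URL builder. It assumes the public base URL `https://www.ebi.ac.uk/interpro/api` and that flags such as `ordered`, `exact`, or `interpro_n` are sent as valueless query parameters; both are assumptions about the wrapper's internals, so use it for reading URLs, not as a replacement for the client.

```python
from typing import Dict, Iterable, List, Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://www.ebi.ac.uk/interpro/api"  # ASSUMED base URL


def parse_pairs(pairs: Iterable[str]) -> Dict[str, str]:
    """Turn CLI-style 'key=value' strings into a dict (split on the first '=')."""
    params: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = value
    return params


def build_url(path: str, query_params: Optional[Dict[str, str]] = None,
              flags: Iterable[str] = ()) -> str:
    """Append key=value parameters and bare flags (e.g. 'ordered') to a path."""
    parts: List[str] = []
    if query_params:
        parts.append(urlencode(query_params))  # percent-encodes ',' as %2C
    parts.extend(quote(flag) for flag in flags)  # flags carry no value
    return BASE_URL + path + ("?" + "&".join(parts) if parts else "")
```

For example, `build_url("/protein/uniprot", parse_pairs(["tax_id=9606"]))` reproduces the human-proteins query from the `/protein` example, minus the page limit.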
InterPro Domain Architecture (IDA) Search
InterPro provides powerful tools for searching proteins by their domain architecture (the exact combination and order of domains). Because the API does not allow querying proteins directly by multiple domains at once (e.g., "give me proteins with PF00069 AND PF00017"), finding proteins with specific domain combinations requires a two-step process.
Step 1: Find matching architectures (ida_search)
The ida_search parameter is used on the root /entry endpoint to find all
Domain Architectures (IDAs) containing the domains you specify.
- Constraints:
  - Valid ONLY on the root `/entry` endpoint.
  - Cannot be combined with non-IDA parameters.
- Modifiers (only valid with `ida_search`):
  - `ida_ignore`: Ignores the given domains in the search (query parameter).
  - `ordered`: Ensures domains appear in the exact specified order (flag).
  - `exact`: Ensures the architecture matches exactly, with no additional domains (flag). Requires the `ordered` flag to be present.
Example: Find architectures consisting of a kinase domain (PF00069) followed by an SH2 domain (PF00017), in that order and with no additional domains:
```bash
uv run scripts/interpro_client.py fetch entry \
  --query_params ida_search=PF00069,PF00017 \
  --flags ordered exact \
  --output architectures.jsonl
```
Note: This returns the architectures and their unique ida_ids, not all
individual proteins.
Step 2: Fetch proteins for those architectures (ida)
Once you have the ida_ids (e.g., 619edbb...) from Step 1, you can fetch all
the actual proteins that share that precise layout by filtering the /protein
endpoint.
Constraints:
- Valid on the `/protein` and `/entry/{sourceDB}/{accession}` endpoints.
Example: Fetch proteins matching one of the architecture IDs from Step 1:
```bash
uv run scripts/interpro_client.py fetch protein \
  --source_db uniprot \
  --query_params ida=619edbb2b445bfa3ad51bd894e3c115b025a5f25 \
  --output matching_proteins.jsonl
```
(When building pipelines or querying comprehensively, you would loop through
all the ida_ids from Step 1 and run Step 2 for each one).
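The two-step loop described above might look like this in a Python pipeline. A minimal sketch under stated assumptions: `fetch` is expected to behave like `fetch_interpro_data` with the keyword arguments used elsewhere in this document (`endpoint`, `source_db`, `query_params`, `flags`), `source_db` is assumed to be optional for the root `/entry` endpoint, and the Step 1 records are assumed to expose the architecture ID under an `ida_id` key. `proteins_for_architectures` itself is a hypothetical name.

```python
import itertools
from typing import Callable, Iterable, Iterator, Sequence, Tuple


def proteins_for_architectures(
    fetch: Callable[..., Iterable[dict]],
    domains: Sequence[str],
    ordered: bool = True,
    exact: bool = False,
    max_architectures: int = 5,
) -> Iterator[Tuple[str, dict]]:
    """Two-step IDA lookup: architectures first, then proteins per ida_id."""
    if exact and not ordered:
        raise ValueError("the exact flag requires the ordered flag")
    flags = [name for name, on in (("ordered", ordered), ("exact", exact)) if on]
    # Step 1: architectures containing the domains (root /entry endpoint).
    architectures = fetch(endpoint="entry",
                          query_params={"ida_search": ",".join(domains)},
                          flags=flags)
    # Cap the number of architectures, in the spirit of the strict --limit rule.
    for arch in itertools.islice(architectures, max_architectures):
        ida_id = arch["ida_id"]  # ASSUMED key name in the Step 1 records
        # Step 2: proteins sharing exactly this architecture.
        for protein in fetch(endpoint="protein", source_db="uniprot",
                             query_params={"ida": ida_id}):
            yield ida_id, protein
```

In a real run you would pass `fetch_interpro_data` as `fetch` and wrap the result in `itertools.islice` as well, because a single common architecture can match a very large number of proteins.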
InterPro Entry Types
Each InterPro entry is assigned a type indicating what you can infer when a protein matches the entry:
- Domain: Distinct functional, structural or sequence units that may exist in a variety of biological contexts. Example: PH domain or classical C2H2 zinc finger.
- Family: A group of proteins sharing a common evolutionary origin reflected by related functions, sequence similarities, or primary/secondary/tertiary structures.
- Homologous Superfamily: Proteins sharing an evolutionary origin reflected by structural similarity but often displaying very low sequence similarity. Usually comprises signatures from the SUPERFAMILY and CATH-Gene3D databases.
- Repeat: A short sequence that is typically repeated within a protein, often <50 amino acids long. Example: Leucine Rich Repeats or WD40 repeats.
- Site: Includes active sites (sequences containing conserved residues required for catalytic activity) and binding sites (sequences containing conserved residues that form a protein interaction site).
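When you already have a small, `--limit`ed set of entry records on disk, grouping them by type makes it easy to see what kind of inference each match supports. The sketch assumes each record's `metadata` carries a `type` field whose values match the `type` query parameter (e.g. `domain`, `family`, `homologous_superfamily`); that field name is an assumption. For totals across the whole database, use the `count` command instead of tallying locally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_entry_type(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each entry type (e.g. 'domain', 'family') to the accessions of that type."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        meta = rec["metadata"]
        # ASSUMPTION: metadata has 'accession' and (usually) 'type' fields.
        groups[meta.get("type", "unknown")].append(meta["accession"])
    return dict(groups)
```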
InterPro-N Predictions
InterPro-N is a deep-learning-based extension of the standard InterPro database. It utilizes an AI architecture inspired by computer vision to treat protein sequence annotation as a "panoptic segmentation" task, labeling residues and distinguishing between domains.
When to use InterPro-N
Standard InterPro signatures are the "gold standard" and should not be discarded in favor of InterPro-N predictions. Use InterPro-N primarily to fill in gaps or refine results.
In addition to standard InterPro:
- Analyzing "Dark Matter" (Uncharacterised Proteins): Use when a protein returns no hits in standard InterPro. InterPro-N excels at identifying remote homologs.
- Resolving Complex Repeats: Use for proteins with multiple tandem repeats (e.g., TPR or WD40) where standard HMMs might merge or miss them.
- Predicting Discontinuous Domains: Use when a domain sequence is interrupted by a completely different inserted sequence.
Instead of standard InterPro (specific scenarios):
- Precise Boundary Delineation: When you need more accurate start-and-stop coordinates for a domain than fuzzy standard hits provide.
- Large-Scale Metagenomic Screening: For initial high-recovery screening of fragmented or highly divergent sequences.
Fallback Strategy: Checking InterPro-N
When you are asked to find annotations for a protein and standard InterPro queries return no results or no annotations, you MUST check InterPro-N as a fallback.
Example Scenario: If a user asks to "List the SSF annotations for protein X"
and the standard query returns no hits, you should retry the query with the
interpro_n flag.
This fallback is crucial because InterPro-N can identify remote homologs and domains in "dark matter" proteins that standard methods miss.
If found, ALWAYS report to the user that these annotations are deep learning predictions from InterPro-N.
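The fallback rule can be expressed as a small wrapper around two queries. This is a hypothetical sketch: the caller supplies both queries as zero-argument callables, and an empty result list is treated as "no annotations"; if your standard query returns records without annotations, decide emptiness before calling it. The InterPro-N side could be, for example, `functools.partial(fetch_interpro_data, endpoint="protein", source_db="uniprot", accession="A0A096LNN2", flags=["interpro_n"])`, mirroring the "Via Python Pipeline" example under How to Use.

```python
from typing import Callable, Dict, Iterable, List, Union


def with_interpro_n_fallback(
    standard_query: Callable[[], Iterable[dict]],
    interpro_n_query: Callable[[], Iterable[dict]],
) -> Dict[str, Union[str, List[dict]]]:
    """Run the standard query; only if it returns nothing, run the InterPro-N query."""
    hits = list(standard_query())
    if hits:
        return {"source": "InterPro", "results": hits}
    # Label the fallback clearly: these are deep learning predictions.
    return {"source": "InterPro-N (deep learning predictions)",
            "results": list(interpro_n_query())}
```

Returning the source label alongside the results makes it hard to forget the reporting requirement above.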
How to Use
InterPro-N predictions are accessed by passing the interpro_n flag to the
protein endpoint with uniprot as the source database.
Via CLI:
```bash
uv run ./scripts/interpro_client.py fetch protein \
  --source_db uniprot \
  --accession A0A096LNN2 \
  --flags interpro_n \
  --output A0A096LNN2_interpro_n.jsonl
```
Via Python Pipeline:
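```python
import sys
sys.path.append('scripts')
from interpro_client import fetch_interpro_data

results = fetch_interpro_data(
    endpoint="protein",
    source_db="uniprot",
    accession="A0A096LNN2",
    flags=["interpro_n"],
)
```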
Strict Lookup Rules
- Always Use UniProt Accessions, NEVER Gene Names: When looking up proteins in InterPro, you MUST use their UniProt accessions (e.g., `P04637`). InterPro does not natively support or reliably map gene names (e.g., `TP53`). If the user provides a gene name, you must first use a database like Ensembl or UniProt to resolve it to an accession.
- NEVER Iterate to Count: When asked for an aggregate count (e.g., "How many domains are there?"), you MUST read the `count` field from the initial API JSON response using the `get_interpro_count()` helper. NEVER iterate over the `fetch_interpro_data` generator to tally elements. Iterating over an endpoint with 50,000+ entries just to count them silently hangs the agent and abuses the API. Every time. No exceptions.

  ✅ Correct:

  Via CLI:
  ```bash
  uv run ./scripts/interpro_client.py count entry --source_db interpro --query_params type=domain --output count.json
  ```
  Via Python Pipeline:
  ```python
  from interpro_client import get_interpro_count

  cnt = get_interpro_count(
      endpoint="entry",
      source_db="interpro",
      query_params={"type": "domain"},
  )
  ```
  ❌ Wrong (iterating over fetch):
  ```bash
  # NEVER DO THIS:
  uv run ./scripts/interpro_client.py fetch entry --source_db interpro --query_params type=domain --output output.jsonl && wc -l output.jsonl
  ```
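For the gene-name rule, one way to resolve a name to accessions is the UniProt REST search endpoint. A minimal sketch, assuming direct access to `https://rest.uniprot.org/uniprotkb/search` is acceptable in your environment and that UniProt's `gene_exact`, `organism_id`, and `reviewed` query fields fit your case; the default organism (human, 9606) and reviewed-only filter are illustrative choices. The helpers only build the URL and parse the TSV reply; fetch it with any HTTP client (e.g. `urllib.request.urlopen`).

```python
from typing import List
from urllib.parse import urlencode

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def gene_to_accession_url(gene: str, organism_id: int = 9606,
                          reviewed_only: bool = True) -> str:
    """Build a UniProt search URL returning accessions for an exact gene name."""
    query = f"gene_exact:{gene} AND organism_id:{organism_id}"
    if reviewed_only:
        query += " AND reviewed:true"  # restrict to Swiss-Prot entries
    params = {"query": query, "fields": "accession", "format": "tsv"}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def parse_accessions(tsv_text: str) -> List[str]:
    """Drop the TSV header row and return the accession column."""
    lines = [ln.strip() for ln in tsv_text.splitlines() if ln.strip()]
    return lines[1:]
```

A gene name can map to several accessions (paralogs, other organisms, unreviewed entries), so confirm the right one with the user before querying InterPro.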
Quick examples
For detailed examples of the invocations and JSON output schemas returned by various endpoints, see the Example Responses Reference. This TSV contains command-line calls, Python equivalents, and the corresponding JSON payload structures.
1. Determining all protein domains
```bash
# Fetches InterPro entries within UniProt protein P04637
# URL equivalent: /entry/interpro/protein/uniprot/P04637
uv run ./scripts/interpro_client.py fetch entry \
  --source_db interpro \
  --linked_endpoint protein \
  --linked_source_db uniprot \
  --linked_accession P04637 \
  --output p04637_domains.jsonl
```
2. Fetching all PDB structures for an Entry
```bash
# URL equivalent: /structure/pdb/entry/interpro/IPR011615
# Only fetch the first 5 structures
uv run ./scripts/interpro_client.py fetch structure \
  --source_db pdb \
  --linked_endpoint entry \
  --linked_source_db interpro \
  --linked_accession IPR011615 \
  --limit 5 \
  --output ipr011615_structures.jsonl
```
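To turn the output of example 1 (`p04637_domains.jsonl`) into a sorted list of match boundaries, something like the sketch below could work. It assumes each record nests match coordinates as `proteins[i].entry_protein_locations[j].fragments[k]` with integer `start`/`end` fields; that layout is an assumption about linked-endpoint responses, so inspect one record (e.g. with `jq`) before relying on it. `domain_spans` and `load_jsonl` are hypothetical helpers.

```python
import json
from typing import Iterable, List, Optional, Tuple

Span = Tuple[str, Optional[str], int, int]


def domain_spans(records: Iterable[dict]) -> List[Span]:
    """Collect (accession, type, start, end) for each matched fragment, sorted by start."""
    spans: List[Span] = []
    for rec in records:
        meta = rec["metadata"]
        # ASSUMED layout: proteins[i].entry_protein_locations[j].fragments[k]
        for protein in rec.get("proteins") or []:
            for location in protein.get("entry_protein_locations") or []:
                for frag in location.get("fragments") or []:
                    spans.append((meta["accession"], meta.get("type"),
                                  frag["start"], frag["end"]))
    return sorted(spans, key=lambda span: (span[2], span[3]))


def load_jsonl(path: str) -> List[dict]:
    """Read one JSON record per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Usage: `domain_spans(load_jsonl("p04637_domains.jsonl"))`. Discontinuous matches appear as several fragments of the same accession, which is why every fragment becomes its own span.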