ncbi-sequence-fetch

google-deepmind/science-skills · updated Jun 4, 2026

$npx skills add https://github.com/google-deepmind/science-skills --skill ncbi-sequence-fetch
> Retrieve protein and nucleotide sequences from NCBI databases using E-utilities. Supports direct accession lookup, CDS translation, gene+organism search, locus lookup, PubMed-linked sequences, patent protein extraction, and organism+length fallback search. Use when you need to fetch biological sequences by accession, gene name, locus tag, PubMed ID, or patent number.

NCBI Sequence Fetch

Prerequisites

  1. uv: Read the uv skill and follow its Setup instructions to ensure uv is installed and on PATH.

  2. User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory then (1) prominently notify the user to check the terms at https://www.ncbi.nlm.nih.gov/ and https://www.ncbi.nlm.nih.gov/home/about/policies/, then (2) create the file recording the notification text and timestamp.

  3. .env file: Make sure the .env file exists in your home directory. Create one if it does not exist.

  4. NCBI_API_KEY (optional): Raises the NCBI rate limit from 3 to 10 requests/second. The skill works without it, but a key is recommended if the user plans many queries or encounters a 429 error. The user can obtain one for free by registering at https://www.ncbi.nlm.nih.gov/account/settings/. If the variable is missing from .env, do NOT ask the user to paste it into the chat (this would leak the key into the agent's context). Instead, give the user this command — substituting ENV_FILE with the resolved literal path to the .env file:

    printf "Enter NCBI API key (typing hidden): " && read -s key && echo && echo "NCBI_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
    

    The scripts load credentials automatically via dotenv. NEVER read, print, or inspect the .env file or its variables (e.g. no cat, grep, echo, printenv, or os.environ.get on keys). Credentials must stay out of the agent's context.
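
Prerequisite 2 above can be sketched in Python as follows. This is only an illustration of the check-then-record logic: the function name `ensure_license_notification` and the exact wording of `NOTICE` are hypothetical and are not part of the skill.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical wording; the skill does not prescribe the exact notice text.
NOTICE = (
    "Please review the NCBI terms at https://www.ncbi.nlm.nih.gov/ and "
    "https://www.ncbi.nlm.nih.gov/home/about/policies/ before using this skill."
)


def ensure_license_notification(skill_dir):
    """Return the notice if the user still has to be notified, else None.

    On the first call, LICENSE_NOTIFICATION.txt is created in skill_dir with
    the notice text and a UTC timestamp, so later calls stay silent.
    """
    marker = Path(skill_dir) / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return None  # the user was already notified
    stamp = datetime.now(timezone.utc).isoformat()
    marker.write_text(f"{NOTICE}\nNotified at: {stamp}\n", encoding="utf-8")
    return NOTICE
```

A caller would show the returned text prominently to the user whenever the result is not `None`.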

Core Rules

  • Use the Wrapper: ALWAYS run the provided helper scripts instead of querying NCBI directly. The scripts enforce NCBI's request rate limit automatically.
  • API Key Support: If NCBI_API_KEY is set in the user's environment, the rate limit rises automatically from 3 to 10 requests/second.
  • Notification: Whenever this skill is used, say so in the output.
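
To make the rate-limit rule concrete, here is a minimal sketch of a client-side limiter that spaces requests at 3 per second without a key and 10 per second with one. It is not the helper script's actual implementation; `requests_per_second` and `RateLimiter` are hypothetical names, and the clock and sleep functions are injectable only so the behavior can be checked without real waiting.

```python
import time


def requests_per_second(has_api_key):
    """NCBI allows 3 requests/second without a key and 10 with one."""
    return 10 if has_api_key else 3


class RateLimiter:
    """Block in wait() so that consecutive calls are at least 1/rate apart."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` immediately before each request keeps the caller under the limit. The flag passed to `requests_per_second` would come from whatever loaded the credentials, never from printing or inspecting the key.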

Overview

Wraps NCBI's Entrez E-utilities (efetch, esearch, elink, esummary) for retrieving protein and nucleotide sequences. Provides 10 subcommands covering the full range of sequence retrieval workflows:

  • fetch-protein — Direct protein accession lookup (GenPept, RefSeq)
  • fetch-nucleotide — Direct nucleotide accession lookup
  • cds-translate — Fetch CDS and translate to protein (3 methods)
  • search — Free-text search of any NCBI database
  • elink — Follow cross-database links (PubMed→Protein, etc.)
  • gene-protein — Search protein by gene name + organism
  • locus-protein — Search protein by locus tag + organism
  • pubmed-proteins — Find proteins linked to a PubMed article
  • patent-search — Extract protein sequences from patents
  • organism-length — Last-resort search by organism + exact AA length

Utility Scripts

scripts/ncbi_fetch.py — Single script with subcommands.

All subcommands write structured JSON output. Use --output FILE to save to a file, or omit it to print to stdout. A human-readable summary is always printed to stdout.
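
When driving the script from Python, one option is to always pass --output and read the JSON from that file, since stdout also carries the human-readable summary. The sketch below assumes it is run from the skill directory; `build_command` and `run_and_load` are hypothetical helper names, not part of the skill.

```python
import json
import subprocess

SCRIPT = "scripts/ncbi_fetch.py"  # path relative to the skill directory


def build_command(subcommand, *args, output=None):
    """Build the argv list for `uv run scripts/ncbi_fetch.py <subcommand> ...`."""
    cmd = ["uv", "run", SCRIPT, subcommand, *map(str, args)]
    if output is not None:
        cmd += ["--output", str(output)]
    return cmd


def run_and_load(subcommand, *args, output):
    """Run the wrapper, then parse the JSON it wrote to the output file."""
    subprocess.run(build_command(subcommand, *args, output=output), check=True)
    with open(output, encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `run_and_load("fetch-protein", "XP_022033624", output="/tmp/result.json")` mirrors the first command in the next section.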

1. Fetch Protein by Accession

Fetches protein FASTA from NCBI by accession (XP_, NP_, GenPept, etc.)

uv run scripts/ncbi_fetch.py fetch-protein XP_022033624 -o /tmp/result.json
uv run scripts/ncbi_fetch.py fetch-protein NP_001234567 ABC12345.1

2. Fetch Nucleotide by Accession

Fetches nucleotide FASTA from NCBI by accession.

uv run scripts/ncbi_fetch.py fetch-nucleotide MK034466 -o /tmp/result.json

3. CDS Translate

Fetches a CDS/nucleotide accession and translates it to a protein sequence. Tries three approaches in order:

  1. NCBI's pre-translated CDS protein (fasta_cds_aa)
  2. GenBank XML CDS annotation translations
  3. Raw nucleotide → 6-frame ORF finding

uv run scripts/ncbi_fetch.py cds-translate MK034466 -o /tmp/result.json
uv run scripts/ncbi_fetch.py cds-translate HQ662330 --target-length 1043

If the accession is a genomic record (not mRNA/CDS), the tool will report is_genomic: true so you can fall back to a homology-based approach instead.
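
For readers unfamiliar with the third approach, the following is a minimal sketch of six-frame ORF finding with the standard genetic code. It is an illustration under simplifying assumptions (DNA input, ORFs start at the first M in a stop-free stretch, an ORF may run to the end of the frame without a stop), not the script's actual code; all function names are hypothetical.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(seq):
    """Collect candidate proteins from all three frames on both strands."""
    seq = seq.upper()
    orfs = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Split at stop codons, then keep each piece from its first M.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1:
                    orfs.append(segment[start:])
    return orfs


def best_orf(seq, target_length=None):
    """Longest ORF, or the one closest to target_length when it is given."""
    orfs = six_frame_orfs(seq)
    if not orfs:
        return ""
    if target_length is None:
        return max(orfs, key=len)
    return min(orfs, key=lambda p: abs(len(p) - target_length))
```

The `target_length` argument plays the same role as the --target-length option shown above: it breaks ties between several plausible ORFs when the expected protein length is known.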

4. Search Any Database

Free-text search using Entrez query syntax. Supports all NCBI databases.

# Search protein database
uv run scripts/ncbi_fetch.py search "WRR4B[Gene Name] AND Arabidopsis[Organism]" \
  --database protein --retmax 5 --fetch-sequences

# Search nucleotide database
uv run scripts/ncbi_fetch.py search "Rz2[Gene Name] AND Beta vulgaris[Organism]" \
  --database nuccore --retmax 10

# Search with patent filter
uv run scripts/ncbi_fetch.py search "disease resistance AND Solanum[Organism] AND patent[Properties]" \
  --database protein --fetch-sequences

# Search by sequence length
uv run scripts/ncbi_fetch.py search '"Oryza sativa"[Organism] AND 1043[SLEN]' \
  --database protein --fetch-sequences --retmax 50
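
The query strings above follow a regular pattern, so they can be assembled programmatically. This sketch only uses the field tags that appear in the examples ([Gene Name], [Organism], [SLEN], patent[Properties]); the helper name `build_query` and its parameters are hypothetical.

```python
def _quote(value):
    """Wrap multi-word values in double quotes, as in '"Oryza sativa"'."""
    value = str(value)
    return f'"{value}"' if " " in value else value


def build_query(gene=None, organism=None, length=None, patent_only=False):
    """Join the given criteria into one Entrez query with AND."""
    terms = []
    if gene:
        terms.append(f"{_quote(gene)}[Gene Name]")
    if organism:
        terms.append(f"{_quote(organism)}[Organism]")
    if length is not None:
        terms.append(f"{int(length)}[SLEN]")
    if patent_only:
        terms.append("patent[Properties]")
    return " AND ".join(terms)
```

The resulting string is what would be passed as the first argument to the search subcommand.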

5. Cross-Database Links (elink)

Follow NCBI's cross-database links (e.g., PubMed article → linked proteins).

uv run scripts/ncbi_fetch.py elink 24896089 --dbfrom pubmed --db protein \
  --fetch-sequences -o /tmp/linked.json

6. Gene + Organism Search

Searches NCBI Protein for sequences by gene name and organism, using the [Gene Name] and [Organism] qualifiers.

uv run scripts/ncbi_fetch.py gene-protein WRR4B --organism "Arabidopsis thaliana"
uv run scripts/ncbi_fetch.py gene-protein Pikh-2 --organism "Oryza sativa" \
  --target-length 1043 -o /tmp/result.json

7. Locus Tag Search

Searches by locus tag in both NCBI Protein and Nuccore databases. Extracts CDS translations from GenBank XML when direct protein hits aren't available.

uv run scripts/ncbi_fetch.py locus-protein At1g56540 --organism "Arabidopsis thaliana"
uv run scripts/ncbi_fetch.py locus-protein Niben101Scf02422g02015.1 \
  --organism "Nicotiana benthamiana" -o /tmp/result.json

8. PubMed-Linked Proteins

Finds protein sequences linked to a PubMed article. Searches NCBI Protein by PMID, follows elink PubMed→Protein, and extracts CDS translations from linked Nuccore records.

uv run scripts/ncbi_fetch.py pubmed-proteins 30692254 --identifier WRR4B
uv run scripts/ncbi_fetch.py pubmed-proteins 24896089 --identifier "K2" \
  -o /tmp/result.json

9. Patent Sequence Search

Two modes:

By patent number (fetches all protein sequences from a specific patent):

uv run scripts/ncbi_fetch.py patent-search --patent-number US10123456 -o /tmp/patent.json

By keywords (searches NCBI Protein with the patent[Properties] filter):

uv run scripts/ncbi_fetch.py patent-search --keywords WRR4B Albugo --organism "Arabidopsis thaliana" -o /tmp/patent.json

[!IMPORTANT] Patent convention: In molecular biology patents, SEQ ID NO: 1 is typically the DNA sequence and SEQ ID NO: 2 is the primary protein. Higher SEQ ID NOs are variants or related sequences. Prefer Sequence 2 when selecting the primary protein of interest.
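
The "prefer Sequence 2" convention can be applied to the JSON results with a small filter. This sketch assumes each result's header contains text of the form "Sequence N from patent ...", which is an assumption about the header format rather than something this document specifies; `seq_id_no` and `pick_primary_protein` are hypothetical names.

```python
import re

# Assumed header shape: "... Sequence 2 from patent US ..."
SEQ_NO = re.compile(r"\bSequence\s+(\d+)\b", re.IGNORECASE)


def seq_id_no(header):
    """Return the SEQ ID NO parsed from a header, or None if absent."""
    match = SEQ_NO.search(header)
    return int(match.group(1)) if match else None


def pick_primary_protein(results, preferred=2):
    """Prefer the result whose header names the preferred sequence number.

    Falls back to the first result when no header matches.
    """
    for result in results:
        if seq_id_no(result.get("header", "")) == preferred:
            return result
    return results[0] if results else None
```

The fallback matters because the convention is only typical, not guaranteed, so the chosen record should still be checked against the expected length or gene name.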

10. Organism + Length Search

Last-resort search when only organism and expected protein length are known. Uses NCBI's [SLEN] filter for exact length matching.

uv run scripts/ncbi_fetch.py organism-length \
  --organism "Arabidopsis thaliana" --length 1048 --retmax 50 \
  -o /tmp/result.json

[!NOTE] This often returns multiple candidates. Use the JSON output headers to identify the correct protein.

Workflow

Standard Sequence Retrieval Cascade

When trying to find a protein sequence, follow this priority order:

  1. Direct accession: fetch-protein with GenPept/RefSeq accession
  2. CDS translation: cds-translate with nucleotide/CDS accession
  3. PubMed-linked: pubmed-proteins with PMID + gene name
  4. Locus lookup: locus-protein with locus tag + organism
  5. Gene + organism: gene-protein with gene name + organism
  6. Patent search: patent-search with patent number or keywords
  7. Organism + length: organism-length as last resort
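
The cascade above amounts to "try each method in order and stop at the first one that returns results". A minimal sketch, assuming each step is wrapped in a zero-argument callable that returns a (possibly empty) list of results; `run_cascade` and the callables are hypothetical and stand in for invocations of the subcommands.

```python
def run_cascade(steps):
    """Run (name, fetch) pairs in priority order.

    Returns (name, results) for the first step that yields any results,
    or (None, []) when every step comes back empty.
    """
    for name, fetch in steps:
        results = fetch()
        if results:
            return name, results
    return None, []
```

In practice only the steps for which the needed identifier is known (accession, PMID, locus tag, and so on) would be placed in the list, keeping the order given above.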

Interpreting Results

  • All subcommands return JSON with a results array
  • Each result has sequence (AA string), length, and header/metadata
  • When multiple results are returned, select by:
    • Closest match to expected length (target_length)
    • Header relevance (matching gene name, "disease resistance" keywords)
    • Source priority (RefSeq > GenPept > patent)
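
One way to apply these selection criteria is a sort key that compares length gap first, then header relevance, then source. Treating the three criteria as strictly ordered is an assumption of this sketch, as is inferring the source from the header (NP_/XP_ prefix for RefSeq, the word "patent" for patent records); `source_rank` and `pick_best` are hypothetical names.

```python
REFSEQ_PREFIXES = ("NP_", "XP_")  # only the RefSeq prefixes named in this document


def source_rank(header):
    """0 = RefSeq, 1 = GenPept (default), 2 = patent; lower is preferred."""
    text = header.lstrip(">").strip()
    if text.startswith(REFSEQ_PREFIXES):
        return 0
    if "patent" in text.lower():
        return 2
    return 1


def pick_best(results, target_length=None, keywords=()):
    """Pick the result with the smallest (length gap, keyword misses, source rank)."""

    def key(result):
        header = result.get("header", "")
        length = result.get("length", len(result.get("sequence", "")))
        gap = abs(length - target_length) if target_length is not None else 0
        misses = sum(k.lower() not in header.lower() for k in keywords)
        return (gap, misses, source_rank(header))

    return min(results, key=key) if results else None
```

For example, `pick_best(data["results"], target_length=1043, keywords=("WRR4B",))` would favor a 1043-residue RefSeq record whose header mentions WRR4B over a patent record of the same length.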

Reference

  • NCBI E-utilities docs: https://www.ncbi.nlm.nih.gov/books/NBK25499/
  • Entrez search syntax: https://www.ncbi.nlm.nih.gov/books/NBK49540/
  • Database list: protein, nuccore, gene, pubmed, pmc, biosample, etc.
  • Common accession formats:
    • XP_ / NP_ — NCBI RefSeq protein
    • AAA to AZZ + digits — GenPept (translated GenBank)
    • MK, MN, HQ, etc. + digits — GenBank nucleotide
    • ENSG, ENST, ENSP — Ensembl (use ensembl-database skill instead)
    • O, P, Q + 5 alphanumeric characters (e.g., P12345) — UniProt (use uniprot-database skill instead)
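
The accession formats above can be turned into a rough router that decides which subcommand or skill to try first. The regular expressions below are deliberately simplified assumptions that cover the examples in this document; they are not a complete description of NCBI, Ensembl, or UniProt accession rules, and `classify_accession` is a hypothetical name.

```python
import re

# Order matters: UniProt is checked before GenBank nucleotide because
# an ID like P12345 would otherwise match the one-letter nucleotide form.
PATTERNS = [
    ("refseq_protein", re.compile(r"^[NX]P_\d+(\.\d+)?$")),
    ("ensembl", re.compile(r"^ENS[A-Z]*[GTP]\d+(\.\d+)?$")),
    ("uniprot", re.compile(r"^[OPQ]\d[A-Z0-9]{3}\d$")),
    ("genpept", re.compile(r"^[A-Z]{3}\d{5,7}(\.\d+)?$")),
    ("genbank_nucleotide", re.compile(r"^[A-Z]{1,2}\d{5,8}(\.\d+)?$")),
]


def classify_accession(accession):
    """Return a coarse label for an accession, or 'unknown'."""
    text = accession.strip().upper()
    for label, pattern in PATTERNS:
        if pattern.match(text):
            return label
    return "unknown"
```

A label of `refseq_protein` or `genpept` points to fetch-protein, `genbank_nucleotide` to fetch-nucleotide or cds-translate, and `ensembl` or `uniprot` to the other skills named above. Locus tags such as At1g56540 come back as `unknown` and belong with locus-protein.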

How to use ncbi-sequence-fetch on Cursor

AI-first code editor with Composer

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add ncbi-sequence-fetch

2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$npx skills add https://github.com/google-deepmind/science-skills --skill ncbi-sequence-fetch

The skills CLI fetches ncbi-sequence-fetch from GitHub repository google-deepmind/science-skills and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/ncbi-sequence-fetch

Reload or restart Cursor to activate ncbi-sequence-fetch. Access the skill through slash commands (e.g., /ncbi-sequence-fetch) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.

Use Cases

Task Automation & Efficiency

Automate repetitive workflows and reduce manual effort

Example

Generate reports, summarize documents, draft communications

Save 3-5 hours per week on routine tasks

Knowledge Enhancement

Learn new skills, understand complex topics, get expert guidance

Example

Explain concepts, provide examples, suggest learning resources

Accelerate learning and skill development by 2x

Quality Improvement

Enhance output quality through reviews, suggestions, and refinements

Example

Review drafts, suggest improvements, catch errors

Improve work quality by 30-40% with less effort

Implementation Guide

Prerequisites

  • Claude Desktop or compatible AI client with skill support
  • Clear understanding of task or problem to solve
  • Willingness to iterate and refine outputs

Time Estimate

15-45 minutes depending on use case complexity

Installation Steps

  1. Install skill using provided installation command
  2. Test with simple use case relevant to your work
  3. Evaluate output quality and relevance
  4. Iterate on prompts to improve results
  5. Integrate into regular workflow if valuable

Common Pitfalls

  • Expecting perfect results without iteration
  • Not providing enough context in prompts
  • Using skill for tasks outside its intended scope
  • Accepting outputs without review and validation

Best Practices

✓ Do

  • Start with clear, specific prompts
  • Provide relevant context and constraints
  • Review and refine all outputs before using
  • Iterate to improve output quality
  • Document successful prompt patterns

✗ Don't

  • Don't use without understanding skill limitations
  • Don't skip validation of outputs
  • Don't share sensitive information in prompts
  • Don't expect skill to replace human judgment

💡 Pro Tips

  • Be specific about desired format and style
  • Ask for multiple options to choose from
  • Request explanations to understand reasoning
  • Combine AI efficiency with human expertise

When to Use This

✓ Use When

Use when skill capabilities match your task, clear ROI on time saved, and you can validate outputs. Best for repetitive tasks, learning, and quality improvement.

✗ Avoid When

Avoid when task requires deep expertise you can't validate, involves sensitive decisions, or when learning process is more valuable than speed of completion.

Learning Path

  1. Familiarize yourself with skill capabilities and limitations
  2. Start with low-risk, non-critical tasks
  3. Progress to more complex and valuable use cases
  4. Build expertise through regular use and experimentation

Ratings

4.8 (68 reviews)
  • Li Ramirez· Dec 28, 2024

    ncbi-sequence-fetch is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Amina Abebe· Dec 24, 2024

    ncbi-sequence-fetch fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Ganesh Mohane· Dec 20, 2024

    ncbi-sequence-fetch reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Min Johnson· Dec 12, 2024

    Registry listing for ncbi-sequence-fetch matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Dev Reddy· Dec 8, 2024

    Solid pick for teams standardizing on skills: ncbi-sequence-fetch is focused, and the summary matches what you get after install.

  • Neel Taylor· Nov 27, 2024

    We added ncbi-sequence-fetch from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Tariq Choi· Nov 15, 2024

    ncbi-sequence-fetch has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Sakshi Patil· Nov 11, 2024

    I recommend ncbi-sequence-fetch for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Neel Mehta· Nov 3, 2024

    Useful defaults in ncbi-sequence-fetch — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Li Ndlovu· Oct 22, 2024

    I recommend ncbi-sequence-fetch for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
