ncbi-sequence-fetch
google-deepmind/science-skills · updated Jun 4, 2026
### NCBI Sequence Fetch
| name | ncbi-sequence-fetch |
| description | Retrieve protein and nucleotide sequences from NCBI databases using E-utilities. Supports direct accession lookup, CDS translation, gene+organism search, locus lookup, PubMed-linked sequences, patent protein extraction, and organism+length fallback search. Use when you need to fetch biological sequences by accession, gene name, locus tag, PubMed ID, or patent number. |
NCBI Sequence Fetch
Prerequisites
- uv: Read the uv skill and follow its Setup instructions to ensure uv is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://www.ncbi.nlm.nih.gov/ and https://www.ncbi.nlm.nih.gov/home/about/policies/, then (2) create the file recording the notification text and timestamp.
- .env file: Make sure the .env file exists in your home directory. Create one if it does not exist.
- NCBI_API_KEY (optional): Raises the NCBI rate limit from 3 to 10 requests/second. The skill works without it, but a key is recommended if the user plans many queries or encounters a 429 error. The user can obtain one for free by registering at https://www.ncbi.nlm.nih.gov/account/settings/. If the variable is missing from .env, do NOT ask the user to paste it into the chat (this would leak the key into the agent's context). Instead, give the user this command, substituting ENV_FILE with the resolved literal path to the .env file:
  printf "Enter NCBI API key (typing hidden): " && read -s key && echo && echo "NCBI_API_KEY=$key" >> "ENV_FILE" && echo "Saved."
  The scripts load credentials automatically via dotenv. NEVER read, print, or inspect the .env file or its variables (e.g. no cat, grep, echo, printenv, or os.environ.get on keys). Credentials must stay out of the agent's context.
Core Rules
- Use the Wrapper: ALWAYS execute the provided helper scripts to query the database rather than accessing the database directly. The scripts automatically enforce the required rate limit.
- API Key Support: If the user provides an NCBI_API_KEY in their environment, the rate limit is automatically raised from 3 to 10 requests/second.
- Notification: If this skill is used, ensure this is mentioned in the output.
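The rate-limit rule can be pictured as a minimum-interval throttle. What follows is a minimal sketch under stated assumptions, not the wrapper's actual implementation: the class name MinIntervalThrottle and its has_api_key flag are hypothetical, and only the 3 and 10 requests/second figures come from the Prerequisites. The flag is passed in as a boolean so the key itself is never read.

```python
import time


class MinIntervalThrottle:
    """Hypothetical helper: spaces calls so they never exceed a requests/second cap."""

    def __init__(self, has_api_key, clock=time.monotonic, sleep=time.sleep):
        # 10 requests/second with an API key, 3 without (figures from the Prerequisites).
        self.min_interval = 1.0 / (10 if has_api_key else 3)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was released

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling wait() before each request would keep consecutive requests at least min_interval apart. The injectable clock and sleep arguments exist only so the behaviour can be checked without real delays.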
Overview
Wraps NCBI's Entrez E-utilities (efetch, esearch, elink, esummary) for retrieving protein and nucleotide sequences. Provides 10 subcommands covering the full range of sequence retrieval workflows:
- fetch-protein: Direct protein accession lookup (GenPept, RefSeq)
- fetch-nucleotide: Direct nucleotide accession lookup
- cds-translate: Fetch CDS and translate to protein (3 methods)
- search: Free-text search of any NCBI database
- elink: Follow cross-database links (PubMed→Protein, etc.)
- gene-protein: Search protein by gene name + organism
- locus-protein: Search protein by locus tag + organism
- pubmed-proteins: Find proteins linked to a PubMed article
- patent-search: Extract protein sequences from patents
- organism-length: Last-resort search by organism + exact AA length
Utility Scripts
scripts/ncbi_fetch.py — Single script with subcommands.
All subcommands write structured JSON output. Use --output FILE to save to a
file, or omit it to print to stdout. A human-readable summary is always printed
to stdout.
1. Fetch Protein by Accession
Fetches protein FASTA from NCBI by accession (XP_, NP_, GenPept, etc.)
uv run scripts/ncbi_fetch.py fetch-protein XP_022033624 -o /tmp/result.json
uv run scripts/ncbi_fetch.py fetch-protein NP_001234567 ABC12345.1
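For orientation, a direct accession lookup corresponds to an E-utilities efetch request that returns FASTA text. The sketch below only builds such a request URL and parses FASTA; it performs no network I/O and is not the code in scripts/ncbi_fetch.py. The helper names efetch_fasta_url and parse_fasta are hypothetical, and the sample record is made up for illustration.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions, db="protein"):
    """Build (but do not send) an efetch URL requesting FASTA text."""
    params = {"db": db, "id": ",".join(accessions), "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def parse_fasta(text):
    """Split FASTA text into records with header, sequence, and length."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A '>' line starts a new record; the rest of the line is the header.
            records.append({"header": line[1:], "sequence": ""})
        elif line and records:
            # Sequence lines are concatenated onto the most recent record.
            records[-1]["sequence"] += line
    for record in records:
        record["length"] = len(record["sequence"])
    return records


# Made-up FASTA text, used only to exercise the parser.
sample = ">EXAMPLE_1 made-up protein\nMKTAYIAK\nQRQISFVK\n"
print(efetch_fasta_url(["XP_022033624"]))
print(parse_fasta(sample))
```

In practice the wrapper should still be used for real queries, since it is what enforces the rate limit.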
2. Fetch Nucleotide by Accession
Fetches nucleotide FASTA from NCBI by accession.
uv run scripts/ncbi_fetch.py fetch-nucleotide MK034466 -o /tmp/result.json
3. CDS Translate
Fetches a CDS/nucleotide accession and translates to protein sequence. Tries
three approaches in order:
1. NCBI's pre-translated CDS protein (fasta_cds_aa)
2. GenBank XML CDS annotation translations
3. Raw nucleotide → 6-frame ORF finding
uv run scripts/ncbi_fetch.py cds-translate MK034466 -o /tmp/result.json
uv run scripts/ncbi_fetch.py cds-translate HQ662330 --target-length 1043
If the accession is a genomic record (not mRNA/CDS), the tool will report
is_genomic: true so you can fall back to a homology-based approach instead.
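The third fallback, 6-frame ORF finding, can be illustrated in a few lines. This is a sketch of the general technique using the standard genetic code, not the script's actual logic; the function names are hypothetical, and the choice to keep only Met-initiated ORFs (including ones that run off the end of the sequence without a stop) is an assumption made for the example.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, ...
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from position 0; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(nucleotide):
    """Return every Met-initiated ORF (as protein) found in the six reading frames."""
    seq = nucleotide.upper().replace("U", "T")
    orfs = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Stop codons translate to '*', so splitting on '*' yields candidate ORFs.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1:
                    orfs.append(segment[start:])
    return orfs


def best_orf(nucleotide, target_length=None):
    """Pick the ORF closest to target_length, or the longest when no target is given."""
    orfs = six_frame_orfs(nucleotide)
    if not orfs:
        return None
    if target_length is None:
        return max(orfs, key=len)
    return min(orfs, key=lambda protein: abs(len(protein) - target_length))
```

Passing an expected length plays the same role as --target-length in the commands above: it breaks ties between several candidate ORFs.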
4. Search Any Database
Free-text search using Entrez query syntax. Supports all NCBI databases.
# Search protein database
uv run scripts/ncbi_fetch.py search "WRR4B[Gene Name] AND Arabidopsis[Organism]" \
--database protein --retmax 5 --fetch-sequences
# Search nucleotide database
uv run scripts/ncbi_fetch.py search "Rz2[Gene Name] AND Beta vulgaris[Organism]" \
--database nuccore --retmax 10
# Search with patent filter
uv run scripts/ncbi_fetch.py search "disease resistance AND Solanum[Organism] AND patent[Properties]" \
--database protein --fetch-sequences
# Search by sequence length
uv run scripts/ncbi_fetch.py search '"Oryza sativa"[Organism] AND 1043[SLEN]' \
--database protein --fetch-sequences --retmax 50
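The query strings in these examples follow a regular pattern: field-qualified clauses joined with AND. A small sketch of composing them programmatically is shown below. The helpers entrez_term and esearch_url are hypothetical names, the function only builds strings without contacting NCBI, and the set of supported fields is limited to the ones that appear in the examples above.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_term(gene=None, organism=None, length=None, patent_only=False):
    """Combine field-qualified clauses with AND, mirroring the examples above."""
    clauses = []
    if gene:
        clauses.append(f"{gene}[Gene Name]")
    if organism:
        # Quote the organism so multi-word names stay one phrase.
        clauses.append(f'"{organism}"[Organism]')
    if length is not None:
        clauses.append(f"{int(length)}[SLEN]")
    if patent_only:
        clauses.append("patent[Properties]")
    return " AND ".join(clauses)


def esearch_url(term, database="protein", retmax=20):
    """Build (but do not send) an esearch request URL for the given term."""
    return f"{ESEARCH}?{urlencode({'db': database, 'term': term, 'retmax': retmax})}"


term = entrez_term(organism="Oryza sativa", length=1043)
print(term)
print(esearch_url(term, retmax=50))
```

The resulting term can be passed as the quoted first argument of the search subcommand.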
5. Cross-Database Links (elink)
Follow NCBI's cross-database links (e.g., PubMed article → linked proteins).
uv run scripts/ncbi_fetch.py elink 24896089 --dbfrom pubmed --db protein \
--fetch-sequences -o /tmp/linked.json
6. Gene + Organism Search
Searches for protein sequences by gene name and organism. Searches NCBI Protein
with [Gene Name] and [Organism] qualifiers.
uv run scripts/ncbi_fetch.py gene-protein WRR4B --organism "Arabidopsis thaliana"
uv run scripts/ncbi_fetch.py gene-protein Pikh-2 --organism "Oryza sativa" \
--target-length 1043 -o /tmp/result.json
7. Locus Tag Search
Searches by locus tag in both NCBI Protein and Nuccore databases. Extracts CDS translations from GenBank XML when direct protein hits aren't available.
uv run scripts/ncbi_fetch.py locus-protein At1g56540 --organism "Arabidopsis thaliana"
uv run scripts/ncbi_fetch.py locus-protein Niben101Scf02422g02015.1 \
--organism "Nicotiana benthamiana" -o /tmp/result.json
8. PubMed-Linked Proteins
Finds protein sequences linked to a PubMed article. Searches NCBI Protein by PMID, follows elink PubMed→Protein, and extracts CDS translations from linked Nuccore records.
uv run scripts/ncbi_fetch.py pubmed-proteins 30692254 --identifier WRR4B
uv run scripts/ncbi_fetch.py pubmed-proteins 24896089 --identifier "K2" \
-o /tmp/result.json
9. Patent Sequence Search
Two modes:
By patent number (fetches all protein sequences from a specific patent):
uv run scripts/ncbi_fetch.py patent-search --patent-number US10123456 -o /tmp/patent.json
By keywords (searches NCBI Protein with patent[Properties] filter):
uv run scripts/ncbi_fetch.py patent-search --keywords WRR4B Albugo --organism "Arabidopsis thaliana" -o /tmp/patent.json
[!IMPORTANT] Patent convention: In molecular biology patents, SEQ ID NO: 1 is typically the DNA sequence and SEQ ID NO: 2 is the primary protein. Higher SEQ ID NOs are variants or related sequences. Prefer Sequence 2 when selecting the primary protein of interest.
10. Organism + Length Search
Last-resort search when only organism and expected protein length are known.
Uses NCBI's [SLEN] filter for exact length matching.
uv run scripts/ncbi_fetch.py organism-length \
--organism "Arabidopsis thaliana" --length 1048 --retmax 50 \
-o /tmp/result.json
[!NOTE] This often returns multiple candidates. Use the JSON output headers to identify the correct protein.
Workflow
Standard Sequence Retrieval Cascade
When trying to find a protein sequence, follow this priority order:
1. Direct accession: fetch-protein with GenPept/RefSeq accession
2. CDS translation: cds-translate with nucleotide/CDS accession
3. PubMed-linked: pubmed-proteins with PMID + gene name
4. Locus lookup: locus-protein with locus tag + organism
5. Gene + organism: gene-protein with gene name + organism
6. Patent search: patent-search with patent number or keywords
7. Organism + length: organism-length as last resort
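The cascade can be expressed as a function that turns whatever identifiers are known into an ordered list of commands to try. This is an illustrative sketch only: retrieval_plan and the keys of its known dict are hypothetical, and it uses just the subcommands and flags shown in the examples in this document.

```python
def retrieval_plan(known):
    """Return command lines to try, in the cascade's priority order.

    `known` is a hypothetical dict of whatever identifiers are available.
    """
    base = ["uv", "run", "scripts/ncbi_fetch.py"]
    organism = known.get("organism")
    plan = []
    if known.get("protein_accession"):
        plan.append(base + ["fetch-protein", known["protein_accession"]])
    if known.get("nucleotide_accession"):
        plan.append(base + ["cds-translate", known["nucleotide_accession"]])
    if known.get("pmid") and known.get("gene"):
        plan.append(base + ["pubmed-proteins", str(known["pmid"]), "--identifier", known["gene"]])
    if known.get("locus_tag") and organism:
        plan.append(base + ["locus-protein", known["locus_tag"], "--organism", organism])
    if known.get("gene") and organism:
        plan.append(base + ["gene-protein", known["gene"], "--organism", organism])
    if known.get("patent_number"):
        plan.append(base + ["patent-search", "--patent-number", known["patent_number"]])
    if organism and known.get("length"):
        plan.append(base + ["organism-length", "--organism", organism, "--length", str(known["length"])])
    return plan


for command in retrieval_plan({"gene": "WRR4B", "organism": "Arabidopsis thaliana", "length": 1048}):
    print(" ".join(command))
```

Each entry is an argument list, so it could be handed to a process runner one at a time, stopping at the first step that returns a usable sequence.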
Interpreting Results
- All subcommands return JSON with a results array
- Each result has sequence (AA string), length, and header/metadata
- When multiple results are returned, select by:
  - Closest match to expected length (target_length)
  - Header relevance (matching gene name, "disease resistance" keywords)
  - Source priority (RefSeq > GenPept > patent)
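The selection criteria translate naturally into a sort key. The sketch below assumes each result is a dict with the documented length and header fields; the source field and the rank_results helper are hypothetical additions for illustration, and the sample candidates are made up.

```python
SOURCE_RANK = {"refseq": 0, "genpept": 1, "patent": 2}  # lower is preferred


def rank_results(results, target_length=None, keywords=()):
    """Order candidates by length distance, then header keyword hits, then source."""

    def key(result):
        length_gap = abs(result["length"] - target_length) if target_length else 0
        header = result.get("header", "").lower()
        hits = sum(1 for word in keywords if word.lower() in header)
        # "source" is a hypothetical field; unknown sources sort last.
        source = SOURCE_RANK.get(str(result.get("source", "")).lower(), len(SOURCE_RANK))
        return (length_gap, -hits, source)

    return sorted(results, key=key)


# Made-up candidates, used only to show the ordering.
candidates = [
    {"header": "Sequence 2 from patent", "length": 1043, "source": "patent"},
    {"header": "disease resistance protein Pikh-2", "length": 1043, "source": "refseq"},
    {"header": "hypothetical protein", "length": 900, "source": "genpept"},
]
for result in rank_results(candidates, target_length=1043, keywords=["Pikh-2"]):
    print(result["header"])
```

When the output was saved with -o, the candidate list would come from the results array of that JSON file.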
Reference
- NCBI E-utilities docs: https://www.ncbi.nlm.nih.gov/books/NBK25499/
- Entrez search syntax: https://www.ncbi.nlm.nih.gov/books/NBK49540/
- Database list: protein, nuccore, gene, pubmed, pmc, biosample, etc.
- Common accession formats:
  - XP_/NP_: NCBI RefSeq protein
  - AAA to AZZ + digits: GenPept (translated GenBank)
  - MK, MN, HQ, etc. + digits: GenBank nucleotide
  - ENSG, ENST, ENSP: Ensembl (use ensembl-database skill instead)
  - Q, P, O + digits: UniProt (use uniprot-database skill instead)
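A rough classifier for these accession formats could look like the following. The patterns are deliberately simplified assumptions based on the list above (real accession rules have more cases, and some prefixes are ambiguous), and classify_accession is a hypothetical helper, not part of the skill.

```python
import re

# Simplified patterns; order matters because some formats overlap.
PATTERNS = [
    ("ensembl", re.compile(r"^ENS[A-Z]*[GTP]\d+(\.\d+)?$")),
    ("refseq_protein", re.compile(r"^[NX]P_\d+(\.\d+)?$")),
    ("uniprot", re.compile(r"^[OPQ]\d[A-Z0-9]{3}\d$")),
    ("genpept", re.compile(r"^[A-Z]{3}\d{5,7}(\.\d+)?$")),
    ("genbank_nucleotide", re.compile(r"^[A-Z]{1,2}\d{5,8}(\.\d+)?$")),
]


def classify_accession(accession):
    """Return a coarse label for an accession string, or 'unknown'."""
    text = accession.strip()
    for label, pattern in PATTERNS:
        if pattern.match(text):
            return label
    return "unknown"


for example in ["XP_022033624", "ABC12345.1", "MK034466", "ENSG00000139618", "P04637", "At1g56540"]:
    print(example, classify_accession(example))
```

A label of ensembl or uniprot would signal that a different skill is the better fit, while unknown (as for a locus tag) suggests falling back to one of the search subcommands.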
How to use ncbi-sequence-fetch on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with node --version)
- Active project directory or workspace where you want to add ncbi-sequence-fetch
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches ncbi-sequence-fetch from GitHub repository google-deepmind/science-skills and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate ncbi-sequence-fetch. Access the skill through slash commands (e.g., /ncbi-sequence-fetch) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.