UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique for visualization and general non-linear dimensionality reduction. Apply this skill for fast, scalable embeddings that preserve local and global structure, supervised learning, and clustering preprocessing.
Quick Start
Installation
uv pip install umap-learn
Basic Usage
UMAP follows scikit-learn conventions (fit, transform, fit_transform), so it can be used as a drop-in replacement for scikit-learn's t-SNE or PCA estimators.
import umap
from sklearn.preprocessing import StandardScaler
# Prepare data (standardization is essential)
scaled_data = StandardScaler().fit_transform(data)

# Method 1: Single step (fit and transform)
embedding = umap.UMAP().fit_transform(scaled_data)

# Method 2: Separate steps (for reusing trained model)
reducer = umap.UMAP(random_state=42)
reducer.fit(scaled_data)
embedding = reducer.embedding_  # Access the trained embedding
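Critical preprocessing requirement: Standardize features to comparable scales before applying UMAP. With distance metrics such as Euclidean, features with larger numeric ranges otherwise dominate the neighbor search and are weighted more heavily than the rest.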
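To make the scaling requirement concrete, here is a minimal standard-library sketch. The three data rows (height in metres, income in dollars) are hypothetical, and the `standardize` helper only mimics what StandardScaler does (zero mean, unit variance per column); it is not part of UMAP.

```python
import math
import statistics


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant columns by falling back to a divisor of 1.0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical rows: (height in metres, income in dollars).
rows = [
    [1.60, 30_000.0],  # row 0
    [1.95, 30_500.0],  # row 1: very different height, similar income
    [1.61, 60_000.0],  # row 2: similar height, very different income
]

# On raw values, income swamps height: the large height gap between
# rows 0 and 1 contributes almost nothing to the distance.
raw_d01 = euclidean(rows[0], rows[1])
raw_d02 = euclidean(rows[0], rows[2])

# After standardization, both features contribute on a comparable scale.
scaled = standardize(rows)
scaled_d01 = euclidean(scaled[0], scaled[1])
scaled_d02 = euclidean(scaled[0], scaled[2])

print(f"raw ratio d02/d01:    {raw_d02 / raw_d01:.1f}")
print(f"scaled ratio d02/d01: {scaled_d02 / scaled_d01:.2f}")
```

Since UMAP builds its neighbor graph from these pairwise distances, the unscaled version would effectively ignore the height column.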
Typical Workflow
import umap
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
# 1. Preprocess data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(raw_data)

# 2. Create and fit UMAP
reducer = umap.UMAP(
    n_neighbors=15,
    min_dist=0.1,
    n_components=2,
    metric='euclidean',
    random_state=42
)
embedding = reducer.fit_transform(scaled_data)

# 3. Visualize
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap='Spectral', s=5)
plt.colorbar()
plt.title('UMAP Embedding')
plt.show()
Parameter Tuning Guide
UMAP has four primary parameters that control the embedding behavior. Understanding these is crucial for effective usage.
n_neighbors (default: 15)
Purpose: Balances local versus global structure in the embedding.
How it works: Controls the size of the local neighborhood UMAP examines when learning manifold structure.
Effects by value:
Low values (2-5): Emphasizes fine local detail but may fragment data into disconnected components
Medium values (15-20): Balanced view of both local structure and global relationships (recommended starting point)
High values (50-200): Prioritizes broad topological structure at the expense of fine-grained details
Recommendation: Start with 15 and adjust based on results. Increase for more global structure, decrease for more local detail.
min_dist (default: 0.1)
Purpose: Controls how tightly points cluster in the low-dimensional space.
How it works: Sets the minimum distance apart that points are allowed to be in the output representation.
Effects by value:
Low values (0.0-0.1): Creates clumped embeddings useful for clustering; reveals fine topological details
High values (0.5-0.99): Prevents tight packing; emphasizes broad topological preservation over local structure
Recommendation: Use 0.0 for clustering applications, 0.1-0.3 for visualization, 0.5+ for loose structure.
n_components (default: 2)
Purpose: Determines the dimensionality of the embedded output space.
Key feature: Unlike t-SNE, UMAP scales well in the embedding dimension, enabling use beyond visualization.
Common uses:
2-3 dimensions: Visualization
5-10 dimensions: Clustering preprocessing (better preserves density than 2D)
10-50 dimensions: Feature engineering for downstream ML models
Recommendation: Use 2 for visualization, 5-10 for clustering, higher for ML pipelines.
metric (default: 'euclidean')
Purpose: Specifies how distance is calculated between input data points.
Custom metrics: User-defined distance functions are supported when compiled with Numba.
Recommendation: Use euclidean for numeric data, cosine for text/document vectors, hamming for binary data.
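To show why the recommendation differs by data type, here is a small standard-library sketch of the three distances using their conventional definitions. These functions are illustrative only and are not UMAP's internal implementations; the vectors are made up.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends on direction, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def hamming_distance(a, b):
    """Fraction of positions at which the two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical term counts: same word proportions, one document 10x longer.
doc_short = [1, 2, 0]
doc_long = [10, 20, 0]

# Hypothetical binary feature vectors differing in 2 of 4 positions.
bits_a = [1, 0, 1, 1]
bits_b = [1, 1, 1, 0]

print(euclidean_distance(doc_short, doc_long))  # large: length dominates
print(cosine_distance(doc_short, doc_long))     # near zero: same direction
print(hamming_distance(bits_a, bits_b))
```

Cosine treats the short and long documents as nearly identical because only the mix of terms matters, which is usually the desired behavior for text vectors.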
Parameter Tuning Example
# For visualization with emphasis on local structure
umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, metric='euclidean')

# For clustering preprocessing
umap.UMAP(n_neighbors=30, min_dist=0.0, n_components=10, metric='euclidean')

# For document embeddings
umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, metric='cosine')

# For preserving global structure
umap.UMAP(n_neighbors=100, min_dist=0.5, n_components=2, metric='euclidean')
Supervised and Semi-Supervised Dimension Reduction
UMAP supports incorporating label information to guide the embedding process, enabling class separation while preserving internal structure.
Supervised UMAP
Pass target labels via the y parameter when fitting: