Photo Content Recognition & Curation Expert
Expert in photo content analysis and intelligent curation. Combines classical computer vision with modern deep learning for comprehensive photo analysis.
When to Use This Skill
โ
Use for:
- Face recognition and clustering (identifying important people)
- Animal/pet detection and clustering
- Near-duplicate detection using perceptual hashing (DINOHash, pHash, dHash)
- Burst photo selection (finding best frame from 10-50 shots)
- Screenshot vs photo classification
- Meme/download filtering
- NSFW content detection
- Quick indexing for large photo libraries (10K+)
- Aesthetic quality scoring (NIMA)
โ NOT for:
- GPS-based location clustering โ
event-detection-temporal-intelligence-expert
- Color palette extraction โ
color-theory-palette-harmony-expert
- Semantic image-text matching โ
clip-aware-embeddings
- Video analysis or frame extraction
Quick Decision Tree
What do you need to recognize/filter?
โ
โโ Duplicate photos? โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Perceptual Hashing
โ โโ Exact duplicates? โโโโโโโโโโโโโโโโโโโโโโโโโโโโ dHash (fastest)
โ โโ Brightness/contrast changes? โโโโโโโโโโโโโโโโโ pHash (DCT-based)
โ โโ Heavy crops/compression? โโโโโโโโโโโโโโโโโโโโโ DINOHash (2025 SOTA)
โ โโ Production system? โโโโโโโโโโโโโโโโโโโโโโโโโโโ Hybrid (pHash โ DINOHash)
โ
โโ People in photos? โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Face Clustering
โ โโ Known thresholds? โโโโโโโโโโโโโโโโโโโโโโโโโโโโ Apple-style Agglomerative
โ โโ Unknown data distribution? โโโโโโโโโโโโโโโโโโโ HDBSCAN
โ
โโ Pets/Animals? โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Pet Recognition
โ โโ Detection? โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ YOLOv8
โ โโ Individual clustering? โโโโโโโโโโโโโโโโโโโโโโโ CLIP + HDBSCAN
โ
โโ Best from burst? โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Burst Selection
โ โโ Score: sharpness + face quality + aesthetics
โ
โโ Filter junk? โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Content Detection
โโ Screenshots? โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Multi-signal classifier
โโ NSFW? โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Safety classifier
Core Concepts
1. Perceptual Hashing for Near-Duplicate Detection
Problem: Camera bursts, re-saved images, and minor edits create near-duplicates.
Solution: Perceptual hashes generate similar values for visually similar images.
Method Comparison:
| Method |
Speed |
Robustness |
Best For |
| dHash |
Fastest |
Low |
Exact duplicates |
| pHash |
Fast |
Medium |
Brightness/contrast changes |
| DINOHash |
Slower |
High |
Heavy crops, compression |
| Hybrid |
Medium |
Very High |
Production systems |
Hybrid Pipeline (2025 Best Practice):
- Stage 1: Fast pHash filtering (eliminates obvious non-duplicates)
- Stage 2: DINOHash refinement (accurate detection)
- Stage 3: Optional Siamese ViT verification
Hamming Distance Thresholds:
- Conservative: โค5 bits different = duplicates
- Aggressive: โค10 bits different = duplicates
โ Deep dive: references/perceptual-hashing.md
2. Face Recognition & Clustering
Goal: Group photos by person without user labeling.
Apple Photos Strategy (2021-2025):
- Extract face + upper body embeddings (FaceNet, 512-dim)
- Two-pass agglomerative clustering
- Conservative first pass (threshold=0.4, high precision)
- HAC second pass (threshold=0.6, increase recall)
- Incremental updates for new photos
HDBSCAN Alternative:
- No threshold tuning required
- Robust to noise
- Better for unknown data distributions
Parameters:
| Setting |
Agglomerative |
HDBSCAN |
| Pass 1 threshold |
0.4 (cosine) |
- |
| Pass 2 threshold |
0.6 (cosine) |
- |
| Min cluster size |
- |
3 photos |
| Metric |
cosine |
cosine |
โ Deep dive: references/face-clustering.md
3. Burst Photo Selection
Problem: Burst mode creates 10-50 nearly identical photos.
Multi-Criteria Scoring:
| Criterion |
Weight |
Measurement |
| Sharpness |
30% |
Laplacian variance |
| Face Quality |
35% |
Eyes open, smiling, face sharpness |
| Aesthetics |
20% |
NIMA score |
| Position |
10% |
Middle frames bonus |
| Exposure |
5% |
Histogram clipping check |
Burst Detection: Photos within 0.5 seconds of each other.
โ Deep dive: references/content-detection.md
4. Screenshot Detection
Multi-Signal Approach:
| Signal |
Confidence |
Description |
| UI elements |
0.85 |
Status bars, buttons detected |
| Perfect rectangles |
0.75 |
>5 UI buttons (90ยฐ angles) |
| High text |
0.70 |
>25% text coverage (OCR) |
| No camera EXIF |
0.60 |
Missing Make/Model/Lens |
| Device aspect |
0.60 |
Exact phone screen ratio |
| Perfect sharpness |
0.50 |
>2000 Laplacian variance |
Decision: Confidence >0.6 = screenshot
โ Deep dive: references/content-detection.md
5. Quick Indexing Pipeline
Goal: Index 10K+ photos efficiently with caching.
Features Extracted:
- Perceptual hashes (de-duplication)
- Face embeddings (people clustering)
- CLIP embeddings (semantic search)
- Color palettes
- Aesthetic scores
Performance (10K photos, M1 MacBook Pro):
| Operation |
Time |
| Perceptual hashing |
2 min |
| CLIP embeddings |
3 min (GPU) |
| Face detection |
4 min |
| Color palettes |
1 min |
| Aesthetic scoring |
2 min (GPU) |
| Clustering + dedup |
1 min |
| Total (first run) |
~13 min |
| Incremental |
<1 min |
โ Deep dive: references/photo-indexing.md
Common Anti-Patterns
Anti-Pattern: Euclidean Distance for Face Embeddings
What it looks like:
distance = np.linalg.norm(embedding1 - embedding2)
Why it's wrong: Face embeddings are normalized; cosine similarity is the correct metric.
What to do instead:
from scipy.spatial.distance import cosine
distance = cosine(embedding1, embedding2)
Anti-Pattern: Fixed Clustering Thresholds
What it looks like: Using same distance threshold for all face clusters.
Why it's wrong: Different people have varying intra-class variance (twins vs. diverse ages).
What to do instead: Use HDBSCAN for automatic threshold discovery, or two-pass clustering with conservative + relaxed passes.
Anti-Pattern: Raw Pixel Comparison for Duplicates
What it looks like:
is_duplicate = np.allclose(img1, img2)
Why it's wrong: Re-saved JPEGs, crops, brightness changes create pixel differences.
What to do instead: Perceptual hashing (pHash or DINOHash) with Hamming distance.
Anti-Pattern: Sequential Face Detection
What it looks like: Processing faces one photo at a time without batching.
Why it's wrong: GPU underutilization, 10x slower than batched.
What to do instead: Batch process images (batch_size=32) with GPU acceleration.
Anti-Pattern: No Confidence Filtering
What it looks like:
for face in all_detected_faces:
cluster(face)
Why it's wrong: Low-confidence detections create noise clusters (hands, objects).
What to do instead: Filter by confidence (threshold 0.9 for faces).
Anti-Pattern: Forcing Every Photo into Clusters
What it looks like: Assigning noise points to nearest cluster.
Why it's wrong: Solo appearances shouldn't pollute person clusters.
What to do instead: HDBSCAN/DBSCAN naturally identifies noise (label=-1). Keep noise separate.
Quick Start
from photo_curation import PhotoCurationPipeline
pipeline = PhotoCurationPipeline()
index = pipeline.index_library('/path/to/photos')
duplicates = index.find_duplicates()
print(f"Found {len(duplicates)} duplicate groups")
face_clusters = index.cluster_faces()
print(f"Found {len(face_clusters)} people")
best_photos = pipeline.select_best_from_bursts(index)
real_photos = pipeline.filter_screenshots(index)
collage_photos = pipeline.curate_for_collage(index, target_count=100)
Python Dependencies
torch transformers facenet-pytorch ultralytics hdbscan opencv-python scipy numpy scikit-learn pillow pytesseract
Integration Points
- event-detection-temporal-intelligence-expert: Provides temporal event clustering for event-aware curation
- color-theory-palette-harmony-expert: Extracts color palettes for visual diversity
- collage-layout-expert: Receives curated photos for assembly
- clip-aware-embeddings: Provides CLIP embeddings for semantic search and DeepDBSCAN
References
- DINOHash (2025): "Adversarially Fine-Tuned DINOv2 Features for Perceptual Hashing"
- Apple Photos (2021): "Recognizing People in Photos Through Private On-Device ML"
- HDBSCAN: "Hierarchical Density-Based Spatial Clustering" (2013-2025)
- Perceptual Hashing: dHash (Neal Krawetz), DCT-based pHash
Version: 2.0.0
Last Updated: November 2025