tao-train-foundation-stereo
Stereo depth estimation using FoundationStereo. Predicts disparity maps from stereo image pairs for 3D reconstruction.
Installation Guide
How to use tao-train-foundation-stereo on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your machine
- Node.js 16+ with npm (verify with node --version)
- Active project directory where you want to add tao-train-foundation-stereo
Run the install command
Execute the skills CLI command in your project's root directory to begin installation:
Fetches tao-train-foundation-stereo from nvidia/skills and configures it for Cursor.
Select Cursor when prompted
The CLI shows a list of agents. Use arrow keys and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Restart Cursor to activate tao-train-foundation-stereo. Access via /tao-train-foundation-stereo in your agent's command palette.
Security Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your environment. Always review source, verify the publisher, and test in isolation before production.
Documentation
| Field | Value |
|---|---|
| name | tao-train-foundation-stereo |
| description | Stereo depth estimation using FoundationStereo. Predicts disparity maps from stereo image pairs for 3D reconstruction. Use when training, evaluating, exporting, or running inference for a TAO FoundationStereo model. Trigger phrases include "train stereo depth", "FoundationStereo", "stereo disparity estimation", "3D reconstruction from stereo". |
| license | Apache-2.0 |
| compatibility | Requires docker + nvidia-container-toolkit. |
| metadata | version: "0.1.0"; author: NVIDIA Corporation |
| allowed-tools | Read, Bash |
| tags | stereo, depth, estimation |
Depth Net Stereo
Stereo depth estimation using FoundationStereo architecture. Predicts disparity maps from stereo image pairs for 3D reconstruction.
The model uses pretrained Depth Anything v2 and EdgeNeXt encoders. Set both model.stereo_backbone.depth_anything_v2_pretrained_path and model.stereo_backbone.edgenext_pretrained_path in the spec.
The mono and stereo skills both invoke the unified TAO depth_net CLI inside the container; the mono/stereo family is selected via model.model_type (e.g., FoundationStereo).
PyT actions packaged by this model skill: train, evaluate, inference, export, and quantize. The PyT depth_net entrypoint does not accept a gen_trt_engine action in the current TAO image; build TensorRT engines only through the deploy workflow.
For TAO Deploy TensorRT actions (gen_trt_engine, TensorRT evaluate, and TensorRT inference), read references/tao-deploy-foundation-stereo.md first. The deploy spec template lives in this skill's references/spec_template_deploy.yaml.
Train Action Policy
This model is AutoML-enabled at the model layer. Before handling any train-stage request, read references/skill_info.yaml and resolve the run override from either an explicit automl_policy value or the user's workflow request.
- Use automl_policy: on by default and only expose on / off in new launch prompts.
- Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as automl_policy: off for this run only.
- When automl_policy: on, automl_enabled: true, and both schemas/train.schema.json and references/spec_template_train.yaml are packaged, route the train action through tao-skill-bank:tao-run-automl by default with this model's skill_dir.
- Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and automl_policy.
- Use direct model training only when automl_policy: off or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.
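The routing rules above can be sketched as two small functions. This is an illustrative sketch only: the function names, the return values, and the choice to let an explicit automl_policy take precedence over phrases in the request are assumptions, not part of the skill or of TAO.

```python
# Hypothetical helpers; names and return shapes are illustrative assumptions.
OFF_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_automl_policy(explicit_policy=None, request_text=""):
    """Return "on" or "off" for this run only; model metadata is not changed."""
    if explicit_policy is not None:
        policy = str(explicit_policy).strip().lower()
        if policy not in ("on", "off"):
            raise ValueError(f"automl_policy must be 'on' or 'off', got {explicit_policy!r}")
        return policy
    text = request_text.lower()
    if any(phrase in text for phrase in OFF_PHRASES):
        return "off"
    return "on"  # default policy


def route_train(policy, automl_enabled, has_schema, has_template):
    """Return (route, note) where route is "automl" or "direct"."""
    if policy == "off":
        return "direct", None
    if automl_enabled and has_schema and has_template:
        return "automl", None
    if automl_enabled:
        # Missing schema or template: fall back to direct training and say why.
        return "direct", ("AutoML is enabled but not runnable for this model "
                          "until schemas are generated.")
    return "direct", None
```

For example, route_train(resolve_automl_policy(request_text="plain training, please"), True, True, True) selects direct training for that run without touching the model's automl_enabled metadata.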
Non-train actions such as evaluate, inference, export, and deploy flows stay in this model skill. The per-run automl_policy override does not change model metadata.
Workflow
Prerequisites — data accessibility
Your dataset (left + right images + GT disparity) must be reachable from inside the container:
- SDK runner: place files at the S3 paths the runner resolves (the S3_TRAIN / S3_EVAL placeholders shown in the spec overrides). The runner handles S3 → container-path mounting transparently.
- Direct docker run (e.g. local testing): mount the host dataset root read-only at the same in-container path:
docker run ... -v <host_data_root>:<host_data_root>:ro <container> ...
The same accessibility requirement applies to the <output_dir> written by all actions.
Step 1 — Annotation file
Per-line annotation file referenced by data_sources[*].data_file:
| Columns | Format | Use |
|---|---|---|
| 2 | <left> <right> | Stereo inference (no GT) |
| 3 | <left> <right> <disparity> | Stereo with GT |
| 4 | <left> <right> <disparity> <occlusion_mask> | Stereo with GT and occlusion mask |
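As a quick pre-flight check, the column count of an annotation file can be read off before choosing an action. The sketch below assumes whitespace-separated columns and paths without spaces; treating mixed column counts as an error is a choice of this sketch, not documented depth_net behavior, and the function name is hypothetical.

```python
COLUMN_MEANING = {
    2: "stereo inference (no GT)",
    3: "stereo with GT",
    4: "stereo with GT and occlusion mask",
}


def annotation_columns(lines):
    """Return the single column count (2, 3, or 4) used by all non-empty lines."""
    counts = set()
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in COLUMN_MEANING:
            raise ValueError(f"line {number}: expected 2, 3, or 4 columns, got {len(fields)}")
        counts.add(len(fields))
    if not counts:
        raise ValueError("annotation file has no entries")
    if len(counts) > 1:
        raise ValueError(f"mixed column counts: {sorted(counts)}")
    return counts.pop()
```

Calling annotation_columns(open(path)) and looking the result up in COLUMN_MEANING tells you whether the file can drive train and evaluate (3 or 4 columns) or inference only (2 columns).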
If you already have one, point to it. Otherwise generate via depth_net convert:
depth_net convert -e <convert_spec.yaml>
convert_spec.yaml template (stereo):
results_dir: <directory where generated annotation files are written>
data_root: <directory whose immediate children are scene folders that contain your image+depth files; convert walks data_root recursively but expects per-scene subdirectories at one level below>
image_dir_pattern: [<substring matching left image paths>]
right_dir_pattern: [<substring matching right image paths>]
depth_dir_pattern: [<substring matching GT disparity paths>]
nocc_dir_pattern: [] # optional, occlusion mask paths
image_extension: '.png' # always include the leading dot
depth_extension: '.png' # form must match image_extension (the swap is a substring replace)
nocc_extension: ''
split_ratio: 0.0 # 0.0/1.0 = test-only; 0.8 = 80/20 train+val
convert walks data_root recursively, selects paths whose path-string contains all substrings in image_dir_pattern (AND-filter), then derives right / depth / mask paths by replacing image_dir_pattern[0] with the corresponding pattern's first element plus extension swap. Inspect your dataset's directory layout and identify the substrings distinguishing left, right, and GT (e.g. im0 vs im1 vs disp0GT for Middlebury).
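The path derivation described above can be approximated in a few lines, which helps when checking candidate patterns against your own layout before running convert. This is a sketch of the described behavior, not the converter's actual code; the function name is hypothetical, and it assumes the right image keeps the image extension while the depth path gets the extension swap.

```python
def derive_sample(left_path, image_dir_pattern, right_dir_pattern,
                  depth_dir_pattern, image_extension, depth_extension):
    """Return (left, right, depth) paths, or None if left_path is filtered out."""
    # AND-filter: every substring in image_dir_pattern must occur in the path.
    if not all(token in left_path for token in image_dir_pattern):
        return None
    key = image_dir_pattern[0]
    # Substring replace of the first image pattern with the first target pattern.
    right = left_path.replace(key, right_dir_pattern[0])
    depth = left_path.replace(key, depth_dir_pattern[0])
    # Extension swap is also a plain substring replace.
    depth = depth.replace(image_extension, depth_extension)
    return left_path, right, depth


# Hypothetical Middlebury-style path, used only for illustration.
sample = derive_sample("scenes/Adirondack/im0.png",
                       ["im0"], ["im1"], ["disp0GT"], ".png", ".pfm")
```

Because both steps are substring replaces, a pattern that also occurs elsewhere in the path (for example in a parent directory name) would be rewritten there too in this sketch, so pick substrings that appear exactly once.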
Step 2 — Pair model_type and dataset_name based on your data
Prefer the dataset-specific class when your layout matches a supported one — it applies class-specific path conventions, evaluation crops, and (where applicable) occlusion-mask handling. Fall back to GenericDataset only for layouts that do not match any registered class.
| Data category | model_type | dataset_name |
|---|---|---|
| Middlebury data | FoundationStereo | Middlebury |
| KITTI data | FoundationStereo | Kitti |
| ETH3D data | FoundationStereo | Eth3d |
| FSD synthetic data | FoundationStereo | FSD |
| IsaacReal synthetic data | FoundationStereo | IsaacRealDataset |
| Crestereo synthetic data | FoundationStereo | Crestereo |
| Other / non-canonical layout | FoundationStereo | GenericDataset |
Valid dataset_name values for stereo data_sources (case-insensitive): FSD, IsaacRealDataset, Crestereo, Middlebury, Eth3d, Kitti, GenericDataset.
The same dataset_name value applies across the train and evaluate actions, both of which use 3-column or 4-column annotations with GT disparity. The deploy-side evaluate action follows the same rule (see references/tao-deploy-foundation-stereo.md). For inference with 2-column annotations (left + right, no GT), use dataset_name: GenericDataset regardless of data layout: the dataset-specific classes (Middlebury / Kitti / Eth3d / FSD / IsaacRealDataset / Crestereo) require 3-column input and reject 2-column annotations at the dataloader level. For inference with 3-column annotations (left + right + GT), the dataset-specific class is fine.
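The selection rule can be condensed into one helper. The sketch below is an illustration under stated assumptions: the function name and the action strings are hypothetical, and only the cases spelled out in this step (train, evaluate, inference) are handled.

```python
VALID_DATASET_NAMES = ("FSD", "IsaacRealDataset", "Crestereo", "Middlebury",
                       "Eth3d", "Kitti", "GenericDataset")


def pick_dataset_name(preferred, action, n_columns):
    """Return the dataset_name to put in data_sources for this action."""
    canonical = {name.lower(): name for name in VALID_DATASET_NAMES}
    key = preferred.lower()  # dataset_name is case-insensitive
    if key not in canonical:
        raise ValueError(f"unknown dataset_name: {preferred!r}")
    if n_columns == 2:
        if action in ("train", "evaluate"):
            raise ValueError(f"{action} needs 3- or 4-column annotations with GT disparity")
        # Dataset-specific classes reject 2-column annotations.
        return "GenericDataset"
    return canonical[key]
```

With this helper, a Kitti layout evaluated with GT keeps dataset_name Kitti, while the same images run through 2-column inference fall back to GenericDataset.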
Step 3 — Write spec yaml from the spec overrides
Copy the action block from references/spec-overrides-foundation-stereo.md. Replace:
- model.model_type from Step 2 (typically FoundationStereo)
- dataset.<...>.data_sources[*].dataset_name from Step 2
- dataset.<...>.data_sources[*].data_file with the path from Step 1
- For deploy-side evaluate: enforce dataset.test_dataset.batch_size: 1 (see references/tao-deploy-foundation-stereo.md).
Shape consistency: the crop_size in dataset.test_dataset.augmentation.crop_size should match export.input_height / input_width so the trained-model evaluator and the deploy-side TensorRT evaluator operate at the same shape. Note that crop_size is decorative on the pyt evaluate path but authoritative on the deploy evaluate side — see references/troubleshooting-foundation-stereo.md and references/tao-deploy-foundation-stereo.md.
Fresh-install smoke runs are validated at crop_size: [128, 128] with dataset.max_disparity: 128 and model.max_disparity: 128. Avoid 112×112 crops and avoid setting max_disparity smaller than the square crop side for smoke tests: those combinations can fail inside FoundationStereo with feature-map or loss-mask shape mismatches before a checkpoint is produced.
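These shape rules are easy to check before launching a container. The following sketch encodes only the constraints stated above; the function name is hypothetical, and it assumes crop_size is ordered [height, width], which the text does not state explicitly.

```python
def check_smoke_shapes(crop_size, dataset_max_disparity, model_max_disparity,
                       export_hw=None):
    """Return a list of human-readable problems; an empty list means no known issue."""
    problems = []
    height, width = crop_size  # assumed order: [height, width]
    if (height, width) == (112, 112):
        problems.append("112x112 crops can fail with feature-map shape mismatches")
    smallest = min(dataset_max_disparity, model_max_disparity)
    if height == width and smallest < height:
        problems.append("max_disparity is smaller than the square crop side")
    if export_hw is not None and tuple(export_hw) != (height, width):
        problems.append("crop_size does not match export.input_height / input_width")
    return problems
```

The validated smoke configuration, check_smoke_shapes([128, 128], 128, 128), yields no problems in this sketch.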
Data source overrides are mandatory for every action. Each data_sources entry is a dict with two mandatory fields: data_file and dataset_name. See references/spec-overrides-foundation-stereo.md for the per-action dataset-requirements table, every action's override block, and the quantize known-issue note.
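A spec loaded into a Python dict can be checked for the two mandatory fields before it is handed to depth_net. This is a minimal sketch with a hypothetical function name; it assumes data_sources has already been parsed into a list of dicts.

```python
REQUIRED_FIELDS = ("data_file", "dataset_name")


def missing_data_source_fields(data_sources):
    """Map entry index -> list of mandatory fields that are absent or empty."""
    report = {}
    for index, entry in enumerate(data_sources):
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            report[index] = missing
    return report
```

An empty result means every entry carries both data_file and dataset_name.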
Step 4 — Run
Create writable home/cache directories inside the mounted output path before using
--user. Some TAO containers do not have an /etc/passwd entry for the host UID,
and PyTorch / matplotlib need writable cache paths when running as that UID.
mkdir -p <output_dir>/home \
<output_dir>/.cache/matplotlib \
<output_dir>/.cache/torchinductor \
<output_dir>/.cache/xdg
docker run --gpus 'device=0' --shm-size 16G --ipc=host \
--user "$(id -u):$(id -g)" \
-e USER="$(id -un)" \
-e LOGNAME="$(id -un)" \
-e HOME=<output_dir>/home \
-e MPLCONFIGDIR=<output_dir>/.cache/matplotlib \
-e TORCHINDUCTOR_CACHE_DIR=<output_dir>/.cache/torchinductor \
-e XDG_CACHE_HOME=<output_dir>/.cache/xdg \
-v <data_root>:<data_root>:ro \
-v <output_dir>:<output_dir> \
<container> \
depth_net <action> -e <spec.yaml>
Without --user "$(id -u):$(id -g)" the container writes outputs as nobody:nogroup, blocking host-side cleanup / retry.
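When the run is launched from a script rather than typed by hand, the same command can be assembled as an argument list. The sketch below mirrors the shell command above; the function name and its parameters are hypothetical, and it only builds the arguments and the directory list without running anything.

```python
import getpass
import os
import posixpath


def build_docker_args(container, action, spec, data_root, output_dir,
                      uid=None, gid=None, user=None):
    """Return (argv, dirs_to_create) for a depth_net run as the host user."""
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    user = getpass.getuser() if user is None else user
    cache = posixpath.join(output_dir, ".cache")
    env = {
        "USER": user,
        "LOGNAME": user,
        "HOME": posixpath.join(output_dir, "home"),
        "MPLCONFIGDIR": posixpath.join(cache, "matplotlib"),
        "TORCHINDUCTOR_CACHE_DIR": posixpath.join(cache, "torchinductor"),
        "XDG_CACHE_HOME": posixpath.join(cache, "xdg"),
    }
    argv = ["docker", "run", "--gpus", "device=0", "--shm-size", "16G",
            "--ipc=host", "--user", f"{uid}:{gid}"]
    for key, value in env.items():
        argv += ["-e", f"{key}={value}"]
    argv += ["-v", f"{data_root}:{data_root}:ro",   # dataset mounted read-only
             "-v", f"{output_dir}:{output_dir}",    # outputs mounted read-write
             container, "depth_net", action, "-e", spec]
    # Writable home/cache directories that must exist before the run.
    dirs = [env["HOME"], env["MPLCONFIGDIR"],
            env["TORCHINDUCTOR_CACHE_DIR"], env["XDG_CACHE_HOME"]]
    return argv, dirs
```

A caller would create each path in dirs with os.makedirs(path, exist_ok=True) and then pass argv to subprocess.run(argv, check=True); both steps are left out here so the sketch has no side effects.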
Step 5 — Verify
- Container exit code 0
- status.json kpi block populated
- For train: inspect per-step train_loss directly (the entrypoint reports Execution status: PASS even when loss is NaN)
- For evaluate: rely on epe / bp1 / bp2 / bp3 / d1 / rmse (the evaluator also emits abs_rel / sq_rel / rmse_log, which are not meaningful for stereo; see references/parameters-foundation-stereo.md)
- For inference: artifacts under results_dir
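Two of the checks above lend themselves to a small script: flagging non-finite training losses that a PASS status would hide, and keeping only the stereo-relevant evaluation metrics. This is a sketch with hypothetical function names; how the per-step losses and the kpi values are extracted from logs or status.json is not specified here, so both functions take already-parsed Python values.

```python
import math

STEREO_METRICS = ("epe", "bp1", "bp2", "bp3", "d1", "rmse")


def non_finite_steps(train_losses):
    """Return the indices of steps whose train_loss is NaN or infinite."""
    return [step for step, loss in enumerate(train_losses)
            if not math.isfinite(loss)]


def stereo_metrics(kpi):
    """Keep only the metrics that are meaningful for stereo evaluation."""
    return {name: kpi[name] for name in STEREO_METRICS if name in kpi}
```

A training run would be treated as failed whenever non_finite_steps returns a non-empty list, regardless of the reported execution status.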
For TAO Deploy TensorRT actions (gen_trt_engine, TensorRT evaluate, and TensorRT inference), read references/tao-deploy-foundation-stereo.md first. Deploy spec templates live in this skill's references/ folder with the spec_template_deploy_*.yaml prefix.
Training Requirements
- Monitoring metric: val/loss
- Eval dataset: optional. Val dataset configured via dataset.val_dataset.data_sources (each entry needs data_file and dataset_name).
See references/spec-overrides-foundation-stereo.md for the per-action dataset-requirements table and every action's mandatory data-source override block.
Parameters, Metrics, Multi-GPU, Export/TRT, Hardware
See references/parameters-foundation-stereo.md for the full Important Parameters list (incl. model.encoder vits override, model.max_disparity default 416, model.volume_dim no-op note, dataset.baseline, dataset.focal_x, train.precision, export.batch_size), the Evaluation Metrics table, Multi-GPU / Multi-Node launch keys, Export / TRT Defaults (opset_version/on_cpu pairing, NGC 576×960 settings), and Hardware requirements.
Error Patterns and Troubleshooting
See references/troubleshooting-foundation-stereo.md for disparity overflow, smoke-test shape mismatch, missing pretrained paths, the encoder / dataset_name struct errors, the depth_net_stereo: not found entrypoint note, the pyt-vs-deploy crop_size discussion, and the deploy evaluate scalar-conversion failure.
Spec Param / Parent Model Inference
See references/checkpoint-inference-mappings-foundation-stereo.md for the checkpoint-resolution rules (model_epoch_<epoch>_step_<step>.pth, dn_model_latest.pth policy), the absence of parent PyT gen_trt_engine, and the full per-action inference-mapping table from depth_net_stereo.config.json (including parent_model / parent_job_id resolution).
Deployment