Tuesday, June 23, 2026
Merged timeline of 267 items: blog publish times and listing timestamps, cut at midnight. Page 1 of 6.
- LLM · Mistral AI · Mistral OCR 4
Mistral OCR 4 extracts and structures content from documents, featuring bounding boxes, block classification, and inline confidence scores in 170 languages. It excels in multilingual document processing and is designed…
by Yash @ Explainx · 0 comments
- LLM · Baidu Inc. · Unlimited OCR Works
Unlimited OCR is designed for one-shot long-horizon parsing of documents. It enhances the capabilities of previous OCR models, enabling efficient document processing.
by Yash @ Explainx · 0 comments
- Skill · design · user-experience
Apply UX thinking to improve product decisions and user flows.
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-pose-classification
Pose classification using ST-GCN (Spatial Temporal Graph Convolutional Network). Classifies skeleton sequences
by Yash @ Explainx · 0 comments
- Skill · tilegym · tilegym-adding-cutile-kernel
Add a new cuTile GPU kernel operator to TileGym. Covers dispatch registration in ops.py, cuTile backend implementation, __init__.py exports, test creation, and benchmark in tests/benchmark. Use when adding, creating, or…
by Yash @ Explainx · 0 comments
- Skill · tilegym · tilegym-converting-cutile-to-julia
Converts cuTile Python GPU kernels (@ct.kernel) to cuTile.jl Julia equivalents. Handles kernel syntax translation, 0-indexed to 1-indexed conversion, broadcasting differences, memory layout (row-major to column-major),…
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-sparse4d
Sparse4D for multi-camera temporal 3D object detection and tracking. Uses sparse queries with deformable
by Yash @ Explainx · 0 comments
- Skill · tao · tao-validate-dataset-format
Run `tao-daft validate` to check NVIDIA TAO DAFT datasets for structure, schema, and cross-reference errors. Do
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-single-step
Standard single-step train/eval/export workflow for any TAO model. Use when training a TAO model on a dataset
by Yash @ Explainx · 0 comments
- Skill · tilegym · tilegym-cutile-python
Expert cuTile programming assistant. Write high-performance GPU kernels using cuTile's tile-based programming model with proper validation and optimization. Supports deep agent orchestration for complex multi-kernel tas…
by Yash @ Explainx · 0 comments
- Skill · vss · vss-summarize-video
Use to summarize a recorded video via the LVS summarization microservice (HITL-gated) with a VLM fallback. Not for report generation or live RTSP captioning.
by Yash @ Explainx · 0 comments
- Skill · tilegym · tilegym-cutile-autotuning
Use when adding, modifying, optimizing, or debugging cuTile autotuning code. Trigger signals: `exhaustive_search` / `replace_hints` / `hints_fn` / `cuda.tile.tune` in code, `autotune` in filenames, or correctness/perfor…
by Yash @ Explainx · 0 comments
- Skill · vss · vss-setup-behavior-analytics
Use to deploy the vss-behavior-analytics service standalone (entrypoint, config-source, optional calibration). Not for the full warehouse deploy.
by Yash @ Explainx · 0 comments
- Skill · vss · vss-setup-video-analytics-api
Use to deploy the vss-video-analytics-api REST service standalone (config-source, data-log bind, Elasticsearch, optional Kafka). Not for full warehouse deploy.
by Yash @ Explainx · 0 comments
- Skill · vss · vss-search-archive
Use this skill to run top-level VSS fusion search on archived video, or to ingest video files / RTSP streams for search. Do NOT use for ad-hoc visual Q&A (use vss-ask-video), live captioning (use vss-deploy-dense-captio…
by Yash @ Explainx · 0 comments
- Skill · vss · vss-query-analytics
Use this skill when reading video-analytics metrics, incidents, alerts, and sensor data via the VA-MCP server (port 9901). Not for live VLM or incident-range narrative reports.
by Yash @ Explainx · 0 comments
- Skill · vss · vss-manage-video-io-storage
Use to call the VIOS REST API (sensor list, timelines, clip extraction, snapshots, add/delete sensors and streams). Not for VLM inference or search.
by Yash @ Explainx · 0 comments
- Skill · vss · vss-manage-alerts
Use for VSS alert workflows — real-time monitoring, Alert-Bridge subscriptions, Slack notifications, incident queries, camera onboarding. Not for non-alert analytics.
by Yash @ Explainx · 0 comments
- Skill · vss · vss-generate-video-calibration
Use to run AutoMagicCalib on local MP4s, RTSP, or the bundled sample dataset, and to deploy vss-auto-calibration when needed. Do not use for non-AMC calibration or runtime analytics.
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-oneformer
OneFormer for universal image segmentation. Unifies panoptic, instance, and semantic segmentation with a
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-visual-changenet
Visual ChangeNet for binary image classification and segmentation in AOI defect detection. Use when training,
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-optical-inspection
Optical Inspection for defect detection using Siamese networks. Compares image pairs to detect manufacturing
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-ocrnet
OCRNet for scene text recognition. Recognizes text content from cropped text-region images and supports CTC
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-reid
Person re-identification (ReID). Learns discriminative embeddings to match the same person across different
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-rtdetr
RT-DETR (Real-Time DEtection TRansformer) for 2D object detection. Designed for real-time inference with
by Yash @ Explainx · 0 comments
- Skill · tilegym · tilegym-improve-cutile-kernel-perf
Iteratively optimize cuTile kernel performance through systematic profiling, bottleneck analysis, IR comparison, and targeted tuning. Covers tile sizes, occupancy, autotune configs, TMA, latency hints, persistent schedu…
by Yash @ Explainx · 0 comments
- Skill · tilegym · tilegym-converting-cutile-to-triton
Converts cuTile GPU kernels (@ct.kernel) to Triton (@triton.jit). Handles standard in-repo conversion, debugging (cudaErrorIllegalAddress, shape mismatch, numerical mismatch), and mapping cuTile idioms (ct.load/ct.store…
by Yash @ Explainx · 0 comments
- Skill · tilegym · tilegym-monkey-patch-kernels-to-transformers
Integrate TileGym kernels into Hugging Face `transformers` models by replacing the library's submodule(s) and certain class(es)' implementations, and patching certain class(es)' init/forward/load weight methods prior to…
by Yash @ Explainx · 0 comments
- Skill · vss · vss-deploy-dense-captioning
Use this skill when deploying standalone RT-VLM dense captioning or calling its REST API (uploads, captions, streams, chat-completions, Kafka). Not for VSS profile deploy or video-search ingestion.
by Yash @ Explainx · 0 comments
- Skill · vss · vss-ask-video
Use this skill to ask the VSS agent's video_understanding tool a fresh visual question about a recorded clip. Not for prior tool output, search hits, or metadata-answerable questions.
by Yash @ Explainx · 0 comments
- Skill · vss · vss-deploy-profile
Use to select, configure, deploy, verify, debug, or tear down a VSS profile (base, search, lvs, warehouse, edge). Not for standalone microservices — use the vss-deploy-* skill.
by Yash @ Explainx · 0 comments
- Skill · vss · vss-deploy-detection-tracking-2d
Use this skill when the user wants to deploy, run, debug, tear down, or call the REST API of the RTVI-CV 2D detection / tracking microservice. Trigger when the user says things like 'deploy rtvi-cv', 'start warehouse 2d…
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-pointpillars
PointPillars for 3D object detection from LiDAR point clouds. Encodes point clouds into a pseudo-image via a
by Yash @ Explainx · 0 comments
- Skill · vss · vss-generate-video-report
Use this skill when producing a VSS analysis report — Mode A per-clip VLM, Mode B incident-range via video-analytics. Not for standalone video summarization, real-time alerts or ad-hoc Q&A.
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-segformer
SegFormer for semantic segmentation. Lightweight transformer-based architecture with hierarchical feature
by Yash @ Explainx · 0 comments
- Skill · tao · tao-generate-referring-expressions
Four-step image referring-expression pipeline: turns images plus KITTI bounding-box labels into region
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-ocdnet
OCDNet for scene text detection. Detects arbitrary-oriented text regions in natural images using a
by Yash @ Explainx · 0 comments
- Skill · tao · tao-convert-dataset-format
Run `tao-daft convert` to convert NVIDIA TAO DAFT datasets between supported formats. Do not use for non-DAFT data.
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-nvpanoptix3d
NVPanoptix3D for panoptic 3D scene reconstruction from posed RGB images. Produces 3D panoptic segmentation
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-nvdinov2
NVDINOv2 for self-supervised visual representation learning. Trains vision transformers via self-distillation
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-metric-learning-recognition
Metric-learning recognition (ml-recog) for fine-grained visual recognition. Learns embeddings for
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-mask2former
Mask2Former for universal image segmentation (panoptic, instance, and semantic). Transformer-based with
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-mask-grounding-dino
Mask Grounding DINO for grounded instance segmentation. Extends Grounding DINO with a mask-prediction head for
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-mask-auto-label
MAL (Mask Auto-Label) for weakly-supervised segmentation. Produces segmentation masks from minimal annotations
by Yash @ Explainx · 0 comments
- Skill · tao · tao-analyze-gaps-vlm-bcq
Extract false-positive and false-negative gaps from VLM binary-classification-question (BCQ, yes/no) predictions.
by Yash @ Explainx · 0 comments
- Skill · tao · tao-generate-image-grounding
Two-step image grounding pipeline: extracts referring expressions from (image, caption) pairs and grounds them
by Yash @ Explainx · 0 comments
- Skill · tao · tao-train-fast-foundation-stereo
Real-time stereo depth estimation using FastFoundationStereo (FFS), the distilled bp2 commercial variant of
by Yash @ Explainx · 0 comments
- Skill · tao · tao-mine-aoi-images
Runs the DEFT embed-then-mine workflow for VCN AOI iterations — embeds the gap-analysis target parquet, embeds a source pool, and mines nearest-neighbour source images for downstream augmentation. Use as the immediate n…
by Yash @ Explainx · 0 comments