Tuesday, June 23, 2026

Merged timeline of 267 items — blog publish times and listing timestamps, cut at midnight UTC. Page 1 of 6.

LLM

Mistral AI

Mistral OCR 4 extracts and structures content from documents, featuring bounding boxes, block classification, and inline confidence scores in 170 languages. It excels in multilingual document processing and is designed…

by Yash @ Explainx

listed Jun 23, 15:58 UTC

LLM

Baidu Inc.

Unlimited OCR Works

Unlimited OCR is designed for one-shot long-horizon parsing of documents. It enhances the capabilities of previous OCR models, enabling efficient document processing.

by Yash @ Explainx

listed Jun 23, 13:18 UTC

Skill

design

user-experience

Apply UX thinking to improve product decisions and user flows.

by Yash @ Explainx

listed Jun 23, 13:09 UTC

Skill

tao

tao-train-pose-classification

Pose classification using ST-GCN (Spatial Temporal Graph Convolutional Network). Classifies skeleton sequences

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tilegym

tilegym-adding-cutile-kernel

Add a new cuTile GPU kernel operator to TileGym. Covers dispatch registration in ops.py, cuTile backend implementation, __init__.py exports, test creation, and benchmark in tests/benchmark. Use when adding, creating, or…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tilegym

tilegym-converting-cutile-to-julia

Converts cuTile Python GPU kernels (@ct.kernel) to cuTile.jl Julia equivalents. Handles kernel syntax translation, 0-indexed to 1-indexed conversion, broadcasting differences, memory layout (row-major to column-major),…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-sparse4d

Sparse4D for multi-camera temporal 3D object detection and tracking. Uses sparse queries with deformable

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-validate-dataset-format

Run `tao-daft validate` to check NVIDIA TAO DAFT datasets for structure, schema, and cross-reference errors. Do

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-single-step

Standard single-step train/eval/export workflow for any TAO model. Use when training a TAO model on a dataset

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tilegym

tilegym-cutile-python

Expert cuTile programming assistant. Write high-performance GPU kernels using cuTile's tile-based programming model with proper validation and optimization. Supports deep agent orchestration for complex multi-kernel tas…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-summarization

vss-summarize-video

Use to summarize a recorded video via the LVS summarization microservice (HITL-gated) with a VLM fallback. Not for report generation or live RTSP captioning.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tilegym

tilegym-cutile-autotuning

Use when adding, modifying, optimizing, or debugging cuTile autotuning code. Trigger signals: `exhaustive_search` / `replace_hints` / `hints_fn` / `cuda.tile.tune` in code, `autotune` in filenames, or correctness/perfor…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-setup

vss-setup-behavior-analytics

Use to deploy the vss-behavior-analytics service standalone (entrypoint, config-source, optional calibration). Not for the full warehouse deploy.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-setup

vss-setup-video-analytics-api

Use to deploy the vss-video-analytics-api REST service standalone (config-source, data-log bind, Elasticsearch, optional Kafka). Not for full warehouse deploy.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-search

vss-search-archive

Use this skill to run top-level VSS fusion search on archived video, or to ingest video files / RTSP streams for search. Do NOT use for ad-hoc visual Q&A (use vss-ask-video), live captioning (use vss-deploy-dense-captio…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-deployment

vss-deploy-detection-tracking-3d

Deploy and operate the RTVI-CV-3D microservice as MV3DT (`MODE=mv3dt`): per-camera DeepStream perception plus BEV Fusion over calibrated cameras. Supports the bundled sample dataset, custom video files, and RTSP streams…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-search

vss-query-analytics

Use this skill when reading video-analytics metrics, incidents, alerts, and sensor data via the VA-MCP server (port 9901). Not for live VLM or incident-range narrative reports.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-management

vss-manage-video-io-storage

Use to call the VIOS REST API (sensor list, timelines, clip extraction, snapshots, add/delete sensors and streams). Not for VLM inference or search.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-management

vss-manage-alerts

Use for VSS alert workflows — real-time monitoring, Alert-Bridge subscriptions, Slack notifications, incident queries, camera onboarding. Not for non-alert analytics.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-generation

vss-generate-video-calibration

Use to run AutoMagicCalib on local MP4s, RTSP, or the bundled sample dataset, and to deploy vss-auto-calibration when needed. Do not use for non-AMC calibration or runtime analytics.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-oneformer

OneFormer for universal image segmentation. Unifies panoptic, instance, and semantic segmentation with a

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-visual-changenet

Visual ChangeNet for binary image classification and segmentation in AOI defect detection. Use when training,

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-optical-inspection

Optical Inspection for defect detection using Siamese networks. Compares image pairs to detect manufacturing

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-ocrnet

OCRNet for scene text recognition. Recognizes text content from cropped text-region images and supports CTC

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-reid

Person re-identification (ReID). Learns discriminative embeddings to match the same person across different

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-rtdetr

RT-DETR (Real-Time DEtection TRansformer) for 2D object detection. Designed for real-time inference with

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tilegym

tilegym-improve-cutile-kernel-perf

Iteratively optimize cuTile kernel performance through systematic profiling, bottleneck analysis, IR comparison, and targeted tuning. Covers tile sizes, occupancy, autotune configs, TMA, latency hints, persistent schedu…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tilegym

tilegym-converting-cutile-to-triton

Converts cuTile GPU kernels (@ct.kernel) to Triton (@triton.jit). Handles standard in-repo conversion, debugging (cudaErrorIllegalAddress, shape mismatch, numerical mismatch), and mapping cuTile idioms (ct.load/ct.store…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tilegym

tilegym-monkey-patch-kernels-to-transformers

Integrate TileGym kernels into Hugging Face `transformers` models by replacing the library's submodule(s) and certain class(es)' implementations, and patching certain class(es)' init/forward/load weight methods prior to…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-deployment

vss-deploy-dense-captioning

Use this skill when deploying standalone RT-VLM dense captioning or calling its REST API (uploads, captions, streams, chat-completions, Kafka). Not for VSS profile deploy or video-search ingestion.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-query

vss-ask-video

Use this skill to ask the VSS agent's video_understanding tool a fresh visual question about a recorded clip. Not for prior tool output, search hits, or metadata-answerable questions.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-deployment

vss-deploy-profile

Use to select, configure, deploy, verify, debug, or tear down a VSS profile (base, search, lvs, warehouse, edge). Not for standalone microservices — use the vss-deploy-* skill.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-deployment

vss-deploy-detection-tracking-2d

Use this skill when the user wants to deploy, run, debug, tear down, or call the REST API of the RTVI-CV 2D detection / tracking microservice. Trigger when the user says things like 'deploy rtvi-cv', 'start warehouse 2d…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-pointpillars

PointPillars for 3D object detection from LiDAR point clouds. Encodes point clouds into a pseudo-image via a

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-segformer

SegFormer for semantic segmentation. Lightweight transformer-based architecture with hierarchical feature

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-deployment

vss-deploy-video-embedding

Use this skill when deploying, operating, or integrating the VSS 3.2 GA RT-Embed Video Embedding microservice. Covers Docker Compose bring-up, GPU and storage prerequisites, the `/v1` REST API (file uploads, text and vi…

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

vss-generation

vss-generate-video-report

Use this skill when producing a VSS analysis report — Mode A per-clip VLM, Mode B incident-range via video-analytics. Not for standalone video summarization, real-time alerts or ad-hoc Q&A.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-generate-referring-expressions

Four-step image referring-expression pipeline: turns images plus KITTI bounding-box labels into region

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-ocdnet

OCDNet for scene text detection. Detects arbitrary-oriented text regions in natural images using a

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-convert-dataset-format

Run `tao-daft convert` to convert NVIDIA TAO DAFT datasets between supported formats. Do not use for non-DAFT data.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-nvpanoptix3d

NVPanoptix3D for panoptic 3D scene reconstruction from posed RGB images. Produces 3D panoptic segmentation

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-nvdinov2

NVDINOv2 for self-supervised visual representation learning. Trains vision transformers via self-distillation

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-metric-learning-recognition

Metric-learning recognition (ml-recog) for fine-grained visual recognition. Learns embeddings for

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-mask2former

Mask2Former for universal image segmentation (panoptic, instance, and semantic). Transformer-based with

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-mask-grounding-dino

Mask Grounding DINO for grounded instance segmentation. Extends Grounding DINO with a mask-prediction head for

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-mask-auto-label

MAL (Mask Auto-Label) for weakly-supervised segmentation. Produces segmentation masks from minimal annotations

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-analyze-gaps-vlm-bcq

Extract false-positive and false-negative gaps from VLM binary-classification-question (BCQ, yes/no) predictions.

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-generate-image-grounding

Two-step image grounding pipeline: extracts referring expressions from (image, caption) pairs and grounds them

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-train-fast-foundation-stereo

Real-time stereo depth estimation using FastFoundationStereo (FFS), the distilled bp2 commercial variant of

by Yash @ Explainx

listed Jun 23, 08:41 UTC

Skill

tao

tao-mine-aoi-images

Runs the DEFT embed-then-mine workflow for VCN AOI iterations — embeds the gap-analysis target parquet, embeds a source pool, and mines nearest-neighbour source images for downstream augmentation. Use as the immediate n…

by Yash @ Explainx

listed Jun 23, 08:41 UTC