nowait-reasoning-optimizer

davila7/claude-code-templates · updated Apr 10, 2026

$ npx skills add https://github.com/davila7/claude-code-templates --skill nowait-reasoning-optimizer
summary

Implements the NOWAIT technique from the paper "Wait, We Don't Need to 'Wait'! Removing Thinking Tokens Improves Reasoning Efficiency" (Wang et al., 2025).

skill.md

NOWAIT Reasoning Optimizer

Overview

NOWAIT is a training-free, inference-time intervention that suppresses self-reflection tokens (e.g., "Wait", "Hmm", "Alternatively") during generation. The paper reports chain-of-thought (CoT) trajectories shortened by up to 27-51%, depending on the model series, without compromising model utility.

When to Use

  • Deploying R1-style reasoning models with limited compute
  • Reducing inference latency for production systems
  • Optimizing token costs for reasoning tasks
  • Working with verbose CoT outputs that need streamlining

Supported Models

| Model Series | Type | Token Reduction |
| --- | --- | --- |
| QwQ-32B | RL-based | 16-31% |
| Phi4-Reasoning-Plus | RL-based | 23-28% |
| Qwen3-32B | RL-based | 13-16% |
| Kimi-VL-A3B | Multimodal | 40-60% |
| QvQ-72B-Preview | Multimodal | 20-30% |

Important: NOWAIT works best with RL-based models. Distilled models (Qwen3-4B/8B/14B) show degraded performance when reflection tokens are suppressed.

Quick Start

1. Basic Implementation

from scripts.nowait_processor import NOWAITLogitProcessor

# Initialize processor for your model's tokenizer
processor = NOWAITLogitProcessor(tokenizer)

# Use during generation
outputs = model.generate(
    inputs,
    logits_processor=[processor],
    max_new_tokens=32768
)

2. Keywords Suppressed

See references/keywords.md for the complete list. Core keywords:

wait, alternatively, hmm, but, however, check, 
double-check, maybe, verify, again, oh, ah

How It Works

  1. Initialize Keywords: Identify reflection keywords from empirical analysis
  2. Expand to Token Variants: Map keywords to all token variants in vocabulary (e.g., "wait" → " wait", "Wait", " Wait", ".wait", "WAIT")
  3. Suppress During Inference: Set logits of reflection tokens to large negative values during decoding

Logits (Before)        Logits (After)
Wait     0.8     →     Wait     -inf
First    0.6     →     First    0.6
Hmm      0.5     →     Hmm      -inf
Let      0.4     →     Let      0.4
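
The three steps above can be sketched in plain Python on a toy vocabulary. This is a minimal illustration, not the contents of scripts/nowait_processor.py: the matching rule (lowercase, strip leading and trailing non-alphanumeric characters, then match exactly), the function names, and the toy vocabulary are all assumptions made for the example.

```python
import math

# Step 1: reflection keywords, copied from the "Core keywords" list above.
KEYWORDS = {
    "wait", "alternatively", "hmm", "but", "however", "check",
    "double-check", "maybe", "verify", "again", "oh", "ah",
}


def normalize(token: str) -> str:
    """Lowercase a token and strip leading/trailing non-alphanumeric characters."""
    start, end = 0, len(token)
    while start < end and not token[start].isalnum():
        start += 1
    while end > start and not token[end - 1].isalnum():
        end -= 1
    return token[start:end].lower()


def expand_to_token_ids(vocab: dict, keywords: set) -> set:
    """Step 2: collect the ids of every vocabulary entry that is a keyword variant."""
    return {tid for tok, tid in vocab.items() if normalize(tok) in keywords}


def suppress(logits: list, banned_ids: set) -> list:
    """Step 3: set the logits of banned token ids to -inf, leave the rest unchanged."""
    return [-math.inf if i in banned_ids else x for i, x in enumerate(logits)]


# Hypothetical toy vocabulary (token text -> token id), assumed already decoded.
toy_vocab = {"Wait": 0, "First": 1, "Hmm": 2, "Let": 3, " wait": 4, ".wait": 5, "waiter": 6}
banned = expand_to_token_ids(toy_vocab, KEYWORDS)

logits = [0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
masked = suppress(logits, banned)
print(sorted(banned))
print(masked)
```

Exact matching after normalization keeps unrelated words such as "waiter" untouched. With a real tokenizer, vocabulary entries would likely need decoding to plain text first, since byte-level vocabularies encode the leading space with a marker character.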

Key Findings

Why It Works

  • NOWAIT doesn't eliminate self-reflection entirely—it guides models to skip unnecessary "waiting" reasoning
  • Models still perform essential verification at key decision points
  • Results in more linear, straightforward reasoning paths

RL vs Distilled Models

| Model Type | NOWAIT Effect | Recommendation |
| --- | --- | --- |
| RL-based (QwQ, Phi4, Qwen3-32B) | Stable accuracy, significant token reduction | ✅ Recommended |
| Distilled (Qwen3-4B/8B/14B) | Accuracy degradation on hard tasks | ⚠️ Use with caution |

Distilled models rely heavily on the CoT structure in their training data, so removing reflection tokens disrupts their reasoning patterns.
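
One way to act on this distinction is a small guard that decides whether to attach the processor for a given checkpoint. The sketch below is only an illustration: the model groupings are copied from the table above, while the function name, the return labels, and the handling of unlisted models are hypothetical.

```python
# Groupings taken from the RL vs Distilled table; everything else is illustrative.
RL_BASED = {"QwQ-32B", "Phi4-Reasoning-Plus", "Qwen3-32B"}
DISTILLED = {"Qwen3-4B", "Qwen3-8B", "Qwen3-14B"}


def nowait_recommendation(model_name: str) -> str:
    """Return a coarse recommendation for a model id such as 'Qwen/QwQ-32B'."""
    short_name = model_name.split("/")[-1]  # drop an optional 'org/' prefix
    if short_name in RL_BASED:
        return "recommended"
    if short_name in DISTILLED:
        return "use with caution"
    # Models not covered by the table should be benchmarked before enabling NOWAIT.
    return "benchmark first"


print(nowait_recommendation("Qwen/QwQ-32B"))
print(nowait_recommendation("Qwen/Qwen3-8B"))
print(nowait_recommendation("some-org/other-model"))
```

Returning a label rather than a boolean leaves the final decision to the caller, which may be preferable when accuracy on hard tasks matters more than token savings.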

Integration Examples

HuggingFace Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
from scripts.nowait_processor import NOWAITLogitProcessor

model = AutoModelForCausalLM.from_pretrained("Qwen/QwQ-32B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")

processor = NOWAITLogitProcessor(tokenizer)

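# Placeholder prompt; replace with your own input
prompt = "How many positive divisors does 360 have?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Passing **inputs forwards the attention mask along with input_ids
response = model.generate(
    **inputs,
    logits_processor=[processor],
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.7
)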

vLLM

from vllm import LLM, SamplingParams
from scripts.nowait_processor import get_nowait_bad_words_ids

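llm = LLM(model="Qwen/QwQ-32B")
tokenizer = llm.get_tokenizer()
bad_words_ids = get_nowait_bad_words_ids(tokenizer)

# vLLM's SamplingParams takes bad_words (a list of strings), not bad_words_ids.
# Assumption: the helper returns Hugging Face-style lists of token ids,
# which are decoded back to text here.
bad_words = [tokenizer.decode(ids) for ids in bad_words_ids]

sampling_params = SamplingParams(
    max_tokens=32768,
    bad_words=bad_words
)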

Expected Results

| Task Type | Original Tokens | NOWAIT Tokens | Reduction |
| --- | --- | --- | --- |
| Math (AIME) | 15,000 | 10,500 | 30% |
| Visual QA (MMMU) | 2,900 | 1,450 | 50% |
| Video QA (MMVU) | 1,700 | 1,250 | 27% |
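
The reduction column follows from the two token counts as (original - NOWAIT) / original. The short sketch below recomputes it from the figures in the table; the function name is hypothetical. The token counts appear to be rounded, so the MMVU row computes to about 26.5% rather than the listed 27%.

```python
def token_reduction(original: int, optimized: int) -> float:
    """Percentage of tokens saved relative to the original trajectory length."""
    if original <= 0:
        raise ValueError("original token count must be positive")
    return 100.0 * (original - optimized) / original


# Token counts copied from the Expected Results table.
rows = {
    "Math (AIME)": (15_000, 10_500),
    "Visual QA (MMMU)": (2_900, 1_450),
    "Video QA (MMVU)": (1_700, 1_250),
}

for task, (original, optimized) in rows.items():
    print(f"{task}: {token_reduction(original, optimized):.1f}%")
```

The same function can be applied to token counts measured on your own workload to check whether the savings carry over.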

Limitations

  • Less effective on very simple problems where CoT overhead is already minimal
  • Distilled models may suffer accuracy loss on challenging tasks
  • Some domains may require model-specific keyword tuning

References

  • Paper: arXiv:2506.08343v2
  • Complete keyword list: references/keywords.md
  • Implementation: scripts/nowait_processor.py

general reviews

Ratings

4.6 · 69 reviews
  • Yuki Khan· Dec 28, 2024

    Registry listing for nowait-reasoning-optimizer matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Dhruvi Jain· Dec 24, 2024

    Solid pick for teams standardizing on skills: nowait-reasoning-optimizer is focused, and the summary matches what you get after install.

  • Arjun Diallo· Dec 24, 2024

    nowait-reasoning-optimizer is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Sophia Martinez· Dec 24, 2024

    nowait-reasoning-optimizer fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Kiara Diallo· Dec 16, 2024

    We added nowait-reasoning-optimizer from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Naina Gupta· Nov 19, 2024

    Useful defaults in nowait-reasoning-optimizer — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Oshnikdeep· Nov 15, 2024

    We added nowait-reasoning-optimizer from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Noor Choi· Nov 15, 2024

    nowait-reasoning-optimizer has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Maya Huang· Nov 7, 2024

    Solid pick for teams standardizing on skills: nowait-reasoning-optimizer is focused, and the summary matches what you get after install.

  • Maya Rahman· Oct 26, 2024

    nowait-reasoning-optimizer has been reliable in day-to-day use. Documentation quality is above average for community skills.
