Confirm successful installation by checking the skill directory location:
.cursor/skills/rwkv-architecture
Restart Cursor to activate rwkv-architecture. Access via /rwkv-architecture in your agent's command palette.
Security Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your environment. Always review source, verify the publisher, and test in isolation before production.
import os

# Set these before importing rwkv.model; the package reads them at import time
os.environ["RWKV_JIT_ON"] = '1'
os.environ["RWKV_CUDA_ON"] = '1'  # Use CUDA kernel for speed

from rwkv.model import RWKV

# Load model
model = RWKV(model='/path/to/RWKV-4-Pile-1B5-20220903-8040', strategy='cuda fp16')

# GPT mode (parallel processing)
out, state = model.forward([187, 510, 1563, 310, 247], None)
print(out.detach().cpu().numpy())  # Logits

# RNN mode (sequential processing, same result)
out, state = model.forward([187, 510], None)  # First 2 tokens
out, state = model.forward([1563], state)  # Next token
out, state = model.forward([310, 247], state)  # Last tokens
print(out.detach().cpu().numpy())  # Same logits as above!
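The equivalence between the two modes can be illustrated without a GPU or a checkpoint. The following is a toy sketch only: the `forward` function here is a hypothetical stand-in that mimics the `(tokens, state) -> (out, state)` call shape with a single decaying float as state. It is not RWKV's actual time-mixing computation.

```python
def forward(tokens, state=None, decay=0.5):
    """Toy stand-in for model.forward: returns (output, new_state).

    The state is one float that decays each step and absorbs the new
    token. This only mirrors the call shape, not RWKV's real math.
    """
    s = 0.0 if state is None else state
    for t in tokens:
        s = decay * s + t
    return s, s


tokens = [187, 510, 1563, 310, 247]

# "GPT mode": the whole sequence in one call
full_out, _ = forward(tokens)

# "RNN mode": the same sequence in three calls, carrying the state forward
out, state = forward(tokens[:2])
out, state = forward(tokens[2:3], state)
out, state = forward(tokens[3:], state)

# Both paths apply the same updates in the same order
assert out == full_out
```

Because each call resumes exactly where the previous one stopped, splitting the sequence changes nothing about the final output. That is the property the RNN-mode example relies on.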
Common workflows
Workflow 1: Text generation (streaming)
Efficient token-by-token generation:
from rwkv.model import RWKV
from rwkv.utils import PIPELINE

model = RWKV(model='RWKV-4-Pile-14B-20230313-ctx8192-test1050', strategy='cuda fp16')
pipeline = PIPELINE(model, "20B_tokenizer.json")

# Initial prompt
prompt = "The future of AI is"

# Encode the whole prompt (not character by character) and build the state
out, state = pipeline.model.forward(pipeline.encode(prompt), None)

# Continue generation: sample a token, print it, then feed it back in
for _ in range(100):
    token = pipeline.sample_logits(out)
    print(pipeline.decode([token]), end='', flush=True)
    out, state = pipeline.model.forward([token], state)
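The sampling step in the loop above turns a vector of logits into one token ID. As a rough illustration of what such a step typically involves, here is a minimal temperature plus top-p (nucleus) sampler using only the standard library. The function name `sample_top_p` and its parameters are hypothetical, and this is not the library's own implementation of `sample_logits`.

```python
import math
import random


def sample_top_p(logits, temperature=1.0, top_p=0.9, rng=random):
    """Sample a token ID from raw logits with temperature and nucleus filtering."""
    scaled = [x / max(temperature, 1e-6) for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most likely tokens whose mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw one token ID from the kept set, weighted by probability
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Lower temperatures and smaller `top_p` values make the output more deterministic; higher values make it more varied.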
Key advantage: Constant memory per token (no growing KV cache)
Workflow 2: Long context processing (infinite context)
Process million-token sequences:
from rwkv.model import RWKV

model = RWKV(model='RWKV-4-Pile-14B', strategy='cuda fp16')

# Process a very long document
state = None
long_document = load_document()  # Placeholder, not part of rwkv: returns token IDs, e.g., 1M tokens

# Stream through the entire document
for chunk in chunks(long_document, chunk_size=1024):  # chunks() is a placeholder helper
    out, state = model.forward(chunk, state)

# State now contains information from the entire 1M-token document
# Memory usage: O(1) (constant, not O(n)!)
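The loop above relies on a `chunks` helper that the snippet does not define. One possible sketch, assuming the document is already a list of token IDs, is a small generator:

```python
def chunks(tokens, chunk_size=1024):
    """Yield consecutive slices of at most chunk_size token IDs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(tokens), chunk_size):
        yield tokens[start:start + chunk_size]
```

Using a generator means only one chunk is materialized at a time, which fits the constant-memory goal of this workflow.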
Workflow 3: Fine-tuning RWKV
Standard fine-tuning workflow:
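# Training script (schematic sketch)
# Assumes RWKV here is a trainable LightningModule built from a config dict and
# that train_dataloader is defined elsewhere; the inference examples in this
# document construct RWKV from a checkpoint path and a strategy string instead.
import pytorch_lightning as pl
from rwkv.model import RWKV

# Configure model
config = {
    'n_layer': 24,
    'n_embd': 1024,
    'vocab_size': 50277,
    'ctx_len': 1024,
}

# Setup trainer
trainer = pl.Trainer(
    accelerator='gpu',
    devices=8,
    precision='bf16',
    strategy='deepspeed_stage_2',
    max_epochs=1,
)

# Train
model = RWKV(config)
trainer.fit(model, train_dataloader)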
# Transformer: O(n) per token (quadratic overall)
# First token: 1 computation
# Second token: 2 computations
# ...
# 1000th token: 1000 computations

# RWKV: O(1) per token (linear overall)
# Every token: 1 computation
# 1000th token: 1 computation (same as first!)
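The counts above can be totalled with a few lines of arithmetic. This sketch uses the same simplified cost model as the comments (one unit of work per earlier token attended to, or one unit per state update); the function names are illustrative only.

```python
def attention_steps(n_tokens):
    """Total work when token t attends to all t tokens seen so far."""
    return sum(range(1, n_tokens + 1))  # 1 + 2 + ... + n = n(n+1)/2


def recurrent_steps(n_tokens):
    """Total work when every token costs one fixed-size state update."""
    return n_tokens


for n in (10, 1000):
    print(n, attention_steps(n), recurrent_steps(n))
```

Under this model, generating 1000 tokens costs 500,500 units with attention and 1,000 with a recurrent update.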
When to use vs alternatives
Use RWKV when:
Need very long context (100K+ tokens)
Want constant memory usage
Building streaming applications
Need RNN efficiency with Transformer performance
Memory-constrained deployment
Key advantages:
Linear time: O(n) vs O(n²) for Transformers
No KV cache: Constant memory per token
Infinite context: No fixed window limit
Parallelizable training: Like GPT
Sequential inference: Like RNN
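To put the "No KV cache" point in rough numbers, the sketch below compares a growing Transformer KV cache with a fixed-size recurrent state. It rests on stated assumptions: a standard cache of one key and one value vector per layer per token, a recurrent state of five vectors per layer (the `vectors_per_layer=5` default is an assumption, not taken from this document), and 2 bytes per value throughout. The layer and embedding sizes are the ones from the fine-tuning config.

```python
def kv_cache_bytes(n_tokens, n_layer, n_embd, bytes_per_value=2):
    """Transformer KV cache: one key and one value vector per layer per token."""
    return 2 * n_layer * n_tokens * n_embd * bytes_per_value


def recurrent_state_bytes(n_layer, n_embd, vectors_per_layer=5, bytes_per_value=2):
    """Fixed-size recurrent state; vectors_per_layer=5 is an assumption."""
    return vectors_per_layer * n_layer * n_embd * bytes_per_value


n_layer, n_embd = 24, 1024  # sizes from the fine-tuning config in this document
for n_tokens in (1_000, 100_000):
    print(
        n_tokens,
        kv_cache_bytes(n_tokens, n_layer, n_embd),
        recurrent_state_bytes(n_layer, n_embd),
    )
```

The cache grows linearly with the number of tokens, while the recurrent state stays the same size regardless of sequence length.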
Use alternatives instead:
Transformers: Need absolute best performance, have compute
Mamba: Want state-space models
RetNet: Need retention mechanism
Hyena: Want convolution-based approach
Common issues
Issue: Out of memory during training
Use gradient checkpointing and DeepSpeed:
trainer = pl.Trainer(
    strategy='deepspeed_stage_3',  # Full ZeRO-3
    precision='bf16',
)
Issue: Slow inference
Enable the CUDA kernel before importing rwkv.model:
import os
os.environ["RWKV_CUDA_ON"] = '1'
Issue: Model not loading
Check model path and strategy:
model = RWKV(
    model='/absolute/path/to/model.pth',
    strategy='cuda fp16',  # Or 'cpu fp32' for CPU
)
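A path problem is easier to diagnose before the model constructor runs. The examples in this document pass the checkpoint path both with and without a `.pth` suffix, so this sketch checks both forms. `resolve_checkpoint` is a hypothetical helper, not part of the `rwkv` package.

```python
import os


def resolve_checkpoint(path):
    """Return an existing checkpoint path, trying the '.pth' suffix if needed."""
    candidates = [path] if path.endswith(".pth") else [path + ".pth", path]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    raise FileNotFoundError(f"No checkpoint found at any of: {candidates}")
```

Calling this first turns a vague loading failure into an explicit `FileNotFoundError` that lists the paths that were tried.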
Issue: State management in RNN mode
Always pass state between forward calls:
# WRONG: State lost
out1, _ = model.forward(tokens1, None)
out2, _ = model.forward(tokens2, None)  # No context from tokens1!

# CORRECT: State preserved
out1, state = model.forward(tokens1, None)
out2, state = model.forward(tokens2, state)  # Has context from tokens1
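One way to avoid dropping the state by accident is to keep it inside a small wrapper object. The sketch below assumes only that the model exposes `forward(tokens, state)` returning `(out, state)`, as in the examples above. `StatefulSession` and `CountingModel` are hypothetical names; the latter is a toy stand-in so the example runs without a checkpoint.

```python
class StatefulSession:
    """Keeps the recurrent state so callers cannot drop it by accident.

    `model` is any object with forward(tokens, state) -> (out, state).
    """

    def __init__(self, model):
        self.model = model
        self.state = None

    def feed(self, tokens):
        """Run one forward call and remember the resulting state."""
        out, self.state = self.model.forward(tokens, self.state)
        return out

    def reset(self):
        """Forget all context, e.g. when starting a new conversation."""
        self.state = None


class CountingModel:
    """Toy stand-in for a real model: the state is the number of tokens seen."""

    def forward(self, tokens, state):
        seen = (state or 0) + len(tokens)
        return seen, seen


session = StatefulSession(CountingModel())
session.feed([187, 510])
print(session.feed([1563]))  # 3: the second call still remembers the first two tokens
```

With a real model, `session.feed(tokens)` would return logits instead of a count, and `reset()` marks the point where a new, unrelated sequence begins.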