alicloud-ai-audio-asr
cinience/alicloud-skills · updated Apr 8, 2026
Category: provider
Model Studio Qwen ASR (Non-Realtime)
Validation
mkdir -p output/alicloud-ai-audio-asr
python -m py_compile skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py && echo "py_compile_ok" > output/alicloud-ai-audio-asr/validate.txt
Pass criteria: command exits 0 and output/alicloud-ai-audio-asr/validate.txt is generated.
Output And Evidence
- Store transcripts and API responses under output/alicloud-ai-audio-asr/.
- Keep one command log or sample response per run.
Use Qwen ASR for recorded audio transcription (non-realtime), including short audio sync calls and long audio async jobs.
Critical model names
Use one of these exact model strings:
- qwen3-asr-flash
- qwen3-asr-flash-2026-02-10
- qwen-audio-asr
- qwen3-asr-flash-filetrans
- qwen3-asr-flash-filetrans-2025-11-17
Selection guidance:
- Use qwen3-asr-flash, qwen3-asr-flash-2026-02-10, or qwen-audio-asr for short/normal recordings (sync).
- Use qwen3-asr-flash-filetrans or qwen3-asr-flash-filetrans-2025-11-17 for long-file transcription (async task workflow).
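The selection guidance above can be sketched as a small lookup. This is an illustrative helper, not part of the bundled script: the name resolve_mode and the two sets are hypothetical, and the sets simply mirror the model strings listed in this section.

```python
# Hypothetical helper: maps the exact model strings listed above to a call mode.
SYNC_MODELS = {
    "qwen3-asr-flash",
    "qwen3-asr-flash-2026-02-10",
    "qwen-audio-asr",
}
ASYNC_MODELS = {
    "qwen3-asr-flash-filetrans",
    "qwen3-asr-flash-filetrans-2025-11-17",
}


def resolve_mode(model: str) -> str:
    """Return 'sync' or 'async' for a listed model; reject anything else."""
    if model in SYNC_MODELS:
        return "sync"
    if model in ASYNC_MODELS:
        return "async"
    # Fail early on typos, since the API expects exact model strings.
    raise ValueError(f"unknown ASR model: {model!r}")
```

Rejecting unlisted names up front keeps a misspelled model string from reaching the API.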
Prerequisites
- Create and activate a virtual environment. No SDK packages need to be installed, because the script uses only the Python standard library:
python3 -m venv .venv
. .venv/bin/activate
- Set DASHSCOPE_API_KEY in the environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
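A minimal sketch of that lookup order follows. It assumes the credentials file is INI-style with at least one section header (for example [default]); the source does not state the file format, so treat that as an assumption. The function name load_api_key is hypothetical and may differ from what the bundled script does.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    credentials_path: Optional[Path] = None,
) -> Optional[str]:
    """Return the API key from the environment first, then the credentials file."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY")
    if key:
        return key.strip()

    if credentials_path is None:
        credentials_path = Path.home() / ".alibabacloud" / "credentials"
    if not credentials_path.is_file():
        return None

    # Assumption: INI format. interpolation=None keeps '%' in keys literal.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(credentials_path, encoding="utf-8")
    except configparser.Error:
        return None  # not INI after all; the caller decides how to report it

    for section in parser.sections():
        value = parser.get(section, "dashscope_api_key", fallback=None)
        if value:
            return value.strip()
    return None
```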
Normalized interface (asr.transcribe)
Request
- audio (string, required): public URL or local file path.
- model (string, optional): default qwen3-asr-flash.
- language_hints (array, optional): e.g. zh, en.
- sample_rate (number, optional)
- vocabulary_id (string, optional)
- disfluency_removal_enabled (bool, optional)
- timestamp_granularities (array, optional): e.g. sentence.
- async (bool, optional): default false for sync models, true for qwen3-asr-flash-filetrans.
Response
- text (string): normalized transcript text.
- task_id (string, optional): present for async submission.
- status (string): SUCCEEDED or submission status.
- raw (object): original API response.
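One way to map a raw API response onto the normalized response fields is sketched below. The raw shapes are assumptions: a sync reply is taken to follow the OpenAI-compatible layout (choices[0].message.content as a string), and an async submission reply is taken to carry output.task_id and output.task_status. The function name normalize_response is hypothetical.

```python
from typing import Any, Dict


def normalize_response(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Build the normalized {text, task_id?, status, raw} dict from a raw reply."""
    result: Dict[str, Any] = {"text": "", "status": "UNKNOWN", "raw": raw}

    # Assumed sync shape: OpenAI-compatible chat completion.
    choices = raw.get("choices")
    if choices:
        content = choices[0].get("message", {}).get("content", "")
        result["text"] = content if isinstance(content, str) else ""
        result["status"] = "SUCCEEDED"
        return result

    # Assumed async shape: {"output": {"task_id": ..., "task_status": ...}}.
    output = raw.get("output") or {}
    if "task_id" in output:
        result["task_id"] = output["task_id"]
    result["status"] = output.get("task_status", "UNKNOWN")
    return result
```

Keeping the untouched reply under raw means callers can still reach fields this sketch does not normalize.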
Quick start (official HTTP API)
Sync transcription (OpenAI-compatible protocol):
curl -sS --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen3-asr-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
          }
        }
      ]
    }
  ],
  "stream": false,
  "asr_options": {
    "enable_itn": false
  }
}'
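The same sync request can be built from Python with the standard library. This is a sketch that mirrors the curl payload above field for field; the function name build_sync_request is hypothetical and is not taken from the bundled script.

```python
import json
import urllib.request

SYNC_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_sync_request(
    audio: str, api_key: str, model: str = "qwen3-asr-flash"
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "input_audio", "input_audio": {"data": audio}}
                ],
            }
        ],
        "stream": False,
        "asr_options": {"enable_itn": False},
    }
    return urllib.request.Request(
        SYNC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen(req, timeout=60) and parse the body with json.load. Building the request separately from sending it makes the payload easy to inspect or log before any network call.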
Async long-file transcription (DashScope protocol):
curl -sS --location 'https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'X-DashScope-Async: enable' \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen3-asr-flash-filetrans",
  "input": {
    "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
  }
}'
Poll task result:
curl -sS --location "https://dashscope.aliyuncs.com/api/v1/tasks/<task_id>" \
--header "Authorization: Bearer $DASHSCOPE_API_KEY"
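Polling can be wrapped in a bounded loop, in line with the 5-20s interval and max retry guard recommended under Operational guidance. The sketch below makes two assumptions: the task reply carries its state at output.task_status, and SUCCEEDED, FAILED, and CANCELED are the terminal states. The names fetch_task and poll_task are hypothetical.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Assumption: these are the terminal values of output.task_status.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}


def fetch_task(task_id: str, api_key: str) -> Dict[str, Any]:
    """GET the task endpoint used in the curl example above."""
    req = urllib.request.Request(
        f"https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_task(
    fetch: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch(task_id) until a terminal state or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        raw = fetch(task_id)
        status = (raw.get("output") or {}).get("task_status", "")
        if status in TERMINAL_STATES:
            return raw
        if attempt < max_attempts:
            sleep(interval_s)  # wait only when another attempt will follow
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} attempts")
```

A caller would typically bind the key first, for example poll_task(lambda tid: fetch_task(tid, api_key), task_id, interval_s=10). Passing fetch and sleep as parameters keeps the loop testable without network access.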
Local helper script
Use the bundled script for URL/local-file input and optional async polling:
python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
--audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
--model qwen3-asr-flash \
--language-hints zh,en \
--print-response
Long-file mode:
python skills/ai/audio/alicloud-ai-audio-asr/scripts/transcribe_audio.py \
--audio "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3" \
--model qwen3-asr-flash-filetrans \
--async \
--wait
Operational guidance
- For local files, use input_audio.data (data URI) when a direct URL is unavailable.
- Keep language_hints minimal to reduce recognition ambiguity.
- For async tasks, use a 5-20s polling interval with a maximum-retry guard.
- Save normalized outputs under output/alicloud-ai-audio-asr/transcripts/.
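For the local-file case, a data URI can be produced with the standard library alone. This is a sketch: the function name to_data_uri is hypothetical, the MIME type is guessed from the file extension, and the source does not say which MIME types or file sizes the API accepts.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local audio file as a base64 data URI for input_audio.data."""
    p = Path(path)
    mime, _ = mimetypes.guess_type(p.name)
    # Fall back to a generic type when the extension is not recognized.
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(p.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Base64 inflates the payload by roughly a third, so this route suits short recordings; long files fit the URL-based async workflow better.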
Output location
- Default output: output/alicloud-ai-audio-asr/transcripts/
- Override base dir with OUTPUT_DIR.
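A sketch of resolving the output location is shown below. It assumes that OUTPUT_DIR, when set, replaces the whole default directory; the source only says it overrides the base dir, so the bundled script may instead append a subpath. The names transcripts_dir and save_transcript are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DIR = Path("output") / "alicloud-ai-audio-asr" / "transcripts"


def transcripts_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return OUTPUT_DIR if set (assumed to replace the default), else the default."""
    env = os.environ if env is None else env
    override = env.get("OUTPUT_DIR")
    return Path(override) if override else DEFAULT_DIR


def save_transcript(
    text: str, name: str, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Write text to <dir>/<name>.txt, creating the directory if needed."""
    target_dir = transcripts_dir(env)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}.txt"
    target.write_text(text, encoding="utf-8")
    return target
```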
Workflow
- Confirm user intent, region, identifiers, and whether the operation is read-only or mutating.
- Run one minimal read-only query first to verify connectivity and permissions.
- Execute the target operation with explicit parameters and bounded scope.
- Verify results and save output/evidence files.
References
- references/api_reference.md
- references/sources.md
- Realtime synthesis is provided by skills/ai/audio/alicloud-ai-audio-tts-realtime/.
Ratings
4.5 ★★★★★ (39 reviews)
- ★★★★★Arjun Jain· Dec 20, 2024
alicloud-ai-audio-asr is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- ★★★★★Chaitanya Patil· Dec 8, 2024
Keeps context tight: alicloud-ai-audio-asr is the kind of skill you can hand to a new teammate without a long onboarding doc.
- ★★★★★Piyush G· Nov 27, 2024
alicloud-ai-audio-asr has been reliable in day-to-day use. Documentation quality is above average for community skills.
- ★★★★★Arjun Martinez· Nov 11, 2024
Solid pick for teams standardizing on skills: alicloud-ai-audio-asr is focused, and the summary matches what you get after install.
- ★★★★★Shikha Mishra· Oct 18, 2024
Solid pick for teams standardizing on skills: alicloud-ai-audio-asr is focused, and the summary matches what you get after install.
- ★★★★★Ama Park· Oct 2, 2024
alicloud-ai-audio-asr has been reliable in day-to-day use. Documentation quality is above average for community skills.
- ★★★★★Kwame Brown· Sep 21, 2024
alicloud-ai-audio-asr fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★Rahul Santra· Sep 9, 2024
We added alicloud-ai-audio-asr from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- ★★★★★Ama Kim· Sep 5, 2024
Solid pick for teams standardizing on skills: alicloud-ai-audio-asr is focused, and the summary matches what you get after install.
- ★★★★★Carlos Tandon· Sep 1, 2024
Registry listing for alicloud-ai-audio-asr matched our evaluation — installs cleanly and behaves as described in the markdown.