More caption languages for YouTube transcripts — and clearer errors when one is missing
The YouTube to Transcript and YouTube to Article tools now include many more Asian and regional Chinese caption codes, and when a requested language is missing, the API returns a short message listing the languages that actually exist on the video.
We have shipped two improvements aimed at anyone who works with YouTube captions in languages beyond the usual Western defaults.
A wider language picker
In YouTube to Transcript and YouTube to Article, the caption language dropdown is driven by the same list of YouTube-style language codes our backend expects. We expanded that list so you can align the request with how the video’s captions are actually published.
What we added, in plain terms:
- Regional and variant Chinese, for example Cantonese (yue), China / Hong Kong / Macau / Taiwan tags (zh-CN, zh-HK, zh-MO, zh-TW), alongside existing Simplified and Traditional entries, plus Hakka, Min Nan (Hokkien / Taiwanese), Wu (Shanghainese), Classical Chinese, and Zhuang where YouTube exposes them.
- Japanese and Korean were already available (ja, ko); they remain the defaults for those markets.
- South and Southeast Asia, including Bengali, Burmese, Cebuano, Dhivehi, Filipino, Gujarati, Hindi, Indonesian, Javanese, Kannada, Khmer, Lao, Malay, Malayalam, Marathi, Mongolian, Nepali, Odia, Pashto, Punjabi, Sinhala, Sundanese, Tamil, Telugu, Thai, Tibetan, Urdu, Vietnamese, Uyghur, Uzbek, Tajik, Kyrgyz, and Kazakh, in addition to what we already supported.
- East Asian / Pacific edge cases some catalogs omit, such as Ryukyuan (ryu).
The menu stays sorted by language name so it is easy to scan. Not every video will offer every code; YouTube only lists what exists for that upload.
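As a rough illustration of a picker driven by one shared list and sorted by language name, here is a minimal Python sketch. The table name `CAPTION_LANGUAGES`, the helper `picker_options`, and the display names are assumptions made for this example, not our actual code; only the language codes come from the list above.

```python
# Hypothetical excerpt of a code -> display-name table that both the
# dropdown and the backend could read. Display names are illustrative.
CAPTION_LANGUAGES = {
    "zh-TW": "Chinese (Taiwan)",
    "yue": "Cantonese",
    "ja": "Japanese",
    "ko": "Korean",
    "zh-HK": "Chinese (Hong Kong)",
    "th": "Thai",
}


def picker_options(languages):
    """Return (code, name) pairs sorted by display name, ignoring case."""
    return sorted(languages.items(), key=lambda item: item[1].casefold())


for code, name in picker_options(CAPTION_LANGUAGES):
    print(f"{name} ({code})")
```

Keeping a single table means the dropdown cannot offer a code that the backend does not recognise, although a given video may still lack that track.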
Clearer errors when the requested language is not on the video
If you ask for a caption track that the video does not have (for example English when only Japanese auto-captions exist, with English available only as a translation in YouTube’s UI), the underlying library used to surface a long, technical exception.
We now catch that case in our transcript backend and return a short, readable message: the selected language is not available, followed by a comma-separated summary of the caption and translation options YouTube reported for that video. The HTTP API treats that as a client error (HTTP 400) instead of a generic failure, so the site can show something actionable rather than a wall of stack-trace text.
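The flow described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not our production code: `CaptionLanguageUnavailable`, `load_captions`, and `handle_request` are hypothetical names, and the exception stands in for whatever the underlying library raises.

```python
class CaptionLanguageUnavailable(Exception):
    """Hypothetical stand-in for the library's verbose 'language not found' error."""

    def __init__(self, requested, captions, translations):
        super().__init__(f"No caption track for {requested!r}")
        self.requested = requested
        self.captions = list(captions)          # tracks published on the video
        self.translations = list(translations)  # translation options YouTube reports


def load_captions(requested, captions, translations):
    """Hypothetical fetch step: succeeds only if the track exists on the video."""
    if requested not in captions:
        raise CaptionLanguageUnavailable(requested, captions, translations)
    return f"<transcript in {requested}>"


def handle_request(requested, captions, translations):
    """Return (status, body), mapping the missing-language case to a 400."""
    try:
        text = load_captions(requested, captions, translations)
    except CaptionLanguageUnavailable as exc:
        message = (
            f"The selected language ({exc.requested}) is not available. "
            f"Captions: {', '.join(exc.captions) or 'none'}. "
            f"Translations: {', '.join(exc.translations) or 'none'}."
        )
        return 400, {"error": message}
    return 200, {"transcript": text}


# English requested, but only Japanese captions exist on the video.
status, body = handle_request("en", ["ja"], ["en", "fr"])
print(status, body["error"])
```

Catching one specific exception, rather than every error, keeps genuine server faults out of the 400 path, so only the "wrong language" case is reported as a client error.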