
ChatGPT continues to spontaneously switch to Welsh language

OpenAI Developer Community February 19, 2026
My gut feeling is that the problem stems from a polluted Welsh-language voice dataset from VoxLingua107, which is used to train the language detector in Whisper. It contains a lot of English (in both Welsh and English accents), some Welsh spoken in an English accent, and some code-switching ‘Wenglish’. The Whisper paper mentions the issue but doesn’t really explain its origin. At heart, I think it’s a classic case of ‘spwriel mewn, spwriel mas’ (Welsh for ‘garbage in, garbage out’).
