How are subtitles created usually? Are they provided by the source material team, some professional third party that manually transcribes the video, or just fans doing it for free?
See, that’s the kicker: for the longest time it was basically all fan-translated subtitles, and only recently has paid translation become the norm.
So it’s really quite pathetic for them to try and save a few bucks by replacing a proper translator with an LLM, given that there are still plenty of passionate fans who would have done it for free. Especially given that translating between Japanese and English in a situation heavy on cultural context is something these LLMs are really bad at.
given that there are still plenty of passionate fans who would have done it for free
I’d imagine this is a non-starter from a corporate standpoint. I know if I were in charge I’d be terrified of the idea of just trusting community-submitted subtitles to not have random slurs or something inserted. That said, I still think it would be super cool if they’d let people source and use their own subtitle files; I know it’s possible because I have a Tampermonkey script that lets me do just that.
That’s the core of the issue: Crunchyroll has set itself up as a corporate middleman, buying the rights to distribute shows and then charging consumers a subscription for access.
But they can’t be bothered to do the only actual damn work their position would realistically demand, beyond renting server space: providing translations for the foreign media they’re distributing.
That’s without even discussing the fact that not a single penny users give them will end up in the hands of any of the exploited artists who actually made the shows. The industry doesn’t work on residuals or any other kind of profit sharing, so the licensing fees Crunchyroll pays essentially go straight to financiers.
So pirate the shit and use whatever subtitles you want.
In terms of anime fansubs, it’s normally just great folks in the community. Some got hired by studios. But the studio is meant to provide the subs.
It seems that they have, or at least had in 2023, internal teams that handled the translations. https://www.crunchyroll.com/news/interviews/2023/9/30/international-translation-day-2023
I maintain my own media library and I ensure every file has English and German subtitles. There are a variety of ways to source SRT files, but when all else fails a machine with enough compute can transcribe video files using the open-source Whisper model. After I generate an English SRT file from the video, I send it to OpenAI to create the German translation.
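For anyone curious what that pipeline might look like, here is a rough sketch rather than the commenter’s actual setup. It assumes the transcription step hands back a list of segments, each with `start` and `end` times in seconds and a `text` field (which, as far as I know, is the shape of the `segments` list the open-source Whisper Python package returns). The function names are made up for illustration, and `translate` is a placeholder for whatever translation call you use, so no real API is invoked here.

```python
from typing import Callable, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[dict]) -> str:
    """Build SRT text from dicts with 'start', 'end' (seconds) and 'text'.

    Assumption: the segments come from a speech-to-text step such as Whisper.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate only the dialogue lines; cue numbers and timings are kept.

    `translate` is a hypothetical callable (an LLM request, a human, etc.).
    """
    out = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.split("\n")
        number, timing, dialogue = lines[0], lines[1], lines[2:]
        translated = [translate(line) for line in dialogue]
        out.append("\n".join([number, timing] + translated))
    return "\n\n".join(out) + "\n"
```

Keeping the timing lines out of the translation step is the main design choice: only dialogue text is ever handed to the translator, so the cue numbers and timestamps cannot be mangled along the way.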
Is there something similar for manga? Something that can overlay Japanese text on images, similar to what we have on smartphones but for the PC?
I feel like this is a reasonable use of ChatGPT.
For YouTube tutorial videos I have no issue with relying on GPT, but I think it’s important to recognize that the translation of art is art. I don’t feel good about the idea of something without a soul or perspective interpolating a work of art from one culture and language into another that might be wildly different from where it started.
All that said, I think Crunchyroll and anyone else using AI without disclosing it absolutely should be honest about it.
As someone who is able to speak Japanese, I’d notice the drop in quality of translation almost instantly.
I never turn on subs anyway when I watch my anime though.
I have to since my partner doesn’t speak Japanese, but half the time I end up having to correct lines for them once or twice, to make things make sense. The non-egregious stuff I don’t even bother with. It’s crazy how amateurish some of the mistakes are, or even what are clearly choices to omit entire sentences, for no reason.
おい、ゆうじ君、海行こうぜ
“Hi Yuji!”
君
As someone who is learning Japanese: is that the kanji for an honorific? Probably kun? ゆうじ is the name, although it’s weird that it’s written in hiragana, I guess… But I fail at this one: 海行こうぜ
The first kanji has the one for mother as part of it, I think… And the second one is pronounced ‘i’, so … ikouze? Let’s go somewhere?
Yes, 君 is ‘kun’ when used as an honorific.
海 is ‘umi’, or sea/ocean. You are correct that the second half of the kanji (母) is the same as the standalone character for mother, but its base radical is ⽏, which also just means mother. The first radical, ⺡, means water/liquid, so you can sort of infer that “water mother” = ocean. Not all kanji work out this nicely with their radical structure, though.
The last part is spot on: ikou (行こう) is the volitional form (a conjugation) of iku, ‘to go’, which expresses a suggestion to do something, i.e. “let’s (go)”.
Thanks for the feedback, seems my efforts weren’t entirely wasted :D Interesting that the kanji for water itself does not contain that radical (unless you squint heavily). What’s the difference from ikimashou? Isn’t that the suggestive form? As in ‘we should go’?
The radical for water is actually derived from the standalone kanji. It’s basically an extremely short-stroke version of the kanji.
Ikimashou is just the ‘formal’, full-length version. No difference in meaning. Just as “iku” is the casual version of “ikimasu”.
Ikimasu -> iku
Ikimashou -> ikou
Fascinating. That explains the similarity. Since watching that episode of Witch Watch I definitely feel bad about my formal “Duolingo” Japanese :D
By the way, is there a rule to how these short forms are formed?
By the way, is there a rule to how these short forms are formed?
Yep! Most Japanese verbs (with a few exceptions like ‘shimasu’ becoming suru) use one of the ‘i’ variants (‘i’, ‘ki’, ‘ni’, ‘mi’, or ‘ri’) after the kanji, which indicates that they are verbs.
Yakimasu (to burn/cook), shirimasu (to know), arukimasu (to walk), arimasu (to be), shinimasu (to die), yomimasu (to read).
Ki will become ku in the shortened version, ri will become ru, ni -> nu, etc:
yaku, shiru, aruku, aru, shinu, yomu
I believe the verbs that don’t end in one of those like tabemasu (to eat) will default to ‘ru’ (taberu), but I don’t know if that’s a rule off the top of my head, or if I just can’t think of any others right now.
In cases where the kana is voiced (marked with a dakuten), such as oyogimasu (to swim), the end kana stays voiced in the short form, e.g. oyogu. Ki -> ku, gi -> gu.
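The rule above can be sketched in a few lines of Python. This is only a rough illustration working on Hepburn romaji, not a real conjugator. The function name and the `ichidan_i_stems` parameter are made up for the example. It assumes a stem ending in ‘e’ always takes ‘ru’ (like taberu), and that verbs such as mimasu -> miru, which look like the ‘i’ group but also take ‘ru’, have to be listed by hand because romaji alone cannot tell them apart.

```python
# The one exception named in the thread; other irregular verbs are not handled.
IRREGULAR = {"shimasu": "suru"}


def dictionary_form(polite: str, ichidan_i_stems: frozenset = frozenset()) -> str:
    """Turn a polite -masu verb (Hepburn romaji) into its short form.

    ichidan_i_stems: hypothetical hand-made set of stems ending in 'i'
    that take 'ru' instead of the i -> u swap (e.g. 'mi' for mimasu).
    """
    if polite in IRREGULAR:
        return IRREGULAR[polite]
    if not polite.endswith("masu"):
        raise ValueError(f"not a -masu form: {polite!r}")

    stem = polite[: -len("masu")]

    # 'e' stems (tabe-) and listed 'i' stems simply add 'ru'.
    if stem.endswith("e") or stem in ichidan_i_stems:
        return stem + "ru"

    # Romaji quirks: shi -> su and chi -> tsu rather than a plain i -> u swap.
    if stem.endswith("shi"):
        return stem[:-3] + "su"
    if stem.endswith("chi"):
        return stem[:-3] + "tsu"

    # General case: ki -> ku, gi -> gu, ri -> ru, ni -> nu, mi -> mu, i -> u.
    if stem.endswith("i"):
        return stem[:-1] + "u"

    raise ValueError(f"unexpected stem: {stem!r}")


for verb in ["yakimasu", "shirimasu", "oyogimasu", "tabemasu", "shimasu"]:
    print(verb, "->", dictionary_form(verb))
```

Voiced endings need no special handling here, since swapping the final ‘i’ for ‘u’ turns gi into gu the same way it turns ki into ku.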
Although it seems likely that Crunchyroll uses an LLM for translation in some way, I wouldn’t call that “confirmed” since that might be the result of an individual translator using it.
The actions of an employee, when reviewed and released by a company, are the actions of that company. A company is just the sum of its employees’ actions.
Also, LLMs have been around for a while. So there are a few possible situations:
- LLM use is authorized or even encouraged. In this case it’s on the company.
- LLM use is controlled, and this falls into one of the authorized cases. Same thing really. Also their authorized use cases need review
- LLM use is forbidden, or restricted and this is not an authorized use. In this case it falls on the company to review what’s being done. It’s their responsibility.
So yeah, whatever the situation, it’s on Crunchyroll.
Both translation and subtitling have highly efficient tooling when in the hands of a professional. Translators nowadays use a mix of tools and will build up a dynamic database as they go through a corpus that needs coherence. What’s bad in this instance is not the usage of some AI, but of a badly adapted AI, and ultimately the mediocre results, which give an amateurish impression.
Pretty obvious if you’ve used any recently, but confirmation is nice. Their closed captions are generally pretty terrible as well.