
>Chrome now provides on-device powered live captions [...] which could help alleviate some of the limitations for audio impaired viewers

That's a great feature! But it also highlights the limited accuracy of machine-learning speech recognition for technical topics with jargon. E.g., at 27m00s, the caption algorithm incorrectly transcribes it as "APL is joked about as a right only language" -- but we know the speaker actually said, "APL is joked about as a write-only language". And the algorithm incorrectly transcribes "oversonian languages" when it's actually "Iversonian languages".

The algorithm also doesn't differentiate between multiple speakers; the generated text is just continuously concatenated even as the voices change. Therefore, an audio-impaired viewer wouldn't know which person said a particular string of words.

This is why podcasters still have to pay humans (sometimes ones with domain knowledge) to carefully listen to the audio and accurately transcribe it.



I know the joke is that APL is a write-only language, but it somehow seems more true to say it is a right-only language.

I am nonplussed about AI/ML in general but this accidental wisdom is worth meditating on even if it didn't come from a human.



