>Chrome now provides on-device powered live captions [...] which could help alleviate some of the limitations for audio impaired viewers
That's a great feature! But it also highlights the limited accuracy of machine-learning speech recognition for technical topics with jargon. E.g., at 27m00s, the caption algorithm incorrectly transcribes it as "APL is joked about as a right-only language" -- but we know the speaker actually said, "APL is joked about as a write-only language". And the algorithm transcribes "oversonian languages" when it's actually "Iversonian languages".
The algorithm also doesn't differentiate between multiple speakers; the generated text is just continuously concatenated even as the voices change. Therefore, an audio-impaired viewer wouldn't know which person said a particular string of words.
This is why podcasters still have to pay humans (sometimes with domain knowledge) to carefully listen to the audio and accurately transcribe it.
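For what it's worth, the speaker-attribution part (diarization) is something offline tools can already do reasonably well. A minimal sketch using pyannote.audio -- the model name, token, and file name are assumptions for illustration, and this is not how Chrome's live captions work:

    # Sketch: label who spoke when, using pyannote.audio.
    # Model name and auth token are assumptions; "episode.wav" is a
    # hypothetical podcast recording.
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",  # assumed pretrained pipeline
        use_auth_token="HF_TOKEN",           # hypothetical HuggingFace token
    )
    diarization = pipeline("episode.wav")

    # A captioner could join these time spans with the recognized text so
    # each caption line is attributed to a speaker.
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"{turn.start:7.1f}s - {turn.end:7.1f}s  {speaker}")

Even with that, you'd still need a human pass to fix jargon like "Iversonian", which is the expensive part.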