This is something that is usually handled by the app receiving the input from the microphone (Google Meet, Teams, etc.). The app breaks the audio into its frequency components, keeps the ones that fall within the human voice range, and rejects everything else. This goes by names like voice isolation, and it has been enabled by default in all major meeting apps for a while now.
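To make the frequency-band idea concrete, here is a minimal sketch that keeps only a rough speech band and attenuates everything else. The 300-3400 Hz edges (the classic telephony band) and the filter choice are illustrative assumptions on my part; real meeting apps use far more sophisticated, often ML-based, suppression.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def bandpass_voice(samples: np.ndarray, sample_rate: int,
                       low_hz: float = 300.0, high_hz: float = 3400.0) -> np.ndarray:
        """Butterworth band-pass around typical speech frequencies (illustrative)."""
        sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos")
        return sosfiltfilt(sos, samples)

    # One second of synthetic audio: a speech-band tone plus 60 Hz mains hum.
    fs = 16_000
    t = np.arange(fs) / fs
    audio = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
    filtered = bandpass_voice(audio, fs)  # the 60 Hz hum is strongly attenuated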
I'm surprised to hear that it doesn't seem to work for you when the audio is generated by a different browser; that shouldn't make a difference.
Assuming OP is correct, your last sentence implies this isn't the solution being used.
Additionally, many (citation needed) YouTube videos have people talking in them; this method wouldn't help with that.
Isolating vocals in general is significantly more difficult than just relying on frequency range. Any instrument I can think of can generate notes that fall squarely within the common range of the human voice (see: https://www.dbamfordmusic.com/frequency-range-of-instruments...)
I was trying to informally describe the use of Fourier transforms to achieve the isolation. Success will vary with the situation; more recent systems also use ML, which gives more uniform results for this particular use case.
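As a toy illustration of that Fourier-transform framing: transform a block of audio to the frequency domain, zero out the bins outside a rough speech band, and transform back. The 300-3400 Hz band is an assumption for illustration; production systems work on short overlapping windows, use smoother masks, and increasingly rely on ML.

    import numpy as np

    def fft_voice_mask(samples: np.ndarray, sample_rate: int,
                       low_hz: float = 300.0, high_hz: float = 3400.0) -> np.ndarray:
        """Zero out frequency bins outside a rough speech band (illustrative)."""
        spectrum = np.fft.rfft(samples)
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
        mask = (freqs >= low_hz) & (freqs <= high_hz)  # keep only speech-band bins
        return np.fft.irfft(spectrum * mask, n=len(samples))

    # Speech-band tone mixed with a high-pitched 8 kHz tone.
    fs = 16_000
    t = np.arange(fs) / fs
    mixed = np.sin(2 * np.pi * 800 * t) + np.sin(2 * np.pi * 8000 * t)
    cleaned = fft_voice_mask(mixed, fs)  # the 8 kHz component is removed, 800 Hz remains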
The initial question may be specific, to a certain degree, to the way one particular browser handles things, but the comment was also trying to communicate that this can go beyond the browser and can actually be handled by the application. The microphone itself can also participate at some level if it features noise suppression or other enhancements.
The surprise about things being different when using a separate browser comes from assuming that any audio reaching the microphone should be processed the same way if Fourier transforms (or machine learning, where applicable) are used, so the audio source shouldn't matter.