Hacker News | TachyonicBytes's comments

It seems to be pre-built on github, in releases.


Yes, but I guess the pre-built releases include the "payment check"? I'm not sure, but perhaps.


It has the payment button at least


I use whisperfile[1] directly. The whisper-large-v3 model seems good with non-English transcription, which is my main use-case.

I am also eyeing whisperX[2], because I want to play some more with speaker diarization.

Your use-case seems to be batch transcription, so I'd suggest you go ahead and just use whisperfile, it should work well on an M4 mini, and it also has an HTTP API if you just start it without arguments.
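For the HTTP route, here is a minimal sketch of building an upload request with only the Python standard library. The URL, the `/inference` path, and the `response_format` field are assumptions on my part, modeled on the whisper.cpp example server; check what your whisperfile build actually exposes before relying on them. The helper names are hypothetical.

```python
import urllib.request
import uuid

# Assumption: the server listens locally and accepts multipart uploads at
# /inference, as the whisper.cpp example server does. Verify for your build.
ASSUMED_URL = "http://127.0.0.1:8080/inference"


def build_multipart(filename, audio, fields):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The audio bytes go in a part named "file" (assumed field name).
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(filename, audio, url=ASSUMED_URL):
    """Return a POST request object; nothing is sent until urlopen() is called."""
    body, content_type = build_multipart(
        filename, audio, {"response_format": "json"}  # assumed field
    )
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

To actually send it, pass the result to `urllib.request.urlopen(...)` while the server is running. For batch work you would loop over your audio files and build one request per file.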

If you want more interactivity, I have been using Vibe[3] as an open-source replacement of SuperWhisper[4], but VoiceInk from a sibling comment seems better.

Aside: It seems that so many of the mentioned projects use whisper at the core, that it would be interesting to explicitly mark the projects that don't use whisper, so we can have a real fundamental comparison.

[1] https://huggingface.co/Mozilla/whisperfile

[2] https://github.com/m-bain/whisperX

[3] https://github.com/thewh1teagle/vibe/

[4] https://superwhisper.com/


I have used whisperX with success in a variety of languages, but not with diarization. If the goal is to use the transcript for something else, you can often feed the transcript into a text LLM and say "this is an audio transcript and might have some mistakes, please correct them." I played around with transcribing in the original language vs. having whisper translate it, and it seems to work better to transcribe in the original language, then feed the result into an LLM and have that model do the translation. At least for French, Spanish, Italian, and Norwegian. I imagine a text-based LLM could also clean up any diarization weirdness.


Yes, this is exactly where I am going. The LLM also has an advantage, because you can give it the context of the audio (e.g. "this is an audio transcript from a radio show about etc. etc."). I can foresee this working for a future whisper-like model as well.
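A rough sketch of what such a prompt could look like when assembled in code. The function name and parameters are hypothetical, and the result is just a string you would hand to whatever text LLM you use; no particular LLM API is assumed.

```python
def build_cleanup_prompt(transcript, context="", translate_to=None):
    """Build a prompt asking a text LLM to fix likely transcription errors.

    context:      optional description of the audio (e.g. the show's topic)
    translate_to: optional target language, applied after correction
    """
    lines = [
        "This is an audio transcript and might have some mistakes, "
        "please correct them."
    ]
    if context:
        lines.append(f"Context: {context}")
    if translate_to:
        lines.append(
            f"After correcting it, translate the transcript into {translate_to}."
        )
    lines.append("")
    lines.append("Transcript:")
    lines.append(transcript.strip())
    return "\n".join(lines)
```

The idea is to transcribe in the original language first and leave both the correction and the translation to the text model, with the context line giving it something to disambiguate against.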

There are two ways to parse your first sentence. Are you saying that you used whisperX and it doesn't do well with diarization? I ask because I am curious about alternative ways of doing that.


Whisper is amazing. It's better than any other speech recognition system I've seen, and it can be run locally.

I've run it on my ThinkPad P14s Gen 4, which doesn't have much of a GPU (Radeon 780M). It processes approximately in realtime.


DiCoW-v2 seems to work better than whisperX for diarization, by the way.

https://pccnect.fit.vutbr.cz/gradio-demo/


It seems that both use / leverage pyannote. I wonder if the whisperX pipeline can be combined with DiCoW-v2.


There was an article around here about how battery life actually improves if the battery is ultra-fast charged when it is made.

Maybe fast charging does degrade the battery, but it seems that our understanding of the effect that different charge types have on batteries may not be complete yet.


I have to add https://nullprogram.com, just because of the care the author took to have it work better in lynx[1]:

    Just in case you haven’t tried it, the blog also works really well with terminal-based browsers, such as Lynx and ELinks. Go ahead and give it a shot. The header that normally appears at the top of the page is actually at the bottom of the HTML document structure. It’s out of the way for browsers that ignore CSS.

[1] https://nullprogram.com/blog/2017/09/01/


Is this a different method from the httptap [1] that was on Hacker News a few weeks ago? Somebody in that post seemed to say that it also generates CA certificates on the fly.

[1] https://news.ycombinator.com/item?id=42919909


httptap is really cool! Their technique is different (they do a filesystem mount instead of intercepting syscalls like Subtrace does) but both tools effectively reach the same goal using different routes.


I use Zotero[1] as a personal web archiver. It downloads the page locally, placing most of the resources inside a single html file (pictures become base64 encoded pngs, for example). I find it the best way to have the content available offline and also to be able to reference it easily, seeing as it is a citation manager first.

[1] https://www.zotero.org/
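To illustrate the single-file idea, here is a minimal sketch of inlining an image as a base64 `data:` URI. This is not Zotero's actual implementation, just the general technique; the function names are hypothetical and the naive string replacement only handles `src="..."` written exactly that way.

```python
import base64
import mimetypes


def to_data_uri(data, filename):
    """Encode raw bytes as a data: URI, guessing the MIME type from the name."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def inline_image(html, src, data):
    """Replace one src="..." reference with the embedded image bytes."""
    return html.replace(f'src="{src}"', f'src="{to_data_uri(data, src)}"')
```

After this substitution the page no longer needs the external image file, which is what makes a single HTML file usable offline.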


It can, but that's another type of "shallow", or more exactly "not-deep" cloning, called blobless cloning [1]. There is also treeless cloning, with other tradeoffs, but much to the same effect.

I found this[2] very enlightening.

[1] https://github.blog/open-source/git/get-up-to-speed-with-par...

[2] https://www.howtogeek.com/devops/how-to-use-git-shallow-clon...
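For reference, the clone variants mentioned above map to `git clone` flags roughly as follows. This is a small sketch that only builds the command line; the function name and the mode labels are my own, while `--depth`, `--filter=blob:none`, and `--filter=tree:0` are the real git options.

```python
def clone_command(url, mode="full", dest=None):
    """Return the argv list for a git clone in the given mode."""
    flags = {
        "full": [],                         # all history, trees, and blobs
        "shallow": ["--depth=1"],           # truncated history
        "blobless": ["--filter=blob:none"], # full history, blobs fetched on demand
        "treeless": ["--filter=tree:0"],    # commits only, trees and blobs on demand
    }
    if mode not in flags:
        raise ValueError(f"unknown clone mode: {mode}")
    cmd = ["git", "clone", *flags[mode], url]
    if dest:
        cmd.append(dest)
    return cmd
```

You can pass the result to `subprocess.run(cmd, check=True)` to actually clone. The partial-clone filters need a server that supports them.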


I assume that framework is not open-source or somewhere I can look at it?


You could take a look at the open-source dbt framework. They have a good implementation of unit testing for SQL:

https://docs.getdbt.com/docs/build/unit-tests


I guarantee you wouldn't want to use it


Not the OP, but they are "Headers". Probably coming from the <h1> tag in html. What outsiders probably call "Headlines".


What libraries have you seen that do this?

