Hacker News | malborodog's comments

I am going to ask GPT-4 to implement this for me so I can see what it looks like. Amazing.


I have to go to dinner so this isn't finished, but this is most of the code to prove it works: https://gist.github.com/jcalvinowens/0d7a5c327d863fca7c84daa...


How did it go?


Why not just have a tonne of HTML files on disk? I'm confused about why this requires a sophisticated backend.


(1) Could you hash everyone's mp3 files so that if person A has the same songs as person B, you only need to store them once? (2) People are obviously storing copyrighted stuff in there -- do you have legal issues?
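A minimal sketch of idea (1), content-addressed deduplication: each file is keyed by the SHA-256 hash of its bytes, so identical files are stored only once. The function names and the dict-as-blob-store are illustrative, not how the actual service works.

```python
import hashlib
from pathlib import Path

def file_digest(path) -> str:
    """SHA-256 of a file's contents, read in chunks to handle large mp3s."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def dedup_store(paths, store: dict) -> dict:
    """Map each user path to a digest; identical files share one stored blob."""
    mapping = {}
    for p in paths:
        digest = file_digest(p)
        if digest not in store:          # only store the first copy
            store[digest] = Path(p).read_bytes()
        mapping[str(p)] = digest
    return mapping
```

In a real system the `store` dict would be an object store, and you'd keep a reference count per digest so a blob is only deleted once every user pointing at it has removed their copy.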


+1 for hermitix -- funny to see it come up on HN!


How could this underlying idea be implemented in a Python script from a running Selenium driver?


What’s the infamous Dropbox comment? Sounds hilarious.


> I have a few qualms with this app:

> 1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.

> 2. It doesn't actually replace a USB drive. Most people I know e-mail files to themselves or host them somewhere online to be able to perform presentations, but they still carry a USB drive in case there are connectivity problems. This does not solve the connectivity issue.

> 3. It does not seem very "viral" or income-generating. I know this is premature at this point, but without charging users for the service, is it reasonable to expect to make money off of this?

https://news.ycombinator.com/item?id=8863#9224


Absolutely glorious, thanks.


I have to transcribe a tonne of Chinese interviews soonish -- any further thoughts or experiments you can think of? Maybe some preprocessing steps for the audio? For example, cut it into one-minute chunks with some overlap, then transcribe those, so that it can't skip those bits...? Or could we fine-tune it on a library of Chinese mp3s + transcripts?


Whisper cuts the audio into chunks of 30 seconds. So if you have a one-minute recording where the first half has conversation in it and the remainder nothing, it will think it has to find something in that second 30-second block, without knowing what speech actually sounded like in the first chunk.

Try to pre-process it with voice activity detection -- not recognizing the meaning, just detecting that someone is speaking -- and cut the audio into snippets that only contain speech, so that Whisper doesn't have to guess whether a segment contains speech or not.
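A crude sketch of the detection step: treat a frame of 16-bit mono PCM as silence when its RMS energy falls below a threshold, and return the byte ranges of the loud runs to cut into snippets. For real work you'd use a proper VAD (webrtcvad or silero-vad); the frame length and threshold here are made-up illustrative values.

```python
import struct

def rms(frame_bytes):
    """Root-mean-square amplitude of little-endian 16-bit PCM samples."""
    samples = struct.unpack(f"<{len(frame_bytes) // 2}h", frame_bytes)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def speech_runs(pcm, frame_len=480, threshold=500.0):
    """Return (start, end) byte offsets of contiguous loud regions."""
    runs, start = [], None
    step = frame_len * 2  # bytes per frame (2 bytes per sample)
    for off in range(0, len(pcm) - step + 1, step):
        loud = rms(pcm[off:off + step]) >= threshold
        if loud and start is None:
            start = off
        elif not loud and start is not None:
            runs.append((start, off))
            start = None
    if start is not None:
        runs.append((start, len(pcm)))
    return runs
```

Each returned range can then be sliced out of the WAV data and passed to Whisper as its own file, so every input is guaranteed to contain speech.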

Also, if you cut it up into chunks, transcribe each chunk, and request JSON output instead of the other output formats, you'll get a bunch of extra parameters that will help you identify problematic sections. For example, hallucinated sections usually have a higher "no_speech_prob" value, and segments with an unusually high "compression_ratio" (repetitive text compresses very well) are also likely to be inaccurate.


If cost isn’t an issue, I’d use one of the established commercial offerings. Otherwise, you could try splitting into shorter chunks (20 minutes maybe?), do multiple runs on each chunk and pick out the best run according to some criteria, e.g. character count after removing repetitions. Whisper isn’t deterministic so some runs can be better than others; you could also tweak parameters like compression ratio or silence thresholds between runs, but in my experience there’s not going to be an obvious winner leading to a marked improvement in quality. Anyway, I’m no expert, and maybe you’ll have better luck than me. My recordings do have background music and conversations in some places that might confuse the model.
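The "pick the best run" criterion above can be sketched like this: score each transcript by its character count after collapsing immediately repeated words (a cheap proxy for hallucinated loops), and keep the highest-scoring run. This is just one possible scoring heuristic, not a standard Whisper tool:

```python
def collapse_repeats(text: str) -> str:
    """Drop words that immediately repeat, e.g. 'go go go home' -> 'go home'."""
    out = []
    for word in text.split():
        if not out or out[-1] != word:
            out.append(word)
    return " ".join(out)

def best_run(transcripts):
    """Of several runs over the same chunk, keep the least repetitive one."""
    return max(transcripts, key=lambda t: len(collapse_repeats(t)))
```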


Can't use a third-party service -- compliance stuff. This is helpful though. Using another model to tidy it up -- maybe Alpaca -- could be an option too. Then we'll just do speaker separation etc. manually later.


Why not just run whisper from the command line directly? Why put it in a Docker container?


Why not keep everything tightly contained?


Hm, I'm on a Mac, so it takes up a bunch of RAM and I'm not used to this workflow. Good point though.


Unless you actually use the memory (e.g. allocate it), it won’t impact system performance, but yeah, it definitely is overhead.


Some people just love making their environments needlessly complicated.


Complexity is in the eye of the beholder; some people are comfortable enough with Docker that it isn't a source of friction.

Now, installing the dependencies of every git repo I want to try onto my host system -- that's how an environment becomes needlessly complicated.


Nah it was done kinda tastefully here and it’s helpful to know the comps


Anyone know the best way to punch up his function a little so it tees both the question and the answer out to a file? Like history, but for ChatGPT.
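One minimal sketch of the idea: wrap the call and append both question and answer to a log file. Here `gpt` is a stub standing in for whatever ChatGPT shell function you already have, and the log path is just a placeholder:

```shell
gpt() {
  # stub -- replace with your real ChatGPT CLI call
  echo "stub answer to: $*"
}

chatlog="${CHAT_LOG:-$HOME/.chatgpt_history}"

chat() {
  local q="$*"
  local a
  a="$(gpt "$q")"
  printf 'Q: %s\nA: %s\n' "$q" "$a" >> "$chatlog"
  printf '%s\n' "$a"
}
```

Usage: `chat "explain monads"` prints the answer as usual while `~/.chatgpt_history` accumulates the Q/A pairs, greppable later like shell history.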


https://github.com/npiv/chatblade

This has built-in history and lots of other features that a serious user needs, like checking token limits and estimating costs.


I don't, but you may be interested in something that improves upon that with the history optionally in outline format:

https://github.com/rksm/org-ai

