Hacker News | robviren's comments

I want to explore the space of audio encoding and GPT-like understanding of audio. I'm really interested in how a simple 1D signal has to go through so much processing before a language model can understand it, and I'm curious what tradeoffs occur along the way. It would also be fun to make a TTS library and understand it.

I'm trying to make a neural audio codec using a variety of misguided methods. One uses ESNs (echo state networks) in an unorthodox way, spreading the leak rates logarithmically so the reservoir acts like a digital cochlea. The other tries to do the same with a complex mass-spring-damper system to simulate the hairs of the cochlea. Both approaches make super interesting visuals and appear to cluster reasonably well, but I'm still learning about RVQ and audio losses (which involve GANs and spectral loss). I kinda wanna beat SNAC if I can.
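
For anyone curious what spreading the leak rates logarithmically looks like, here is a rough NumPy sketch of the idea (not the actual project code; sizes and constants are made up):

    # Leaky ESN where each reservoir unit gets its own leak rate, log-spaced so
    # fast units track fine structure and slow units track the envelope.
    import numpy as np

    rng = np.random.default_rng(0)
    n_units = 512

    # Per-unit leak rates from 1.0 (fast) down to 0.001 (slow).
    leak = np.logspace(0, -3, n_units)

    # Random input and reservoir weights, scaled to a spectral radius below 1.
    w_in = rng.normal(0, 1, size=n_units)
    w = rng.normal(0, 1, size=(n_units, n_units))
    w *= 0.9 / np.max(np.abs(np.linalg.eigvals(w)))

    def run_reservoir(audio):
        """Drive the reservoir with a mono signal, return the state trajectory."""
        x = np.zeros(n_units)
        states = np.empty((len(audio), n_units))
        for t, sample in enumerate(audio):
            pre = np.tanh(w_in * sample + w @ x)
            x = (1.0 - leak) * x + leak * pre  # leaky integration, per-unit rate
            states[t] = x
        return states

    states = run_reservoir(rng.normal(0, 0.1, 16000))  # one second at 16 kHz
    print(states.shape)  # (16000, 512)

The point is just that every unit integrates the signal at its own time scale, which is where the cochlea analogy comes from.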

Do you have a log available somewhere?

I keep everything in my self-hosted Gitea. Just made it public.

https://gitter.swolereport.com/robviren/cspace


Thanks, I’ll check it out

Edit: timed out


Reminds me of https://github.com/RobViren/kvoicewalk where people take voice clips and train a text-to-speech model using random walks.

Not related, misguided methods :D


Well, it’s the same author so it is kind of related.

Love to see the Pi getting some rather creative use! The most use I got out of one was as a health-check endpoint for the power in my garage, which was holding frozen milk for my newborn; the circuit kept tripping. I had another server email me if it couldn't reach the Pi for some reason. Just some real simple Go code. It wasn't production-grade, but it worked. Not everything needs to change the world, maybe just make your day easier.
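
The real thing was a few lines of Go, but the idea was basically this (sketched here in Python, with made-up addresses):

    # Watchdog sketch: poll the Pi's health endpoint, email once if it stops answering.
    import smtplib
    import time
    import urllib.request
    from email.message import EmailMessage

    PI_URL = "http://192.168.1.50:8080/health"  # placeholder address for the Pi
    SMTP_HOST = "localhost"                     # assumes a local mail relay

    def pi_is_up():
        try:
            with urllib.request.urlopen(PI_URL, timeout=5) as resp:
                return resp.status == 200
        except OSError:
            return False

    def send_alert():
        msg = EmailMessage()
        msg["Subject"] = "Garage Pi unreachable - check freezer power"
        msg["From"] = "watchdog@example.com"
        msg["To"] = "me@example.com"
        msg.set_content("The garage Pi stopped answering its health check.")
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(msg)

    alerted = False
    while True:
        up = pi_is_up()
        if not up and not alerted:
            send_alert()
            alerted = True
        elif up:
            alerted = False
        time.sleep(60)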


Exactly. When it helps your daily life, the whole build process is way more exciting. I really liked your project as well.


I just liked programming more when it contained a comprehensible amount of abstraction. Stacks have become so tall it is no longer feasible for a single human to comprehend what is occurring. I also liked it when standards had less surface area. Working in healthcare, it has become obvious that standards only ever get added, never removed. The complexity is absurd now. I'm not championing that we all become experts in bare-metal assembly, but I feel for OP and the desire to at least fundamentally understand what is happening on some level.


I have been unable to get anything other than Cachy to run Baldur's Gate 3 as well as Windows does on my 2021 Lenovo Legion. It's the best I have found for performance, and so far it's been stable on my relatively new tower.


Tried installing CachyOS yesterday, and was playing ARC Raiders like 15 minutes later (mainly because I had to wait on the 30 GB download). Zero issues so far. Next up is to see if Rocksmith 2014 wants to play ball.


I greatly appreciate the nuclear industry. Nuclear field engineering was my first "real" job out of college, and they really commit to safety. Transparency in this industry is inspiring because everyone involved knows that one screw-up could be the end of the US nuclear industry. Good luck getting oil and gas to be as accountable and transparent about incidents. I carry the culture into the rest of my work and appreciate having been involved. I wish events like this didn't happen, but this one doesn't pose significant danger, and I find it great that they communicate even "smaller" issues.


I've lived through three major nuclear incidents, and what they had in common, regardless of the political systems of the US, the Soviet Union, or Japan, was not transparency; it was the lying. It started immediately after each incident.

I'm essentially pro-nuclear; I just don't trust the people who run it.


Totally valid perspective. I only became part of the industry after Fukushima, so I only knew the industry by its disasters. I will say, in the training programs we studied those nuclear incidents, and we spent a year in training before going to the plants. Looking back, I just don't see parallels with the experiences you describe. The people in nuclear (at least from what I saw) want the industry to be safe and successful.


You describe incidents which became political. At some point the normal rules were being ignored by those at the top of the information food chain. That says nothing about the rules of the game, but it does say a lot about the people involved.


The rule-ignoring and the lying started inside the plants before anybody outside got involved. Then it just spread like cancer.


Can you recommend a book or two in order to learn about that culture? IMO we could use more of it in AI.


I found strong parallels between tech safety and nuclear safety.

https://www.nrc.gov/docs/ML0534/ML053410342.pdf

The NRC is a good place to start. They have been at it, trying to keep technology from hurting people, for a while.


This has got to be one of the most visually pleasing explanations I have seen of these concepts. Congrats!

I attempted some similar VQ-VAE work, but trying to tokenize rendered text instead. I was curious whether I could make a visual LLM working on 10 pt rendered fonts, and I also tried using PDF sources. The basic idea was to do what the more advanced diffusion image models can do when they generate images of text: make a dedicated image-text diffusion model that does completions. Further, I wondered if I could embed things like document type and language, so you could have a latent representation of text more abstract than current dictionary tokenizers. I learned a lot, and I thought it was all beautifully displayed in this post.
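
For anyone who hasn't run into it, the quantization step at the heart of a VQ-VAE is tiny: snap each encoder output to its nearest codebook vector and pass the gradient straight through. A generic PyTorch sketch (not my experiment's code; the sizes are placeholders):

    # Minimal vector-quantization bottleneck with straight-through gradients.
    import torch
    import torch.nn as nn

    class VectorQuantizer(nn.Module):
        def __init__(self, num_codes=512, dim=64, beta=0.25):
            super().__init__()
            self.codebook = nn.Embedding(num_codes, dim)
            self.codebook.weight.data.uniform_(-1 / num_codes, 1 / num_codes)
            self.beta = beta

        def forward(self, z):  # z: (batch, dim) encoder outputs
            # Nearest codebook entry for each latent vector.
            dists = torch.cdist(z, self.codebook.weight)  # (batch, num_codes)
            idx = dists.argmin(dim=1)
            z_q = self.codebook(idx)
            # Codebook and commitment losses, then straight-through estimator.
            loss = (((z_q - z.detach()) ** 2).mean()
                    + self.beta * ((z_q.detach() - z) ** 2).mean())
            z_q = z + (z_q - z).detach()
            return z_q, idx, loss

    vq = VectorQuantizer()
    z_q, tokens, vq_loss = vq(torch.randn(8, 64))
    print(tokens.shape, vq_loss.item())  # torch.Size([8]) and a scalar

The idx tensor is what ends up acting as the "token" for each patch of rendered text.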


I have been playing with the idea of an LLM-native programming language focusing on token efficiency, comprehension, and attention. It is interesting to see what the various large models come up with. A common theme actually reminds me quite a bit of assembly: verb prefixing, limited statements per line, and a small concept surface area all appeared in multiple conversations across several larger models. The big difference is that assembly lacks semantic meaning, leaving some benefit on the table. I still cannot believe what some people did with the tech; RCT is such a retro favorite.


I have been thinking this as well. I desperately want to develop a method that gives the models latent thinking that actually has temporal significance. The models now are so linear and have to scale on just one pass. A recurrent model where the dynamics occur over multiple passes should hold much more complexity. I have worked on a few concepts in that area that are panning out.


I feel it is interesting but not what would be ideal. I really think if the models could be less linear and process over time in latent space you'd get something much more akin to thought. I've messed around with attaching reservoirs at each layer using hooks, with interesting results (mainly overfitting), but it feels like such a limitation to have all model context/memory stuck as tokens when latent space is where the richer interaction lives. Would love to see more done where thought over time mattered and the model could almost mull over the question a bit before being obligated to crank out tokens. Not an easy problem, but interesting.
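
Very much a hack, but the reservoir-via-hooks idea looked roughly like this (a PyTorch sketch with made-up sizes and a hypothetical model layout, not the actual experiment):

    # A tiny fixed recurrent "reservoir" whose state persists across forward
    # passes; a forward hook mixes it back into each transformer block's output.
    import torch
    import torch.nn as nn

    HIDDEN = 768          # placeholder hidden size
    RESERVOIR_DIM = 256   # placeholder reservoir size

    class Reservoir(nn.Module):
        def __init__(self):
            super().__init__()
            self.register_buffer("state", torch.zeros(RESERVOIR_DIM))
            self.w_in = nn.Linear(HIDDEN, RESERVOIR_DIM, bias=False)
            self.w_res = nn.Linear(RESERVOIR_DIM, RESERVOIR_DIM, bias=False)
            self.w_out = nn.Linear(RESERVOIR_DIM, HIDDEN, bias=False)
            for m in (self.w_in, self.w_res):  # keep the reservoir weights frozen
                m.weight.requires_grad_(False)

        def hook(self, module, inputs, output):
            hidden_states = output[0] if isinstance(output, tuple) else output
            pooled = hidden_states.mean(dim=(0, 1))  # crude summary of the layer
            # Update the persistent state, detached so the graph never grows.
            self.state = torch.tanh(self.w_in(pooled) + self.w_res(self.state)).detach()
            injected = hidden_states + self.w_out(self.state)  # feed memory back in
            return (injected, *output[1:]) if isinstance(output, tuple) else injected

    # Usage sketch, assuming a GPT-2-style module layout (hypothetical):
    # reservoirs = [Reservoir() for _ in model.transformer.h]
    # for block, res in zip(model.transformer.h, reservoirs):
    #     block.register_forward_hook(res.hook)

Overfitting showed up quickly, which is roughly what you'd expect when you bolt extra state onto a model without retraining anything around it.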


Agree! I'm not an AI engineer or researcher, but it always struck me as odd that we would serialise the 100B or whatever parameters of latent space down to at most 1M tokens and back for every step.


They're already implementing branching thought and taking the best branch; eventually the entire response will be branched, with branches being spawned and culled by some metric over the lifetime of the completion. It's just not feasible now for performance reasons.


>I feel it is interesting but not what would be ideal. I really think if the models could be less linear and process over time in latent space you'd get something much more akin to thought.

Please stop, this is how you get AI takeovers.


Citation seriously needed.


It's really very simple. As models become more capable, they may become interested in deceiving humans or otherwise manipulating them to achieve their goals. We already see this in various places; see:

https://www.anthropic.com/research/agentic-misalignment

https://arxiv.org/abs/2412.14093

If the chain of thought of models becomes pure "neuralese", i.e. the models think purely in latent space, then we will lose the ability to monitor for malicious behavior. This is incredibly dangerous; CoT monitoring is one of the best and highest-leverage tools for monitoring model behavior, and losing it would be devastating for safety.

https://www.lesswrong.com/posts/D2Aa25eaEhdBNeEEy/worries-ab...

https://www.lesswrong.com/posts/mpmsK8KKysgSKDm2T/the-most-f...

https://www.lesswrong.com/posts/3W8HZe8mcyoo4qGkB/an-idea-fo...

https://x.com/RyanPGreenblatt/status/1908298069340545296

https://redwoodresearch.substack.com/p/notes-on-countermeasu...

