MutedEstate45's comments | Hacker News

Agree with your pain points. One thing I'd add is that GitHub makes you re-approve every PR after each push. As an OSS contributor, it's exhausting to chase re-approvals for minor tweaks.


mmmm this is up to each repo/maintainer's settings.

To be fair you don't know if one line change is going to absolutely compromise a flow. OSS needs to maintain a level of disconnect to be safe vs fast.


Good to know! Never been a maintainer before so I thought that was required.


Adding fixup commits (specifying the specific commit they will be squashed into), to be squashed by the bot before merge, handles that.


This is a security setting that the author has chosen to enable.


Hm that’s not the case for my repositories? Maybe you have a setting enabled for that?


Great work! Definitely feels a lot more technical than the typical internship project.


The headline feature isn’t the 25 MB footprint alone. It’s that KittenTTS is Apache-2.0. That combo means you can embed a fully offline voice in Pi Zero-class hardware or even battery-powered toys without worrying about GPUs, cloud calls, or restrictive licenses. In one stroke it turns voice everywhere from a hardware/licensing problem into a packaging problem. Quality tweaks can come later; unlocking that deployment tier is the real game-changer.


yeah, we are super excited to build tiny ai models that are super high quality. local voice interfaces are inevitable and we want to power those in the future. btw, this model is just a preview, and the full release next week will be of much higher quality, along w another ~80M model ;)


> It’s that KittenTTS is Apache-2.0

Have you seen the code[1] in the repo? It uses phonemizer[2] which is GPL-3.0 licensed. In its current state, it's effectively GPL licensed.

[1]: https://github.com/KittenML/KittenTTS/blob/main/kittentts/on...

[2]: https://github.com/bootphon/phonemizer

Edit: It looks like I replied to an LLM generated comment.


The issue is even bigger: phonemizer uses espeak-ng, which isn't very good at turning graphemes into phonemes. In other phoneme-based TTS systems (e.g. Zonos), it turned out to be one of the key issues causing bad generations.

And it isn't something you can fix, because the model was trained on bad phonemes (everyone uses Whisper + then phonemizes the text transcript).



> IANAL, but AFAICS this leaves 2 options, switching the license or removing that dependency.

There is a third option: asking the project for an exception.

Though that is unlikely to be granted¹ leaving you back with just the other two options.

And of course a fourth choice: just ignore the license. This is the option taken by companies like Onyx, whose products I might otherwise be interested in…

----

[1] Those of us who pick GPL3 or AGPL generally do so to keep things definite and an exception would muddy the waters, also it might not even be possible if the project has many maintainers as relicensing would require agreement from all who have provided code that is in the current release. Furthermore, if it has inherited the license from one of its dependencies, an exception is even less practical.


> There is a third option: asking the project for an exception.

IIUC, the project isn't at the liberty to grant such an exception because it inherits its GPL license from espeak-ng.


Ah, yes, good catch, I didn't look deeper into the dependency tree at all. I'll update my footnote to include that as one of the reasons an exception may be impossible (or at least highly impractical).


A fourth option would be a kind of dual-licensing: the project as-is is available under GPL-3.0, but the source code in this repository excluding any dependencies is also available under Apache 2.0

Any user would still effectively be bound by the GPL-3.0, but if someone can remove the GPL dependencies they could use the project under Apache


That is an option for the publisher of the library, not the consumer of it. If it isn't already done then asking for it to be done is the same as asking for an exception otherwise (option three).


The use of the library is four lines. Three set up the library (`phonemizer.backend.EspeakBackend(language="en-us", preserve_punctuation=True, with_stress=True)`), the other calls it (`phonemes_list = self.phonemizer.phonemize([text])`). Plus I guess the import statements. Even ignoring Google vs Oracle I don't think those lines by themselves meet any threshold of originality.
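Roughly, assembled into a self-contained snippet (imports added, the wrapper class from the repo left out, and the input string invented for illustration), the usage amounts to:

  from phonemizer.backend import EspeakBackend

  # phonemizer's EspeakBackend wraps the GPL-licensed espeak-ng
  # for grapheme-to-phoneme conversion
  backend = EspeakBackend(
      language="en-us",
      preserve_punctuation=True,
      with_stress=True,
  )

  # phonemize() takes a list of strings and returns a list of phoneme strings
  phonemes_list = backend.phonemize(["Hello from KittenTTS"])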

Obviously you can't run them (with the original library) without complying with the GPL. But I don't see why I couldn't, independently of that, also give you this text file under Apache 2.0 to do with as you want (which, for the record, still doesn't allow you to run them with the original library without complying with the GPL, but that'd be the phonemizer forcing you to do that, not this project)

You would have to be very specific about the dual-licensing to avoid confusion about what you are allowed to do under Apache conditions though. You can't just say "it's dual-licensed"


You could even extract out the parts that do not call the GPL library into an upstream project under the Apache 2.0 licence, and pull in both that and the GPL library in the downstream project, relying on Apache 2.0 -> GPL 3.0 compatibility instead of explicit dual licensing to allow the combined work to be distributed under GPLv3.


Once the license issues are resolved, it would be nice if you could install it on a distro with the normal package manager.


This would only apply if they were distributing the GPL licensed code alongside their own code.

If my MIT-licensed one-line Python library has this line of code…

  subprocess.run(["bash", "-c", "echo hello"])
…I’m not suddenly subject to bash’s licensing. For anyone wanting to run my stuff though, they’re going to need to make sure they themselves have bash installed.

(But, to argue against my own point, if an OS vendor ships my library alongside a copy of bash, do they have to now relicense my library as GPL?)


The FSF thinks it counts as a derivative work and you have to use the LGPL to allow linking.

However, this has never actually been proven in court, and there's many good arguments that linking doesn't count as a derivative work.

Old post by a lawyer someone else found (version 3 wouldn't affect this) [1]

For me personally I don't really understand how, if dynamic linking was viral, using linux to run code isn't viral. Surely at some level what linux does to run your code calls GPLed code.

It doesn't really matter though, since the FSF stance is enough to scare companies from not using it, and any individual is highly unlikely to be sued.

[1] https://www.linuxjournal.com/article/6366


> For me personally I don't really understand how, if dynamic linking was viral, using linux to run code isn't viral. Surely at some level what linux does to run your code calls GPLed code.

The Linux kernel has an explicit exception for userspace software:

> NOTE! This copyright does not cover user programs that use kernel services by normal system calls


And the GPL also has an explicit exception for "system" software such as kernel, platform libraries etc.:

> The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it.

> The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work.


> This would only apply if they were distributing the GPL licensed code alongside their own code.

As far as I understand the FSF's interpretation of their license, that's not true. Even if you only dynamically link to GPL-licensed code, you create a combined work which has to be licensed, as a whole, under the GPL.

I don't believe that this extends to calling an external program via its CLI, but that's not what the code in question seems to be doing.

(This is not an endorsement, but merely my understanding on how the GPL is supposed to work.)


This is a false analogy. It's quite straightforward.

Running bash (via exec()/fork()/spawn()/etc) isn't the same as (statically or dynamically) linking with its codebase. If your MIT-licensed one-liner links to code that's GPL licensed, then it gets infected by the GPL license.


I've seen people use IPC to workaround the GPL, but I've also seen the FSF interpretations claiming that is still a derived work.

I don't know if this has ever been tested in court.


My interpretation of their FAQ[1] on it is that shelling out and IPC are fine, while linking is not. As you say, it's ultimately up to the courts to decide on.

[1]: https://www.gnu.org/licenses/gpl-faq.html#MereAggregation


You are correct. It's about linking as LD does it, not conceptual linking.


GPL is for boomers at this point. Floppy disks? Distribution? You can use a tool but you can't change it? A DLL call means you need to redistribute your code but forking doesn't?

Silliness


GPL post-dates network software distribution (we got our first gcc via ftp).


Yes, but if you use open source libraries for your closed source SaaS - that's fine. People get their software _over_ the network, delivered to them in a VM (your browser).


Given that the FSF considers Apache-2.0 to be compatible with GPL-3.0 [0], how could the fact that phonemizer is GPL-3.0 possibly be an issue?

[0]: https://www.gnu.org/licenses/license-list.html#apache2


Compatible means they can be linked together, BUT the result is GPL-3.


> the result is GPL-3

The result can only be distributed under the terms of the GPL-3. That's actually a crucial difference: there's nothing preventing Kitten TTS from being Apache licensed, soliciting technical contributions under that license, and parts of its code being re-used in other software under that license. Yes, for the time being, this limits what you can do with Kitten TTS if you want to use the software as a whole (e.g. by embedding it into your product), but the license itself is still Apache and that can have value.


Okay, what's stopping you from feeding the code into an LLM, having it rewrite it, and making it yours? You can even add extra steps, like making it analyze the code block by block, then supervising it as it rewrites. Bam. AI-age IP freedom.

Morals may stop you but other than that? IMHO all open source code is public domain code if anyone is willing to spend some AI tokens.


That would be a derivative work, and still be subject to the license terms and conditions, at best.

There are standard ways to approach this called clean room engineering.

https://en.m.wikipedia.org/wiki/Clean-room_design

One person reads the code and produces a detailed technical specification. Someone reviews it to ensure that there is nothing in there that could be classified as copyrighted material, then a third person (who has never seen the original code) implements the spec.

You could use an LLM at both stages, but you'd have to be able to prove that the LLM that does the implementation had no prior knowledge of the code in question... Which given how LLMs have been trained seems to me to be very dubious territory for now until that legal situation gets resolved.


AI is useful in Chinese walling code, but it’s not as easy as you make it sound. To stay out of legal trouble, you probably should refactor the code into a different language, then back into the target language. In the end, it turns into a process of being forced to understand the codebase and supervising its rewriting. I’ve translated libraries into another language using LLMs, I’d say that process was 1/2 the labor of writing it myself. So in the end, going 2 ways, you may as well rewrite the code yourself… but working with the LLM will make you familiar with the subject matter so you -could- rewrite the code, so I guess you could think of it as a sort of buggy tutorial process?


I am not sure even that is enough. You would really need to do a clean room reimplementation to be safe - for exactly the same reasons that people writing code write clean room reimplementations.


Yeah, the algorithms and program flow would have to be materially distinct to be really safe. Maybe switching language paradigms would get that for you in most cases? Js->haskell->js? Sounds like a nightmare lol.


Tell me you haven't used LLMs on large, non-trivial codebases without telling me... :)


Tell me you don't know how to use LLMs properly without telling me.

You don't give the whole codebase to an LLM and expect one-shot output. Instead, you break it down and write the code block by block. Then the size of the codebase doesn't matter. You use the LLM as a tool; it is not supposed to replace you. You don't try to become George from the Jetsons who just presses a button and doesn't touch anything; instead you stay on top of it as the LLM does the coding. You test the code at every step to see if the implementation behaves as expected. Do enough of this and you have proper, full "bespoke" software.


I'll help you along - this is the core function that Kitten ends up calling. Good luck!

https://github.com/espeak-ng/espeak-ng/blob/a4ca101c99de3534...


Festival's English model, festvox-kallpc16k, is about 6 MB, and it is a large model; festvox-kallpc8k is about 3.5 MB.

eSpeak NG's data files take about 12 MB (multi-lingual).

I guess this one may generate more natural-sounding speech, but older or lower-end computers were capable of decent speech synthesis previously as well.


Custom voices could be added, but the speed was more important to some users.

$ ls -lh /usr/bin/flite

Listed as 27K last I checked.

I recall some Blind users were able to decode Gordon 8-bit dialogue at speeds most people found incomprehensible. =3


I'm not blind, but spoken English is far more difficult for me to grasp than written English (I'm a non-native speaker), and Flite runs on N270 netbooks at crazy speeds with good enough voices.


> KittenTTS is Apache-2.0

What about the training data? Is everyone 100% confident that models are not a derived work of the training inputs now, even if they can reproduce input exactly?


I'm playing around with an NVIDIA Jetson Orin Nano Super right now and it's actually pretty usable with gemma3:4b, and quite fast - even image processing is done in about 10-20 seconds, but that's with GPU support. When something isn't working and Ollama isn't using the GPU, these calls take ages because the CPU is just bad.

I'm curious how fast this is with CPU only.


It depends on espeak-ng which is GPLv3


This opens up voice interfaces for medical devices, offline language learning tools, and accessibility gadgets for the visually impaired - all markets where cloud dependency and proprietary licenses were showstoppers.


But Pi Zero has a GPU, so why not make use of it?


Because then you're stuck on that device only.


The GitHub repo just has a few KB of Python that looks like an install script. How is this used from C++?


The repeated safety testing delays might not be purely about technical risks like misuse or jailbreaks. Releasing open weights means relinquishing the control OpenAI has had since GPT-3. No rate limits, no enforceable RLHF guardrails, no audit trail. Unlike API access, open models can't be monitored or revoked. So safety may partly reflect OpenAI's internal reckoning with that irreversible shift in power, not just model alignment per se. What do you guys think?


I think it's pointless: if you SFT even their closed source models on a specific enough task, the guardrails disappear.

AI "safety" is about making it so that a journalist can't get out a recipe for Tabun just by asking.


True, but there's still a meaningful difference in friction and scale. With closed APIs, OpenAI can monitor for misuse, throttle abuse and deploy countermeasures in real-time. With open weights, a single prompt jailbreak or exploit spreads instantly. No need for ML expertise, just a Reddit post.

The risk isn’t that bad actors suddenly become smarter. It’s that anyone can now run unmoderated inference and OpenAI loses all visibility into how the model’s being used or misused. I think that’s the control they’re grappling with under the label of safety.


Given that the best jailbreak for an off-line model is still simple prompt injection, which is a solved issue for the closed source models… I honestly don’t know why they are talking about safety much at all for open source.


OpenAI and Azure both have zero retention options, and the NYT saga has given pretty strong confirmation they meant it when they said zero.


I think you're conflating real-time monitoring with data retention. Zero retention means OpenAI doesn't store user data, but they can absolutely still filter content, rate limit and block harmful prompts in real-time without retaining anything. That's processing requests as they come in, not storing them. The NYT case was about data storage for training/analysis not about real-time safety measures.


Ok you're off in the land of "what if" and I can just flat out say: If you have a ZDR account there is no filtering on inference, no real-time moderation, no blocking.

If you use their training infrastructure there's moderation on training examples, but SFT on non-harmful tasks still leads to a complete breakdown of guardrails very quickly.


The modular ERP/MES/QMS approach is interesting and challenges traditional manufacturing processes. Most manufacturers obsess over single source of truth. (I.e. ensuring a part number means exactly the same thing across planning, production, and quality systems.) On the one hand, breaking these into separate apps creates potential data consistency risks. On the other hand, it could enable much better adoption. Start with MES for shop floor visibility then add QMS for compliance later rather than massive all-in-one ERP implementations that often fail. Curious, how are you handling data consistency across modules? What's been the feedback from your current or potential customers on this approach versus traditional monolithic ERP systems?


hey founder here. they are separate apps, but use the same database, and same api. i'm also a big believer in single-source-of-truth and the compound startup idea


You're responding to an LLM by the way.


Ah gotcha. Makes sense to get the benefits of modular adoption without the headaches. Nice approach.


I really like seeing the segmented buffer approach. It's basically the rope data structure trick I used to hand-roll in userland with libraries like fast-json-stringify, now native and way cleaner. Have you run into the bailout conditions much? Any replacer, space, or custom .toJSON() kicks you back to the slow path?


Really like your approach of using existing Postgres/MySQL instead of dragging in Redis. It feels genuinely drop-in, but still Sidekiq-class. I know it's a bit early to ask about production patterns, but I was curious: if the worker thread flood hits the same Postgres that serves the web API, how do the job-fetch queries avoid contending with OLTP traffic? Does Sidequest follow Oban's advisory-lock approach or use a different throttling strategy?


Hello! One of the creators of Sidequest here.

Great question!

Sidequest uses transaction-level row locks (`SELECT ... FOR UPDATE SKIP LOCKED`) when fetching jobs. This means each worker thread only locks the specific job rows it’s about to process, minimizing lock contention and avoiding blocking other queries. This pattern is inspired by Oban’s advisory-lock approach, but instead of using explicit advisory locks, Sidequest relies on the database’s built-in row locking mechanism.
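Roughly, the fetch pattern looks like this (a generic sketch rather than our actual implementation - Sidequest itself is Node.js, and the table/column names here are invented; Python/psycopg is used only to keep the example short and self-contained):

  import psycopg

  CLAIM_SQL = """
      UPDATE jobs
         SET state = 'running'
       WHERE id IN (SELECT id
                      FROM jobs
                     WHERE queue = %(queue)s
                       AND state = 'waiting'
                     ORDER BY inserted_at
                     LIMIT %(batch)s
                       FOR UPDATE SKIP LOCKED)
   RETURNING id, job_class, args
  """

  def claim_jobs(conn, queue, batch=5):
      # conn = psycopg.connect(dsn)
      # FOR UPDATE locks the selected rows until the transaction ends;
      # SKIP LOCKED makes concurrent workers skip rows another worker has
      # already locked instead of blocking, so no two workers grab the same job.
      with conn.transaction():
          return conn.execute(CLAIM_SQL, {"queue": queue, "batch": batch}).fetchall()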

The only drawback is that Sidequest will require one or two connections to your DB. If you enqueue jobs from within other jobs, then each job that requests an enqueue will also connect to your DB (lazily connected upon request - if your job does not enqueue, no connection is created). However, you can configure the number of concurrent workers per queue and globally, so you control how much load Sidequest puts on your database.

I hope that answers your question :)


Oban doesn't use advisory locks for fetching jobs (unless there is uniqueness involved)—it uses `FOR UPDATE SKIP LOCKED` as well to pull jobs.


Thanks for the clarification. That's a clean approach. I just starred your repo. Looking forward to seeing where sidequest.js goes :)


The real win isn't static vs dynamic typing. It's immediate, structured feedback for LLM iteration. cargo check gives the LLM a perfectly formatted error it can fix in the next iteration. Python's runtime errors are often contextless ('NoneType has no attribute X') and only surface after execution. Even with mypy --strict, you need discipline to check it constantly. The compiler makes good feedback unavoidable.


Printing stack traces generates a lot of useful context but it's not done enough.


Interesting approach, but I'm curious about the practical cost considerations. A 1,000-agent simulation could easily be hundreds of thousands of API calls. The repo recommends gpt-4o-mini over gpt-4 and supports local Llama models, but there's no guidance on the performance trade-offs.

Would love to see cost-per-experiment breakdowns and quality benchmarks across model tiers. Does a local Llama 3.1 8B produce meaningful economic simulations or do you need the reasoning power of frontier models? This could be the difference between $5 and $500 experiments.


Using smaller, cheaper agents is one of the goals of the work. There is a Pareto frontier though: by using smaller, faster, cheaper agents, the number of steps required to converge increases. We touch upon this briefly in the paper.


Thanks. That Pareto trade-off is exactly what I'm trying to quantify not just qualify. For example, if I've got a $50 budget, what's the sweet spot?

Scenario A: 100 agents × GPT-4o-mini × 500 steps
Scenario B: 500 agents × local Llama 3-8B × 1,000+ steps

A quick table like "X agents × Y model × Z steps → tokens, $, convergence score" in the README would let new users budget experiments without having to read the whole paper or run expensive experiments just to do basic resource planning.


We ran each method in under 24 hours on a single H100. I understand your point and think we will include this in future iterations of our work, since this is very interesting from the user perspective. In the paper, though, we focus more on algorithmic concerns.


I'll look out for future iterations. Thanks and good luck with the paper.


I've built a couple of MCPs and what jumps out about this repo is the clean split into three MCP tools—unit, fuzz, and coverage—so an LLM can sequence tests just like any other commands. By keeping the AI layer thin and the orchestration declarative, you could swap in Hypothesis or mutation-testing backends without rewriting the workflow. It's a solid pattern for anyone grafting AI onto an existing unittest stack.

