Agree with your pain points. One thing I'd add is that GitHub makes you re-approve every PR after each push. As an OSS contributor it's exhausting to chase re-approvals for minor tweaks.
The headline feature isn't the 25 MB footprint alone. It's that KittenTTS is Apache-2.0. That combo means you can embed a fully offline voice in Pi Zero-class hardware or even battery-powered toys without worrying about GPUs, cloud calls, or restrictive licenses. In one stroke it turns "voice everywhere" from a hardware/licensing problem into a packaging problem. Quality tweaks can come later; unlocking that deployment tier is the real game-changer.
yeah, we are super excited to build tiny ai models that are super high quality. local voice interfaces are inevitable and we want to power those in the future. btw, this model is just a preview, and the full release next week will be of much higher quality, along w another ~80M model ;)
The issue is even bigger: phonemizer uses espeak-ng, which isn't very good at turning graphemes into phonemes. In other TTS systems that rely on phonemes (e.g. Zonos), it turned out to be one of the key issues causing bad generations.
And it isn't something you can fix, because the model was trained on bad phonemes (everyone uses Whisper and then phonemizes the text transcript).
> IANAL, but AFAICS this leaves 2 options, switching the license or removing that dependency.
There is a third option: asking the project for an exception.
Though that is unlikely to be granted¹, leaving you back with just the other two options.
And of course a fourth choice: just ignore the license. This is the option taken by companies like Onyx, whose products I might otherwise be interested in…
----
[1] Those of us who pick GPL3 or AGPL generally do so to keep things definite, and an exception would muddy the waters. It also might not even be possible if the project has many maintainers, as relicensing would require agreement from everyone who has provided code that is in the current release. Furthermore, if it has inherited the license from one of its dependencies, an exception is even less practical.
Ah, yes, good catch, I didn't look deeper into the dependency tree at all. I'll update my footnote to include that as one of the reasons an exception may be impossible (or at least highly impractical).
A fourth option would be a kind of dual-licensing: the project as-is is available under GPL-3.0, but the source code in this repository excluding any dependencies is also available under Apache 2.0
Any user would still effectively be bound by the GPL-3.0, but if someone can remove the GPL dependencies they could use the project under Apache
That is an option for the publisher of the library, not the consumer of it. If it isn't already done then asking for it to be done is the same as asking for an exception otherwise (option three).
The use of the library is four lines. Three set up the library (`phonemizer.backend.EspeakBackend(language="en-us", preserve_punctuation=True, with_stress=True)`), and the other calls it (`phonemes_list = self.phonemizer.phonemize([text])`). Plus, I guess, the import statements. Even ignoring Google vs Oracle, I don't think those lines by themselves meet any threshold of originality.
Obviously you can't run them (with the original library) without complying with the GPL. But I don't see why I couldn't, independently of that, also give you this text file under Apache 2.0 to do with as you want (which, for the record, still doesn't allow you to run them with the original library without complying with the GPL, but that'd be phonemizer forcing you to do that, not this project).
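For reference, those few lines amount to roughly this (a minimal standalone sketch based on the calls quoted above; it needs espeak-ng installed and isn't the project's exact file):

```python
# Rough standalone version of the phonemizer usage quoted above.
# Requires the GPL-3.0 phonemizer package and an espeak-ng installation.
from phonemizer.backend import EspeakBackend

backend = EspeakBackend(
    language="en-us",
    preserve_punctuation=True,
    with_stress=True,
)

# Convert graphemes to phonemes; input and output are lists of strings.
phonemes_list = backend.phonemize(["Kitten TTS is a tiny model."])
print(phonemes_list[0])
```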
You would have to be very specific about the dual-licensing to avoid confusion about what you are allowed to do under Apache conditions, though. You can't just say "it's dual-licensed".
You could even extract out the parts that do not call the GPL library into an upstream project under the Apache 2.0 licence, and pull in both that and the GPL library in the downstream project, relying on Apache 2.0 -> GPL 3.0 compatibility instead of explicit dual licensing to allow the combined work to be distributed under GPLv3.
This would only apply if they were distributing the GPL licensed code alongside their own code.
If my MIT-licensed one-line Python library has this line of code…
run(["bash", "-c", "echo hello"])
…I’m not suddenly subject to bash’s licensing. For anyone wanting to run my stuff though, they’re going to need to make sure they themselves have bash installed.
(But, to argue against my own point, if an OS vendor ships my library alongside a copy of bash, do they have to now relicense my library as GPL?)
The FSF thinks it counts as a derivative work and you have to use the LGPL to allow linking.
However, this has never actually been proven in court, and there are many good arguments that linking doesn't count as a derivative work.
Old post by a lawyer someone else found (version 3 wouldn't affect this) [1]
For me personally I don't really understand how, if dynamic linking was viral, using linux to run code isn't viral. Surely at some level what linux does to run your code calls GPLed code.
It doesn't really matter though, since the FSF's stance is enough to scare companies away from using it, and any individual is highly unlikely to be sued.
> For me personally I don't really understand how, if dynamic linking was viral, using linux to run code isn't viral. Surely at some level what linux does to run your code calls GPLed code.
The Linux kernel has an explicit exception for userspace software:
> NOTE! This copyright does not cover user programs that use kernel services by normal system calls
And the GPL also has an explicit exception for "system" software such as kernel, platform libraries etc.:
> The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
> The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work.
> This would only apply if they were distributing the GPL licensed code alongside their own code.
As far as I understand the FSF's interpretation of their license, that's not true. Even if you only dynamically link to GPL-licensed code, you create a combined work which has to be licensed, as a whole, under the GPL.
I don't believe that this extends to calling an external program via its CLI, but that's not what the code in question seems to be doing.
(This is not an endorsement, but merely my understanding on how the GPL is supposed to work.)
This is a false analogy. It's quite straightforward.
Running bash (via exec()/fork()/spawn()/etc) isn't the same as (statically or dynamically) linking with its codebase. If your MIT-licensed one-liner links to code that's GPL licensed, then it gets infected by the GPL license.
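To make that distinction concrete, here's a small Python illustration (libreadline is just a stand-in for some GPL-licensed shared object; not legal advice):

```python
import ctypes
import subprocess

# Spawning a separate process: the GPL'd program runs in its own address space,
# and your code only talks to it via argv/stdout. Generally considered fine.
subprocess.run(["bash", "-c", "echo hello"])

# Dynamic linking: the shared library's code is loaded into *your* process,
# which is what the FSF treats as creating a combined work.
# (Library name/version is system-dependent; readline is GPL-licensed.)
libreadline = ctypes.CDLL("libreadline.so.8")
```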
My interpretation of their FAQ[1] on it is that shelling out and IPC are fine, while linking is not. As you say, it's ultimately up to the courts to decide on.
GPL is for boomers at this point. Floppy disks? Distribution? You can use a tool but you can't change it? A DLL call means you need to redistribute your code but forking doesn't?
Yes, but if you use open source libraries for your closed source SaaS, that's fine. People get their software _over_ the network, delivered to them in a VM (your browser).
The result can only be distributed under the terms of the GPL-3. That's actually a crucial difference: there's nothing preventing Kitten TTS from being Apache licensed, soliciting technical contributions under that license, and parts of its code being re-used in other software under that license. Yes, for the time being, this limits what you can do with Kitten TTS if you want to use the software as a whole (e.g. by embedding it into your product), but the license itself is still Apache and that can have value.
Okay, what's stopping you from feeding the code into an LLM, having it rewrite it, and making it yours? You can even add extra steps, like making it analyze the code block by block and then supervising it as it rewrites it. Bam. AI-age IP freedom.
Morals may stop you but other than that? IMHO all open source code is public domain code if anyone is willing to spend some AI tokens.
One person reads the code and produces a detailed technical specification. Someone reviews it to ensure that there is nothing in there that could be classified as copyrighted material, then a third person (who has never seen the original code) implements the spec.
You could use an LLM at both stages, but you'd have to be able to prove that the LLM that does the implementation had no prior knowledge of the code in question... Which given how LLMs have been trained seems to me to be very dubious territory for now until that legal situation gets resolved.
AI is useful in Chinese-walling code, but it's not as easy as you make it sound. To stay out of legal trouble, you probably should refactor the code into a different language, then back into the target language. In the end, it turns into a process of being forced to understand the codebase and supervising its rewriting. I've translated libraries into another language using LLMs; I'd say that process was half the labor of writing it myself. So in the end, going both ways, you may as well rewrite the code yourself… but working with the LLM will make you familiar with the subject matter so you -could- rewrite the code, so I guess you could think of it as a sort of buggy tutorial process?
I am not sure even that is enough. You would really need to do a clean room reimplementation to be safe - for exactly the same reasons that people writing code write clean room reimplementations.
Yeah, the algorithms and program flow would have to be materially distinct to be really safe. Maybe switching language paradigms would get that for you in most cases? Js->haskell->js? Sounds like a nightmare lol.
Tell me you don't know how to use LLMs properly without telling me.
You don't give the whole codebase to an LLM and expect one-shot output. Instead, you break it down and write the code block by block. Then the size of the codebase doesn't matter. You use the LLM as a tool; it is not supposed to replace you. You don't try to become George from the Jetsons, just pressing a button and never touching anything; instead you stay on top of it as the LLM does the coding. You test the code at every step to see if the implementation behaves as expected. Do enough of this and you have proper, full "bespoke" software.
Festival's English model, festvox-kallpc16k, is about 6 MB, and it is a large model; festvox-kallpc8k is about 3.5 MB.
eSpeak NG's data files take about 12 MB (multi-lingual).
I guess this one may generate more natural-sounding speech, but older or lower-end computers were capable of decent speech synthesis previously as well.
I'm not blind, but spoken English is far more difficult for me to grasp than written English (I'm a non-native speaker), and Flite runs on n270 netbooks at crazy speeds with really good enough voices.
What about the training data? Is everyone 100% confident that models are not a derived work of the training inputs now, even if they can reproduce input exactly?
I'm playing around with an Nvidia Jetson Orin Nano Super right now and it's actually pretty usable with gemma3:4b, and quite fast: even image processing is done in like 10-20 seconds, but this is with GPU support. When something is not working and ollama is not using the GPU, these calls take ages because the CPU is just bad.
This opens up voice interfaces for medical devices, offline language learning tools, and accessibility gadgets for the visually impaired - all markets where cloud dependency and proprietary licenses were showstoppers.
The repeated safety testing delays might not be purely about technical risks like misuse or jailbreaks. Releasing open weights means relinquishing the control OpenAI has had since GPT-3. No rate limits, no enforceable RLHF guardrails, no audit trail. Unlike API access, open models can't be monitored or revoked. So safety may partly reflect OpenAI's internal reckoning with that irreversible shift in power, not just model alignment per se. What do you guys think?
True, but there's still a meaningful difference in friction and scale. With closed APIs, OpenAI can monitor for misuse, throttle abuse and deploy countermeasures in real-time. With open weights, a single prompt jailbreak or exploit spreads instantly. No need for ML expertise, just a Reddit post.
The risk isn’t that bad actors suddenly become smarter. It’s that anyone can now run unmoderated inference and OpenAI loses all visibility into how the model’s being used or misused. I think that’s the control they’re grappling with under the label of safety.
Given that the best jailbreak for an off-line model is still simple prompt injection, which is a solved issue for the closed source models… I honestly don’t know why they are talking about safety much at all for open source.
I think you're conflating real-time monitoring with data retention. Zero retention means OpenAI doesn't store user data, but they can absolutely still filter content, rate limit and block harmful prompts in real-time without retaining anything. That's processing requests as they come in, not storing them. The NYT case was about data storage for training/analysis not about real-time safety measures.
Ok you're off in the land of "what if" and I can just flat out say: If you have a ZDR account there is no filtering on inference, no real-time moderation, no blocking.
If you use their training infrastructure there's moderation on training examples, but SFT on non-harmful tasks still leads to a complete breakdown of guardrails very quickly.
The modular ERP/MES/QMS approach is interesting and challenges traditional manufacturing processes. Most manufacturers obsess over a single source of truth (i.e. ensuring a part number means exactly the same thing across planning, production, and quality systems). On the one hand, breaking these into separate apps creates potential data consistency risks. On the other hand, it could enable much better adoption: start with MES for shop floor visibility, then add QMS for compliance later, rather than massive all-in-one ERP implementations that often fail. Curious: how are you handling data consistency across modules? What's been the feedback from your current or potential customers on this approach versus traditional monolithic ERP systems?
hey founder here. they are separate apps, but use the same database, and same api. i'm also a big believer in single-source-of-truth and the compound startup idea
I really like seeing the segmented buffer approach. It's basically the rope data structure trick I used to hand-roll in userland with libraries like fast-json-stringify, now native and way cleaner. Have you run into the bailout conditions much? Any replacer, space, or custom .toJSON() kicks you back to the slow path?
Really like your approach of using existing Postgres/MySQL instead of dragging in Redis. It feels genuinely drop-in, but still Sidekiq-class. I know it's a bit early to ask about production patterns, but I was curious: if the worker thread flood hits the same Postgres that serves the web API, how do the job-fetch queries avoid contending with OLTP traffic? Does Sidequest follow Oban's advisory-lock approach or use a different throttling strategy?
Sidequest uses transaction-level row locks (`SELECT ... FOR UPDATE SKIP LOCKED`) when fetching jobs. This means each worker thread only locks the specific job rows it’s about to process, minimizing lock contention and avoiding blocking other queries. This pattern is inspired by Oban’s advisory-lock approach, but instead of using explicit advisory locks, Sidequest relies on the database’s built-in row locking mechanism.
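Roughly, the fetch looks like this (a simplified standalone sketch with an assumed `jobs` table, shown here with psycopg rather than Sidequest's actual code):

```python
import psycopg

def fetch_next_job(conn: psycopg.Connection):
    # Claim at most one waiting job; SKIP LOCKED means workers never block
    # on rows that another worker's transaction has already locked.
    with conn.transaction():
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT id, payload FROM jobs
                WHERE state = 'waiting'
                ORDER BY id
                FOR UPDATE SKIP LOCKED
                LIMIT 1
                """
            )
            row = cur.fetchone()
            if row is None:
                return None  # nothing claimable right now
            cur.execute("UPDATE jobs SET state = 'running' WHERE id = %s", (row[0],))
            return row
```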
The only drawback is that Sidequest will require one or two connections to your DB. If you enqueue jobs from within other jobs, then each job that requests an enqueue will also connect to your DB (lazily connected upon request - if your job does not enqueue, no connection is created). However, you can configure the number of concurrent workers per queue and globally, so you control how much load Sidequest puts on your database.
The real win isn't static vs dynamic typing. It's immediate, structured feedback for LLM iteration. cargo check gives the LLM a perfectly formatted error it can fix in the next iteration. Python's runtime errors are often contextless ('NoneType has no attribute X') and only surface after execution. Even with mypy --strict, you need discipline to check it constantly. The compiler makes good feedback unavoidable.
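Concretely, the loop can be as simple as shelling out to cargo and handing the rendered diagnostics back to the model on the next turn (a rough sketch, not any particular tool's implementation):

```python
import json
import subprocess

def cargo_diagnostics(project_dir: str) -> list[str]:
    # `--message-format=json` makes cargo emit one JSON object per line;
    # errors/warnings arrive as "compiler-message" records containing a
    # pre-rendered, human-readable string the LLM can act on directly.
    proc = subprocess.run(
        ["cargo", "check", "--message-format=json"],
        cwd=project_dir, capture_output=True, text=True,
    )
    out = []
    for line in proc.stdout.splitlines():
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue
        if msg.get("reason") == "compiler-message":
            rendered = msg["message"].get("rendered")
            if rendered:
                out.append(rendered)
    return out  # paste these into the next prompt and iterate
```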
Interesting approach, but I'm curious about the practical cost considerations. A 1,000-agent simulation could easily be hundreds of thousands of API calls. The repo recommends gpt-4o-mini over gpt-4 and supports local Llama models, but there's no guidance on the performance trade-offs.
Would love to see cost-per-experiment breakdowns and quality benchmarks across model tiers. Does a local Llama 3.1 8B produce meaningful economic simulations or do you need the reasoning power of frontier models? This could be the difference between $5 and $500 experiments.
Using smaller, cheaper agents is one of the goals of the work. There is a Pareto frontier though: by using smaller, faster, cheaper agents, the number of steps required to converge increases. We touch on this briefly in the paper.
A quick table like "X agents × Y model × Z steps → tokens, $, convergence score" in the README would let new users budget experiments without having to read the whole paper and run expensive experiments just to do basic resource planning.
We ran each method in under 24 hours on a single H100. I understand your point and think we will include this in future iterations of our work, since it is very interesting from the user perspective. Though in the paper we focus more on algorithmic concerns.
I've built a couple of MCPs and what jumps out about this repo is the clean split into three MCP tools—unit, fuzz, and coverage—so an LLM can sequence tests just like any other commands. By keeping the AI layer thin and the orchestration declarative, you could swap in Hypothesis or mutation-testing backends without rewriting the workflow. It's a solid pattern for anyone grafting AI onto an existing unittest stack.