Show HN: ai-tokenizer – 5-7x faster than tiktoken with AI SDK support (coder.github.io)
1 point by kylecarbs 3 months ago
Hey HN! I built ai-tokenizer, a pure-JavaScript AI tokenizer that's 5-7x faster than tiktoken, with no WebAssembly needed.

Born from frustration with existing packages:

- Existing libraries don't support AI SDK messages and tools. Tool definitions consume a massive number of tokens (549 for a basic Claude tool [1]), but there's been no way to count them. ai-tokenizer has native AI SDK support with per-tool breakdowns.

- Most models don't publish their exact tokenizers. We run real API calls at build time to find the most accurate public BPE tokenizer for each model, then apply calibration weights to achieve 97-99% accuracy [2].

- WebAssembly isn't necessary for great performance, and it hurts portability. ai-tokenizer precompiles BPE vocabularies into optimized hashmaps, achieving 5-7x faster tokenization than tiktoken [3].
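To illustrate the precompiled-hashmap and calibration ideas, here's a minimal sketch in TypeScript. This is not the real ai-tokenizer API; the tiny vocabulary, the greedy longest-match loop, and the calibration weight are all made-up stand-ins for the general approach of baking a BPE vocabulary into a plain Map so lookups need no WebAssembly:

```typescript
// Hypothetical tiny vocabulary: token string -> token id.
// In practice this map would be generated at build time from a real BPE vocab.
const vocab = new Map<string, number>([
  ["hello", 0],
  ["hell", 1],
  ["he", 2],
  ["llo", 3],
  ["l", 4],
  ["o", 5],
  ["h", 6],
  ["e", 7],
]);

// Greedy longest-match tokenization against the hashmap.
function encode(text: string): number[] {
  const ids: number[] = [];
  let i = 0;
  while (i < text.length) {
    // Try the longest candidate first, shrinking until the vocab has a hit.
    let len = Math.min(text.length - i, 16);
    while (len > 1 && !vocab.has(text.slice(i, i + len))) len--;
    const piece = text.slice(i, i + len);
    const id = vocab.get(piece);
    if (id === undefined) throw new Error(`no token for "${piece}"`);
    ids.push(id);
    i += len;
  }
  return ids;
}

// Calibration: scale the raw count toward a model's true tokenizer.
// The default weight here is an illustrative placeholder, not a real value.
function calibratedCount(text: string, weight = 1.03): number {
  return Math.round(encode(text).length * weight);
}

console.log(encode("hello"));        // [0]
console.log(encode("ho"));           // [6, 5]
console.log(calibratedCount("hello"));
```

A real implementation would use byte-pair merge ranks rather than naive longest-match, but the performance story is the same: token lookup becomes a hash probe on a structure that ships as plain JavaScript data.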

Live Demo: https://coder.github.io/ai-tokenizer

Repository: https://github.com/coder/ai-tokenizer

[1]: https://github.com/coder/ai-tokenizer/blob/main/src/models.j...

[2]: https://github.com/coder/ai-tokenizer/blob/main/scripts/find...

[3]: https://github.com/coder/ai-tokenizer#performance


