I'm sure the author is aware, but awk has at least three implementations: nawk (the "one true awk"), gawk (what most people use), and mawk (performance-oriented, unmaintained). Plus BusyBox awk.

When benchmarking gawk, I've found that using LANG=C and avoiding UTF-8 makes a substantial difference for pattern matching.
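A minimal Python sketch for reproducing this kind of comparison. It assumes gawk is on PATH and the en_US.UTF-8 locale is installed; the helper names `locale_env` and `time_under_locale`, the regex, and the input file are hypothetical placeholders. It runs the same command under each locale and keeps the best of a few runs. LC_ALL is set alongside LANG because a non-empty LC_ALL already in your environment would otherwise override LANG.

```python
import os
import subprocess
import sys
import time


def locale_env(locale_name):
    """Copy of the current environment with the locale forced to locale_name.

    LC_ALL is set too, since a non-empty LC_ALL would otherwise override LANG.
    """
    env = dict(os.environ)
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name
    return env


def time_under_locale(cmd, locale_name, repeats=3):
    """Run cmd `repeats` times under locale_name; return the best wall-clock seconds.

    stdout is discarded so terminal output does not dominate the timing.
    """
    env = locale_env(locale_name)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
                       env=env, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical workload: count lines matching a regex in the file given on the command line.
    cmd = ["gawk", "/[a-z]+ing/ { n++ } END { print n+0 }", sys.argv[1]]
    for loc in ("C", "en_US.UTF-8"):
        print(loc, round(time_under_locale(cmd, loc), 3), "s")
```

Pass a large text file as the only argument. Taking the best of several runs rather than the mean reduces noise from caching and scheduling, though process startup time is still included in each measurement.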



Mawk is currently maintained: http://invisible-island.net/mawk/mawk.html


> I've found using LANG=C

This is also true for grep, tr, and sort: Unicode handling impacts speed quite a bit, and I still don't understand why this (https://stackoverflow.com/q/20226851/772013) isn't treated as a bug.
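One practical wrinkle when applying the LANG=C trick to grep, tr, or sort: POSIX resolves each locale category by checking LC_ALL first, then the category variable (LC_CTYPE governs how grep and tr interpret multibyte characters; LC_COLLATE governs sort's ordering), and only then LANG. A small sketch of that lookup, using a hypothetical `effective_locale` helper and assuming the implementation default is the C locale (POSIX leaves the default implementation-defined):

```python
def effective_locale(env, category):
    """Resolve the locale a POSIX tool uses for `category`, e.g. "LC_CTYPE".

    A non-empty LC_ALL wins, then a non-empty category variable, then a
    non-empty LANG; otherwise fall back to "C" (assumed default).
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


# LANG=C has no effect here, because LC_ALL is set and takes precedence.
env = {"LANG": "C", "LC_ALL": "en_US.UTF-8"}
print(effective_locale(env, "LC_CTYPE"))  # en_US.UTF-8
```

This is why `LC_ALL=C sort ...` is the more reliable spelling when you want the byte-oriented fast path regardless of what the shell already exports.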