I'm sure the author is aware, but awk has at least three implementations: nawk (the "One True Awk"), gawk (what most people use), and mawk (performance-oriented, unmaintained), plus BusyBox awk.
When benchmarking gawk, I've found that using LANG=C and avoiding UTF-8 makes a substantial difference for pattern matching.
This is also true for grep, tr, and sort: the Unicode handling impacts speed quite a bit, and I still don't understand why the behavior described here https://stackoverflow.com/q/20226851/772013 is not treated as a bug.
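If you want to check this on your own data, here is a minimal sketch, assuming Python 3 and that an en_US.UTF-8 locale is installed (`locale -a` lists what's available). It runs the same command under the C locale and a UTF-8 locale and reports the best of a few wall-clock runs. The helper names, the script name `bench_locale.py`, the sample gawk program, and `big.log` are all hypothetical placeholders, not anything from the thread.

```python
import os
import subprocess
import sys
import time


def locale_env(locale_name):
    """Return a copy of the current environment forced to one locale.

    LC_ALL takes precedence over LC_CTYPE and LANG, so setting it (not just
    LANG) keeps an inherited LC_* variable from overriding the choice.
    """
    env = dict(os.environ)  # copy; os.environ itself is left untouched
    env["LC_ALL"] = locale_name
    env["LANG"] = locale_name
    return env


def time_command(cmd, locale_name, repeats=3):
    """Run cmd `repeats` times under locale_name; return the best wall time in seconds."""
    env = locale_env(locale_name)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # No check=True: grep exits with status 1 when nothing matches,
        # which is not an error for benchmarking purposes.
        subprocess.run(cmd, env=env, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def compare_locales(cmd, locales=("C", "en_US.UTF-8"), repeats=3):
    """Map each locale name to the best observed runtime of cmd under it."""
    return {loc: time_command(cmd, loc, repeats) for loc in locales}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   python bench_locale.py gawk '/[a-z]+ing/ { n++ } END { print n }' big.log
    for loc, secs in compare_locales(sys.argv[1:]).items():
        print(f"{loc:>12}: {secs:.3f} s")
```

Setting LC_ALL rather than only LANG is the design choice that matters most here. If the UTF-8 locale isn't actually installed, most programs silently fall back to the C locale, so both runs would measure the same thing; check `locale -a` before trusting a "no difference" result.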