

carapace on June 23, 2015 | on: Regexes: The Bad, the Better, and the Best

Okay but think: If you are searching a very large file for a very few occurrences of the expected match then this optimization is not so bad. If you are running line-by-line through a very large log file to extract just those two pieces of information per line, then throw away the first N characters in each line (where N is the hopefully-constant length of your timestamps plus that space char) and start the regex engine at the beginning of the expected match. Then it doesn't have to waste any time passing over those chars. Even if the exact details above aren't quite right, the principle is (and is well known): Avoid premature optimization! (And the corollary: Measure it. Profile your code, don't guess, you're probably wrong.)
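A rough sketch of the skip-the-first-N-characters idea, in Python. The log format, the field names (`user`, `status`), and the `extract` helper are all made up for illustration; the only real API used is the optional `pos` argument of a compiled pattern's `match` method, which starts matching at that offset without copying the line.

```python
import re

# Assumed (hypothetical) log format: a fixed-width timestamp, one space,
# then the two fields we care about, e.g.
#   2015-06-23 12:34:56 user=alice status=200
TIMESTAMP_LEN = len("2015-06-23 12:34:56")
PREFIX_LEN = TIMESTAMP_LEN + 1  # N: the timestamp plus the space after it

# Hypothetical pattern for the two pieces of information per line.
FIELDS = re.compile(r"user=(\w+) status=(\d+)")


def extract(line):
    """Return (user, status), or None if the line does not match.

    Matching starts at offset N, so the engine never scans the timestamp.
    """
    m = FIELDS.match(line, PREFIX_LEN)
    return m.groups() if m else None
```

Whether this is actually faster than a plain `search` over the whole line depends on the data and the engine, so time both (for example with `timeit`) before keeping it.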

thisrod on June 24, 2015

There's a higher order solution. Read the AWK book, do the exercises, then ignore blog posts about regexps. Make an exception when it's Russ Cox demonstrating how often this wheel has been reinvented in square form.

astangl on June 24, 2015

Which AWK book? The one by A, W, and K?

thisrod on June 29, 2015

Yes.
