
Some quick thoughts on the first two problems listed:

Inscrutability - use the `x` modifier and allow whitespace and comments, e.g.

    /^(\^+|_+|=)?([a-gA-G])(,+|'+)?(\d*)(\/)?(\d*)$/
becomes:

    /^
      ( \^+ | _+ | = )?  # sharps and flats
      ( [a-gA-G] )       # notes
      ( ,+ | '+ )?
      ( \d* )            # duration
      ( \/ )?            # fractions
      ( \d* )            # more duration? I'd need to read the spec
    $/x
and use different delimiters to remove escaping of slashes:

    %r<^
      ( \^+ | _+ | = )?  # sharps and flats
      ( [a-gA-G] )       # notes
      ( ,+ | '+ )?
      ( \d* )            # duration
      ( / )?             # fractions
      ( \d* )            # more duration? I'd need to read the spec
    $>x
That's already much more readable, but then we have complaint number 2, composability. If your language supports it, then use interpolation:

    SHARPS_AND_FLATS = /\^+ | _+ | =/x
    NOTES = /[a-g]/i

    %r<^
      (#{SHARPS_AND_FLATS})?  # sharps and flats
      (#{NOTES})              # notes
      ( ,+ | '+ )?
      ( \d* )                 # duration
      ( / )?                  # fractions
      ( \d* )                 # more duration? I'd need to read the spec
    $>x
You see how it could go on. I'm not saying regexes are always the right solution, and not all languages have access to the same tools, but there's often a lot that can be done to mitigate the awfulness (and a lot more here: lookarounds, named captures, etc.).

The `x` modifier is the #1 fix, I never write anything above the most simple regex without it, and for the life of me I can't understand why it's hardly ever used.



Also, named capture groups. Often I find my code much more readable when I name different fragments of the regex and then refer to those names later on in my code.


Yes, that would be my next top tip too.

    SHARPS_AND_FLATS = /\^+ | _+ | =/x
    NOTES = /[a-g]/i

    p = %r<^
      (?<pitch>#{SHARPS_AND_FLATS})?  # sharps and flats
      (?<notes>#{NOTES})              # notes
      ( ,+ | '+ )?
      ( \d* )                         # duration
      ( / )?                          # fractions
      ( \d* )                         # more duration? I'd need to read the spec
    $>x

    music = "^A2"  # sample input, assumed for illustration
    match_data = p.match(music)
    match_data[:pitch] # => "^"
    match_data[:notes] # => "A"
Much more readable.


I got a lot of mileage out of the x modifier, too, until I bumped into the Composed Regex pattern. https://www.martinfowler.com/bliki/ComposedRegex.html


Nice explanation (as expected from Fowler). Yep, mix that with a bit of interpolation and an iterator, maybe wrapped in an object and you have something readable and flexible. Not sure why people want to make regex any harder than they need to be?
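For anyone who hasn't seen it, here's a rough sketch of what that could look like in Ruby. The `NotePattern` module, its constant names and `each_note` are all made up for illustration, and the meaning I've given each part of the note syntax is my guess rather than something taken from the spec.

```ruby
# Hypothetical names throughout; the note grammar is an assumption, not the spec.
module NotePattern
  ACCIDENTAL = /\^+ | _+ | =/x   # sharps, flats, natural
  PITCH      = /[a-g]/i          # note letter
  OCTAVE     = /,+ | '+/x        # octave marks
  DURATION   = %r<\d* /? \d*>x   # e.g. 2, /2, 3/4 (may be empty)

  # The full pattern is composed from the named pieces above.
  NOTE = /
    (?<accidental> #{ACCIDENTAL} )?
    (?<pitch>      #{PITCH}      )
    (?<octave>     #{OCTAVE}     )?
    (?<duration>   #{DURATION}   )
  /x

  # Yield each note found in a tune as a hash of its named parts.
  def self.each_note(tune)
    return enum_for(:each_note, tune) unless block_given?
    tune.scan(NOTE) { yield $~.named_captures }
  end
end

notes = NotePattern.each_note("^A2 _b/2 c'").to_a
```

Each element of `notes` is a hash keyed by group name, so the calling code never has to count parentheses. Groups that didn't take part in the match come back as `nil`.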


In Perl's "Apocalypses" on pattern matching[0] (i.e. the Perl 6 regex design documents), Larry Wall immediately states that /x should be the default (and not be an option at all!) for regexes.

It's incredible how useful it is and yet by not being the default every regex implementation has nudged their users to write inscrutable regular expressions for decades.

[0] https://raku.org/archive/doc/design/apo/A05.html


I must give Perl 6 another look, I was using 5 up until about 2009 when I replaced it with Ruby. Be nice to see what's been done in that time.


Please note that Perl 6 has been renamed to Raku (https://raku.org using the #rakulang tag on social media).


Do you have a tutorial or article you'd recommend for reading up on the `x` modifier?


My favourite regex site is Rexegg[1]. I thought it had died so I was using archive.org but it appears to be back. Unmatched, in my opinion.

Still, there's not much to the `x` modifier. You just need to know that it ignores literal whitespace and comments (# blah blah) in the pattern, so you must use `\s`, `\t` etc. when you really want to match some whitespace. Otherwise you can do things like:

    "Hello, world!".split(/ [ \s , ]+ # this splits on at least one space or comma /x)
Which is overkill for a simple pattern (then again, why not?) but unbelievably useful for a complex one. Add spaces and newlines until it's readable.

It's a small change but a big effect.
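To make the whitespace point concrete, here's a small Ruby sketch. The sample string and variable names are made up; the point is only that a bare space in the pattern stops counting once `x` is on.

```ruby
# Under /x a bare space in the pattern is ignored, so these two behave the same.
loose = /Hello world/x
tight = /Helloworld/

# To match a real space, escape it or use \s.
escaped = /Hello\ world/x
classy  = /Hello \s world/x

text = "Hello world"
results = [loose, tight, escaped, classy].map { |re| re.match?(text) }
```

The first two fail against `text` because neither expects a space between the words, while the last two match.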

[1] https://www.rexegg.com/regex-modifiers.html


Implementation may vary, check your regex library’s docs (usually listed under “flags”). Generally there’s not much to it; specify the flag and then whitespace and some form of comment within the string is ignored.
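In Ruby, for instance, as far as I know the flag can be given three ways: the `x` suffix on a literal, `Regexp::EXTENDED` passed to `Regexp.new`, or an inline `(?x)` at the start of the pattern. A quick sketch (the pattern and variable names are just for illustration):

```ruby
# A pattern kept in a plain string, with whitespace and comments for readability.
source = <<~'PATTERN'
  \d{4}   # year
  -
  \d{2}   # month
PATTERN

by_constant = Regexp.new(source, Regexp::EXTENDED)  # flag passed as an option
by_inline   = Regexp.new("(?x)" + source)           # flag embedded in the pattern
by_literal  = /\d{4} - \d{2}/x                      # flag as a literal suffix

matches = [by_constant, by_inline, by_literal].map { |re| re.match?("2024-05") }
```

Other libraries spell it differently, so the docs are still the place to check.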



