/^
( \^+ | _+ | = )? # sharps and flats
( [a-gA-G] ) # notes
( ,+ | '+ )?
( \d* ) # duration
( \/ )? # fractions
( \d* ) # more duration? I'd need to read the spec
$/x
and use different delimiters to remove escaping of slashes:
%r<^
( \^+ | _+ | = )? # sharps and flats
( [a-gA-G] ) # notes
( ,+ | '+ )?
( \d* ) # duration
( / )? # fractions
( \d* ) # more duration? I'd need to read the spec
$>x
That's already much more readable, but then we have complaint number 2, composability. If your language supports it, use interpolation:
SHARPS_AND_FLATS = /\^+ | _+ | =/x
NOTES = /[a-g]/i
%r<^
(#{SHARPS_AND_FLATS})? # sharps and flats
(#{NOTES}) # notes
( ,+ | '+ )?
( \d* ) # duration
( / )? # fractions
( \d* ) # more duration? I'd need to read the spec
$>x
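For anyone who wants to poke at it, here's a minimal sketch of the composed pattern actually being used. `NOTE_PATTERN` and the sample input are names I made up, the "octave marks" label is my guess at what that group is for, and the duration semantics are still unchecked against the spec.

```ruby
SHARPS_AND_FLATS = /\^+ | _+ | =/x
NOTES            = /[a-g]/i

# NOTE_PATTERN is a hypothetical name for the composed regex.
NOTE_PATTERN = %r<^
  (#{SHARPS_AND_FLATS})?  # sharps and flats
  (#{NOTES})              # notes
  ( ,+ | '+ )?            # octave marks, presumably
  ( \d* )                 # duration
  ( / )?                  # fractions
  ( \d* )                 # more duration, unverified against the spec
$>x

# Destructure the six capture groups in order.
accidental, note, octave, duration, slash, divisor =
  NOTE_PATTERN.match("^c'3/2").captures

puts [accidental, note, octave, duration, slash, divisor].inspect
```

Each interpolated regex keeps its own flags, so `NOTES` stays case-insensitive without making the whole pattern case-insensitive.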
You see how it could go on. I'm not saying regex are always the right solution, and not all languages have access to the same tools, but there's often a lot that can be done to mitigate the awfulness (and there's a lot more available here: lookarounds, named captures, etc.).
The `x` modifier is the #1 fix. I never write anything beyond the simplest regex without it, and for the life of me I can't understand why it's hardly ever used.
Also, named capture groups. Often I find my code much more readable when I name different fragments of the regex and then refer to those names later on in my code.
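A rough sketch of what that looks like in Ruby, combined with the `x` modifier. The group names (`accidental`, `note`, `octave`, `duration`) and the constant `NOTE` are my own illustrative choices, not anything from a spec.

```ruby
# Group names are illustrative assumptions, not taken from any spec.
NOTE = /\A
  (?<accidental> \^+ | _+ | = )?  # sharps and flats
  (?<note>       [a-gA-G] )       # note letter
  (?<octave>     ,+ | '+ )?       # octave marks, presumably
  (?<duration>   \d* )            # duration
\z/x

m = NOTE.match("_B,4")

# Refer to fragments by name instead of by position.
puts m[:accidental]
puts m[:note]
puts m.named_captures.inspect
```

The later code then reads `m[:duration]` rather than `m[4]`, which survives someone adding a group in the middle of the pattern.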
Nice explanation (as expected from Fowler). Yep, mix that with a bit of interpolation and an iterator, maybe wrapped in an object and you have something readable and flexible. Not sure why people want to make regex any harder than they need to be?
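Something like this is what I'd picture for the "interpolation plus iterator, wrapped in an object" idea. It's only a sketch: `NoteScanner`, its constants, and the sample tune string are all hypothetical names invented here.

```ruby
# NoteScanner is a hypothetical wrapper; the class name, constants and the
# sample input below are made up for this sketch.
class NoteScanner
  include Enumerable

  ACCIDENTAL = /\^+ | _+ | =/x
  LETTER     = /[a-g]/i
  NOTE = %r{
    (?<accidental> #{ACCIDENTAL} )?
    (?<letter>     #{LETTER} )
    (?<octave>     ,+ | '+ )?
    (?<duration>   \d* )
  }x

  def initialize(source)
    @source = source
  end

  # Yields one MatchData per note found in the source string.
  def each
    return enum_for(:each) unless block_given?
    @source.scan(NOTE) { yield Regexp.last_match }
  end
end

scanner = NoteScanner.new("^c2 _B, e'")
puts scanner.map { |m| m[:letter] }.inspect
```

Including `Enumerable` means `map`, `select`, `first` and friends come for free once `each` exists.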
In Perl's "apocalypses" on pattern matching[0] (i.e. the Perl 6 regexp design documents), Larry Wall immediately states that /x should be the default (and not be an option at all!) for regexes.
It's incredible how useful it is, and yet by not making it the default, every regex implementation has nudged its users to write inscrutable regular expressions for decades.
My favourite regex site is Rexegg[1]. I thought it had died so I was using archive.org but it appears to be back. Unmatched, in my opinion.
Still, there's not much to the `x` modifier. You just need to know that it ignores literal (unescaped) whitespace and comments (# blah blah) in the pattern, so you must use `\s` or `\t` etc. when you really want to match some whitespace. Otherwise you can do things like:
"Hello, world!".split(/
[ \s , ]+ # this splits on at least one space or comma
/x)
Which is overkill for a simple pattern (then again, why not?) but unbelievably useful for a complex one. Add spaces and newlines until it's readable.
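The one gotcha is easy to demonstrate. A small sketch in Ruby (the constant names are just for illustration):

```ruby
# With x, the space between the words is ignored by the regex engine.
LOOSE   = /hello world/x
# Use \s (or \t, etc.) when you actually want to match whitespace.
STRICT  = /hello \s world/x
# A backslash-escaped space also matches one literal space.
ESCAPED = /hello\ world/x

puts LOOSE.match?("helloworld")     # the pattern is effectively /helloworld/
puts LOOSE.match?("hello world")
puts STRICT.match?("hello world")
puts ESCAPED.match?("hello world")
```

So when converting an existing pattern to `x`, the first step is to replace every meaningful space with `\s` or an escaped space, then start adding layout.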
Implementations vary, so check your regex library’s docs (usually listed under “flags”). Generally there’s not much to it: specify the flag, and then whitespace and some form of comment within the pattern are ignored.
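In Ruby, for instance, the same flag can be reached a few ways besides the trailing `x` on a literal. A quick sketch (the constant names and the sample pattern are mine):

```ruby
# Building from a string: pass Regexp::EXTENDED as the options argument.
FROM_FLAG = Regexp.new('\d+ \s* , \s* \d+  # two integers', Regexp::EXTENDED)

# Scoping extended mode to part of a pattern with an inline option group.
INLINE = /(?x: \d+ \s* , \s* \d+ )/

puts FROM_FLAG.match?("12 , 7")
puts INLINE.match?("12,7")
puts (FROM_FLAG.options & Regexp::EXTENDED) != 0
```

Other libraries tend to call the flag "extended", "verbose" or "comments", so those are the words to search the docs for.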
Inscrutability - use the `x` modifier, which allows whitespace and comments inside the pattern.