The problem is that email validation is really better suited to being done by a push-down automata (rather than a finite state machine, which is what true regexes are). For better or for worse, perl regular expressions aren't strictly finite state machines though - so you can kind of extend them beyond what is sane.
P.S. Here's a ridiculously cool regex that can be used to determine if a number is prime: /^1?$|^(11+?)\1+$/
If you've got a few hours (a few seconds for cperciva ;-)) to waste, try and figure it out :-D
well, you're right - there are some more details. it needs a bit of wrapper code around it.
It actually tests if n is composite, when presented with a pattern of n ones. E.g., '111' doesn't match so it is prime, and '1111' does match so it's composite.
Wow, this is really amazing (to me at least because I had not seen or thought of this before).
[ Warning: Spoiler! :) ]
It's in base 1 (tally system).
The first part up to the "|" matches the case where number 1 is the whole string, covering the fact of 1 not being considered prime. It also matches no string, I guess to mean 0.
Next part matches a captured string consisting of 2 or more ones, and the captured string must be repeated at least one time. That is, 2+ repetitions of chunks of 2+ ones.
This much is obvious, but it wasn't until I wrote it out that I realized what it is doing:
multiples of 2
11+11 or 11+11+11 or 11+11+11+11 or ... = 4, 6, 8, 10, 12, ...
multiples of 3
111+111 or 111+111+111 or 111+111+111+111 or ... = 6, 9, 12, ...
multiples of 4
1111+1111 or 1111+1111+1111 or ... = 8, 12, 16, 20, ...
multiples of 5
11111+11111 or 11111+11111+11111 or ... = 10, 15, 20, 25, ...
Indeed: technically, it's impossible. Email addresses, per RFC 822, are not a regular language -- though it's doable in Perl because Perl "regular expressions" are unapologetically more powerful than real regular expressions.
RFC 822, section 3.4.3, defines comments in email addresses to be surrounded by matching parentheses, and which nest -- a classic example of a language which is not regular.
Chalk this up to "stupid technicalities I learned one day when I decided to write a regex to match RFC 822 perfectly". It turned out to be a fun but useless exercise, because most email programs don't accept half the crap that's in RFC 822 anyway.
He means "regex fu": "fu" as in "kung fu", not "foo bar". In other words, the construction is "[arbitrary thing] fu". Since hackers often use "foo" for an arbitrary thing, perhaps this is better rendered as "foo fu". In Ruby we might write it as follows:
class String
def foo_fu?
foo = ".*"
self =~ /#{foo} fu/
end
end
The reason why people hate regexes are because people use them as a general purpose programming language instead of a concise way of matching strings. Note that the regex I use just says to "split the string every third digit" and the rest of the logic is actual code.
Actually, writing a correct (per RFC) regex to recognize email addresses is far from simple.