14 February, 2013

Regex: matching some words but not others

Man, I wholeheartedly hate the inventor of Regex, even though I’ll admit that it can really come in handy at times. The main problem with Regex is that “nobody” who is even somewhat sane can maintain the “Cyrillic” scripts and strings that Regex patterns end up as. It’s gibberish, to say the least! Some would even argue that it looks like a quote from Captain Haddock in Tintin!

So – what does this beast look like?

Input: “here we WE go again” (the quotation marks are not part of the input)

Each pattern below is run against that input, with the matches listed in the order they are found.

Regex: \w
Explanation: any single word character (letter, digit or underscore); every character is a separate match
Result: h, e, r, e, w, e, W, E, g, o, a, g, a, i, n
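
To try this yourself, here is a minimal sketch using Python’s standard `re` module. Python is my choice for illustration only; the post doesn’t say which regex engine it used, and engines differ in the details. The second call uses a made-up string to show that digits and the underscore count as word characters too.

```python
import re

text = "here we WE go again"

# \w matches one word character at a time: a letter, a digit or an underscore
chars = re.findall(r"\w", text)
print(chars)  # ['h', 'e', 'r', 'e', 'w', 'e', 'W', 'E', 'g', 'o', 'a', 'g', 'a', 'i', 'n']

# Digits and underscores are word characters too; spaces and punctuation are not
print(re.findall(r"\w", "a_1 b!"))  # ['a', '_', '1', 'b']
```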

Regex: \b\w+\b
Explanation: whole words, i.e. runs of one or more word characters between word boundaries (1 <= char count)
Result: here, we, WE, go, again
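
A quick check of the whole-word pattern in Python’s standard `re` module (chosen here just for illustration). It also shows that `\b` is a zero-width assertion: it matches a position between characters, not a character, so on this input it changes nothing compared to a plain `\w+`.

```python
import re

text = "here we WE go again"

# \b matches the position between a word character and a non-word character
# (or the start/end of the string); it consumes no characters itself
words = re.findall(r"\b\w+\b", text)
print(words)  # ['here', 'we', 'WE', 'go', 'again']

# \w+ is greedy and stops at each space anyway, so the boundaries add nothing here
print(words == re.findall(r"\w+", text))  # True
```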

Regex: \b\w{1,3}\b
Explanation: whole words of 1 to 3 characters (1 <= char count <= 3)
Result: we, WE, go
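
The `\b` anchors are what restrict this pattern to whole words. This Python sketch (standard `re`, used purely for illustration) runs the pattern with and without them against the same input.

```python
import re

text = "here we WE go again"

# With boundaries: only complete words that are 1 to 3 characters long
short_words = re.findall(r"\b\w{1,3}\b", text)
print(short_words)  # ['we', 'WE', 'go']

# Without boundaries, the engine chops longer words into 1-3 character pieces
pieces = re.findall(r"\w{1,3}", text)
print(pieces)  # ['her', 'e', 'we', 'WE', 'go', 'aga', 'in']
```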

Regex: \b\w{4,}\b
Explanation: whole words of 4 or more characters (4 <= char count)
Result: here, again
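
In Python’s `re` module (an illustrative choice), `re.finditer` returns match objects, so besides the words themselves you can see where in the input each one was found.

```python
import re

text = "here we WE go again"

# {4,} means "4 or more"; each match object knows its start and end offsets
long_words = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\b\w{4,}\b", text)]
print(long_words)  # [('here', 0, 4), ('again', 14, 19)]
```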

Regex: \b\w{2}\b
Explanation: whole words of exactly 2 characters (char count = 2)
Result: we, WE, go
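
For a simple, space-separated input like this one, the regex gives the same answer as an ordinary length filter. A small Python sketch (standard `re`); note the comparison only holds when words are separated by plain spaces with no punctuation attached.

```python
import re

text = "here we WE go again"

# {2} means exactly two word characters between the boundaries
two_letter = re.findall(r"\b\w{2}\b", text)
print(two_letter)  # ['we', 'WE', 'go']

# Same result without regex, valid only because the input is plain space-separated words
by_length = [w for w in text.split() if len(w) == 2]
print(by_length)  # ['we', 'WE', 'go']
```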

Regex: \b(?!\bwe\b)(\w{2})\b
Explanation: whole words of exactly 2 characters, but NOT ‘we’ (char count = 2). The match is case-sensitive, so ‘WE’ still gets through
Result: WE, go
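
The `(?!...)` part is a negative lookahead: it peeks at what follows without consuming anything. This Python sketch (standard `re`, an illustrative choice) shows the case-sensitive result and that the first match still starts right at the ‘W’ of ‘WE’. Because the pattern has exactly one capturing group, `re.findall` returns that group’s text, which here equals the whole match.

```python
import re

text = "here we WE go again"

# The lookahead rejects positions where the whole word "we" starts.
# It is case-sensitive, so "WE" is not rejected.
pattern = r"\b(?!\bwe\b)(\w{2})\b"
result = re.findall(pattern, text)
print(result)  # ['WE', 'go']

# The lookahead consumed nothing: the first match begins at offset 8, the "W"
first = re.search(pattern, text)
print(first.start(), first.group(1))  # 8 WE
```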

Regex: \b(?!\bgo\b)(\w{2})\b
Explanation: whole words of exactly 2 characters, but NOT ‘go’ (char count = 2, case-sensitive)
Result: we, WE
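
Only the excluded word changes between this pattern and the previous one, so it can be built from a parameter. `two_letter_words_except` below is a hypothetical helper written for this sketch (Python, standard `re`), not part of any library. It also drops the inner leading `\b`, which is redundant right after the outer one because both assert the same position.

```python
import re

text = "here we WE go again"

def two_letter_words_except(word, text):
    """Hypothetical helper: two-letter words other than `word` (case-sensitive)."""
    # re.escape guards against regex metacharacters in the excluded word;
    # the trailing \b inside the lookahead rejects only the complete word
    pattern = r"\b(?!" + re.escape(word) + r"\b)(\w{2})\b"
    return re.findall(pattern, text)

print(two_letter_words_except("go", text))  # ['we', 'WE']
print(two_letter_words_except("we", text))  # ['WE', 'go']
```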

Regex: (?i)\b(?!\bwe\b)(\w{2})\b
Explanation: whole words of exactly 2 characters, but NOT ‘we’ (char count = 2). Ignore case, so ‘WE’ is excluded too
Result: go
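
In Python’s `re` module (used here for illustration), the inline `(?i)` can also be written as the `re.IGNORECASE` flag; both give the same result. Python 3.11 and later only accept such a global inline flag at the very start of the pattern, which is where it sits here.

```python
import re

text = "here we WE go again"

# Inline (?i) at the start makes the whole pattern case-insensitive,
# so the lookahead now rejects "WE" as well as "we"
inline = re.findall(r"(?i)\b(?!\bwe\b)(\w{2})\b", text)

# The same thing with a flag argument instead of inline syntax
flagged = re.findall(r"\b(?!\bwe\b)(\w{2})\b", text, flags=re.IGNORECASE)

print(inline, flagged)  # ['go'] ['go']
```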

Regex: (?i)\b(?!\bwe\b)(?!\bgo\b)(\w{2})\b
Explanation: whole words of exactly 2 characters, but neither ‘we’ nor ‘go’ (char count = 2). Ignore case
Result: (nothing)
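
Stacked lookaheads must all pass, so each one adds another excluded word. In this Python sketch (standard `re`, illustrative only), the same exclusion is also written as a single lookahead with an alternation built from a list; the `excluded` list is just an example. Since the post’s input yields nothing, a second, made-up input shows the pattern still matches other two-letter words.

```python
import re

text = "here we WE go again"

# Two stacked lookaheads: neither "we" nor "go" may start at this position
stacked = re.findall(r"(?i)\b(?!\bwe\b)(?!\bgo\b)(\w{2})\b", text)

# One lookahead with an alternation, built from a list of excluded words
excluded = ["we", "go"]
alternation = r"(?i)\b(?!(?:" + "|".join(map(re.escape, excluded)) + r")\b)(\w{2})\b"
combined = re.findall(alternation, text)
print(stacked, combined)  # [] []

# On a different input, other two-letter words still get through
print(re.findall(alternation, "We go to it"))  # ['to', 'it']
```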

It still looks like gibberish to me!
