A fellow Regex Hero user emailed me this morning about an inconsistency between what they were seeing in Regex Hero vs what they were seeing with the RegularExpression DataAnnotation in ASP.NET MVC.
As a test, they were trying the following regular expression...
^A|B|C$
Against this string...
ABC
In Regex Hero you see 3 matches (as you should). But in ASP.NET MVC it doesn't match, and validation fails.
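To see where those 3 matches come from, here's a minimal sketch in plain JavaScript (not Regex Hero or MVC code) that lists every match the way a regex tester would. The variable names are just for illustration.

```javascript
// Global flag so matchAll walks the whole input, like a regex tester does.
const pattern = /^A|B|C$/g;

// Collect the text of every match found in "ABC".
const matches = [...'ABC'.matchAll(pattern)].map(m => m[0]);

console.log(matches); // [ 'A', 'B', 'C' ]
```

Each alternative matches on its own: ^A at the start, B in the middle, and C$ at the end.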
Why is this?
As it turns out, the RegularExpressionAttribute is hiding an implementation detail. In fact, I couldn't even find any mention of this on MSDN. It's most plainly visible if you open up the JavaScript file that comes with an MVC project template, MicrosoftMvcValidation.js. Inside you'll find this bit of code...
var $0=new RegExp(this.$0);
var $1=$0.exec(value);
return (!Sys.Mvc._ValidationUtil.$0($1) && $1[0].length===value.length);
The important piece is the final comparison, $1[0].length===value.length. It's checking to make sure the text value's length is equal to the match's length. In other words, the validation is designed to prevent any extraneous input beyond what a single regular expression match contains. And it does the same thing server side as well.
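Here's a rough, readable sketch of that check. The function name isValidLikeMvc is hypothetical, and I'm assuming the minified Sys.Mvc._ValidationUtil.$0 call is a null-or-empty test on the match result, so treat this as an approximation rather than the actual MVC source.

```javascript
// Hypothetical helper approximating the client-side check described above.
// Assumption: the minified utility call only tests whether exec() found a match.
function isValidLikeMvc(pattern, value) {
  const match = new RegExp(pattern).exec(value);
  // Valid only if there is a match and the first match is as long as the input.
  return match !== null && match[0].length === value.length;
}

console.log(isValidLikeMvc('^A|B|C$', 'ABC')); // false
console.log(isValidLikeMvc('^A|B|C$', 'A'));   // true
```

Against ABC the first match is just A, which is 1 character long while the input is 3, so validation fails.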
So for RegularExpressionAttribute testing purposes in Regex Hero you could just always use ^ and $ anchors for every regular expression. And then check to make sure that you're getting 1 match, and 1 match only.
I usually use grouping or non-capturing groups around choice operators. I believe Learning Perl suggests doing this to avoid such confusion.
I'm not sure Microsoft's implementation is *wrong* per se. It's honoring the start and end anchors with a higher precedence than the choice operator. I looked briefly for the specification of this behavior, but my power went out.
Suppose you want to find any single line which consists solely of A, B, or C. Using the pattern ^A|B|C$ and the input texts:
A
B
C
ABC
One would expect the last line to not match, but the choice operator has the lowest precedence, so the pattern reads as (^A)|(B)|(C$) and the last line provides 3 matches.
On the other hand, if you use a non-capturing group pattern, you have the expected results: ^(?:A|B|C)$ matches only the first three lines.
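A quick way to check this for yourself is a small JavaScript sketch like the one below. The variable names are made up for the example, and it uses plain RegExp.test rather than anything from MVC.

```javascript
const lines = ['A', 'B', 'C', 'ABC'];

// Ungrouped: the anchors attach only to the first and last alternatives.
const loose = /^A|B|C$/;
// Grouped: the anchors apply to the whole alternation.
const grouped = /^(?:A|B|C)$/;

// Keep the lines each pattern matches at least once.
const looseHits = lines.filter(line => loose.test(line));
const groupedHits = lines.filter(line => grouped.test(line));

console.log(looseHits);   // [ 'A', 'B', 'C', 'ABC' ]
console.log(groupedHits); // [ 'A', 'B', 'C' ]
```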
My advice is to always 'embrace your choices'. Cheesy? Yes. But, it works to avoid confusion.