When to use and when not to use Regular Expressions

Regular expressions are powerful.  There's no doubt about it.  The .NET and Perl-derived implementations in particular are rich and capable.

For the most part, regular expressions are there to save you from parsing text the hard way.  But if you're spending more time bending a regular expression to your will than it would take to do the same job more easily and efficiently with procedural code, then that defeats the purpose.

I've wanted to write this article for a while.  Then today I stumbled across a StackOverflow question that is a prime example: the procedural solution was actually quicker & easier to write, more understandable, and more efficient.  Your ability to identify these situations will improve naturally with experience.  But I thought I'd list a few good & bad scenarios for regular expressions...


Good
Data validation can be done easily and concisely with regular expressions in most cases.
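Here's a minimal sketch of what that looks like. It's in Python rather than .NET, but the idea carries over directly. The ZIP code rule and the `is_valid_zip` name are hypothetical examples, not taken from any particular library.

```python
import re

# Hypothetical rule for illustration: a US ZIP code is 5 digits,
# optionally followed by a hyphen and 4 more digits (ZIP+4).
# [0-9] is used instead of \d because in Python 3, \d also matches
# non-ASCII digits such as Arabic-Indic numerals.
ZIP_PATTERN = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")


def is_valid_zip(value: str) -> bool:
    # fullmatch anchors the pattern to the whole string, so "12345abc"
    # (or "12345" with a trailing newline) is rejected, not partially matched.
    return ZIP_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["12345", "12345-6789", "1234", "12345-67", "abcde"]:
        print(candidate, is_valid_zip(candidate))
```

One line of pattern replaces what would otherwise be a length check, a digit loop, and a special case for the hyphen.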


Good
Syntax highlighting can be done with regular expressions alone, or in some cases with a combination of procedural code and regex.
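As a rough sketch (Python, with a made-up token set for a tiny hypothetical language), a highlighter can pair one alternation of named groups with a plain loop. The regex finds each token, and the procedural code decides how to render it. `TOKEN_SPEC`, `tokenize`, `highlight`, and the ANSI color choices are all illustrative assumptions.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token rules for a tiny expression language.
# Order matters: at each position the first alternative that matches wins,
# so KEYWORD must come before NAME.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r'"(?:[^"\\]|\\.)*"'),
    ("KEYWORD", r"\b(?:if|else|return)\b"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=<>!]=?|[(){};]"),
    ("SPACE", r"\s+"),
    ("OTHER", r"."),  # anything else, so no character is silently skipped
]

# Inner groups are non-capturing, so match.lastgroup is always the token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

# Hypothetical ANSI color codes per token kind.
COLORS = {"NUMBER": "\033[36m", "STRING": "\033[32m", "KEYWORD": "\033[35m"}
RESET = "\033[0m"


def tokenize(source: str) -> Iterator[Token]:
    # The regex part: split the source into classified tokens.
    for match in MASTER.finditer(source):
        yield Token(match.lastgroup, match.group())


def highlight(source: str) -> str:
    # The procedural part: decide per token how to render it.
    parts = []
    for token in tokenize(source):
        color = COLORS.get(token.kind)
        parts.append(f"{color}{token.text}{RESET}" if color else token.text)
    return "".join(parts)


if __name__ == "__main__":
    print(highlight('if x1 = 42 return "hi"'))
```

Keeping the rendering logic out of the pattern is what makes this manageable: adding a new token kind means adding one line to the table, not rewriting a monster regex.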


Good
Performing search or search/replace operations on documents can be done with regex internally without much trouble.  There are cases where a traditional wildcard search isn't powerful enough and a regex is needed, for example to find a word that appears near certain neighboring words or punctuation.
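A minimal sketch of both operations, assuming Python's `re` module. `near` and `collapse_repeated_words` are hypothetical helpers written for illustration. The proximity search allows up to `max_gap` words between two search terms, which a simple `*` wildcard can't express.

```python
import re


def near(text: str, first: str, second: str, max_gap: int = 3) -> list[str]:
    """Find spans where `first` is followed by `second` with at most
    `max_gap` words in between, ignoring case.

    Assumes both search terms start and end with word characters,
    since the pattern anchors them with word boundaries.
    """
    # re.escape makes any metacharacters in the search terms literal.
    # {{0,{max_gap}}} in the f-string becomes e.g. {0,3}; the trailing ?
    # makes it lazy so the shortest span is preferred.
    pattern = re.compile(
        rf"\b{re.escape(first)}\W+(?:\w+\W+){{0,{max_gap}}}?{re.escape(second)}\b",
        re.IGNORECASE,
    )
    return [m.group() for m in pattern.finditer(text)]


def collapse_repeated_words(text: str) -> str:
    """Search/replace: turn accidental doubles like "is is" into "is"."""
    # (\w+) captures a word; (\s+\1\b)+ matches one or more repeats of it.
    return re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)


if __name__ == "__main__":
    sample = "Regular expressions are fast, but regular code is often faster."
    print(near(sample, "regular", "faster"))
    print(collapse_repeated_words("This is is a test test."))
```

Because the search terms go through `re.escape`, the only regex syntax here is what the developer wrote, which is the "internal" use described above.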


Bad
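Public websites should not allow users to enter regular expressions for searching.  Giving the full power of regex to the general public for a website's search engine could have a devastating effect: with a backtracking engine such as the ones in .NET and Perl, a carefully crafted pattern can take exponential time on a modest input and tie up the server's CPU.  This is known as a regular expression denial of service (ReDoS) attack, and it should be avoided at all costs.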
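Here's a sketch of the problem and one mitigation, assuming Python's backtracking `re` engine (the .NET and Perl engines backtrack in a similar way). The classic pattern `(a+)+$` takes roughly twice as long for each extra `a` when the input ends in a character that makes the match fail. If users only need to search for text, escaping their input means they never run a pattern at all. `safe_search`, `time_evil`, and the `MAX_QUERY_LENGTH` limit are hypothetical names chosen for illustration.

```python
import re
import time

# Hypothetical limit for illustration.
MAX_QUERY_LENGTH = 100

# A classic ReDoS pattern: nested quantifiers give the engine exponentially
# many ways to split a run of "a"s before it can conclude there's no match.
EVIL = re.compile(r"(a+)+$")


def time_evil(n: int) -> float:
    """Seconds spent failing to match n 'a's followed by '!'."""
    start = time.perf_counter()
    EVIL.match("a" * n + "!")
    return time.perf_counter() - start


def safe_search(query: str, text: str) -> list[int]:
    """Return start offsets of `query` in `text`, treating it as plain text."""
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("query too long")
    # re.escape turns every metacharacter into a literal, so a user who types
    # "(a+)+$" searches for those six characters instead of running them.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]


if __name__ == "__main__":
    # Keep n small: each extra character roughly doubles the time.
    for n in range(12, 19, 2):
        print(n, f"{time_evil(n):.4f}s")
    print(safe_search("(a+)+$", "x (a+)+$ y"))
```

If you really must accept user patterns, a match timeout or a non-backtracking engine is the usual fallback, but not exposing regex in the first place is simpler.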


Bad
HTML/XML parsing should not be done with regular expressions.  First of all, regular expressions are designed to match regular languages, the simplest class in the Chomsky hierarchy.  HTML and XML allow arbitrarily deep nesting, and pairing every opening tag with its closing tag requires a kind of counting that a regular language can't express.  Now, with balancing group definitions in the .NET flavor of regular expressions, you can venture into slightly more complex territory and do a few things with XML or HTML in controlled situations.  However, there's not much point.  There are parsers available for both XML and HTML which will do the job more easily, more efficiently, and more reliably.  In .NET, XML can be handled the old XmlDocument way or even more easily with LINQ to XML.  For HTML there's the HTML Agility Pack.
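The .NET options above aren't shown here. As an analogous sketch, Python's standard-library `html.parser` already handles the cases that tend to trip up a hand-written regex: uppercase tags, single-quoted attributes, a `>` inside an attribute value, entity references, and markup inside comments. `LinkCollector` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # The parser lowercases tag and attribute names, strips quotes, and
        # unescapes entities in values, so <A HREF='...'> needs no special case.
        # Comments go to handle_comment (a no-op here), so links inside
        # <!-- ... --> are ignored automatically.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p>See <A HREF="/a">one</A>, <a title="x > y" href=\'/b\'>two</a><!-- <a href="/c"> --></p>'
    print(extract_links(page))
```

Each of those edge cases would need its own special handling in a regex; the parser gets them for free.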


Conclusion
Regular expressions have their uses.  I still contend that in many cases they can save the programmer a lot of time and effort. Of course, given infinite time & resources, one could almost always build a procedural solution that's more efficient than an equivalent regular expression.

Your decision to abandon regex should be based on 3 things:
1.) Is the regular expression so slow in your scenario that it has become a bottleneck?
2.) Is your procedural solution actually quicker & easier to write than the regular expression?
3.) Is there a specialized parser that will do the job better?
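Questions 1 and 2 are easiest to answer with a measurement rather than a hunch. Here's a minimal sketch in Python using `timeit`. The hex-digit check is a hypothetical stand-in for whatever your regex actually does, and `compare` is an illustrative helper. It confirms the two versions agree before timing them, and it deliberately makes no claim about which one wins, since that depends on the pattern, the input, and the runtime.

```python
import re
import string
import timeit

HEX_RE = re.compile(r"[0-9A-Fa-f]+")
HEX_CHARS = frozenset(string.hexdigits)  # '0123456789abcdefABCDEF'


def is_hex_regex(s: str) -> bool:
    return HEX_RE.fullmatch(s) is not None


def is_hex_procedural(s: str) -> bool:
    # Reject the empty string to match the regex's "+" quantifier.
    return bool(s) and all(c in HEX_CHARS for c in s)


def compare(sample: str, number: int = 10_000) -> dict[str, float]:
    # Both versions must agree before their speed is worth comparing.
    assert is_hex_regex(sample) == is_hex_procedural(sample)
    return {
        "regex": timeit.timeit(lambda: is_hex_regex(sample), number=number),
        "procedural": timeit.timeit(lambda: is_hex_procedural(sample), number=number),
    }


if __name__ == "__main__":
    print(compare("deadBEEF" * 100))
```

If the regex version isn't a measurable bottleneck and isn't harder to read, there's no reason to abandon it.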


There's a general piece of advice that applies to many things, and this is certainly one of them: when there's a better tool available, use it.
