Skip to main content

Code Hinting for Regular Expressions : The Theory

I haven't heard much feedback on my code hinting post in January, perhaps because the first demo is not too impressive. And in fact I haven't been able to work on it as much as I wanted. But I'm still thinking about it and planning the user interface. There's still a lot to do to make this as well polished as Intellisense is in Visual Studio or Expression Blend. In fact I'm drawing more inspiration from the Expression Blend interface, but that's another story.

When I first started on this I knew that I needed to parse the regular expression on the fly and display the hints based on the text next to the caret. I begun by researching parsing theory. Now, I know what you're thinking. Why bother? Why not just use regular expressions? Well, I knew that regular expressions are ill-suited to parse HTML, and I thought that they may not be the best choice for parsing themselves, either. The reason I thought this, it turns out, was completely valid. Regular expressions are designed to parse regular languages. The regular expression syntax, however, is a context-sensitive language as listed in Chomsky's hierarchy. Beyond that I was somewhat lost when reading the complex labyrinth of articles on Wikipedia related to parsing theory, seemingly requiring a PhD to comprehend. But I was now equipped with some crucial knowledge with which I could write a parsing algorithm.

As I've often done, I wrote this algorithm completely from scratch so that I can understand it completely. After I wrote it and verified that it worked I determined that what I came up with is an embedded pushdown automaton capable of parsing mildly context sensitive languages. Perfect.

Essentially what this comes down to is that my algorithm is sophisticated enough to show the proper hints in the proper contexts. For example if you're starting a regular expression with a beginning parenthesis "(" then I should show a list of every possible construct that begins with a parenthesis: groups, lookarounds, comments, etc. But a beginning parenthesis "(" does not have the same meaning inside a bracket expression "[]". That's because a bracket expression follows its own rules. Secondly my algorithm is forward-looking and holds a memory as it parses from left-to-right, making it obscenely efficient for this task. Even with a huge 500+ character regular expression it'll know exactly what context the regular expression is in wherever the caret is at the time. And it'll do this in under 10 ms, faster than the human eye could even see it. This is almost too fast for your own good, but that's OK. I'm encouraged by how well it's working more than ever.

Stay tuned for updates.

UPDATE: Code hinting has been released!

Comments

Popular posts from this blog

Regex Hero for Windows 10 is Underway

Awhile back I began working on an HTML5 / JavaScript version of Regex Hero . However, it was a huge undertaking essentially requiring a complete rewrite of the entire application. I have not had enough time to dedicate to this lately. So I've begun again, this time rewriting Regex Hero to work in WPF. It'll be usable in Windows 10 and downloadable from the Microsoft Store. This is a much easier task that also has the advantage of running the .NET regex library from the application itself. This will allow for the same speedy experience of testing your regular expressions and getting instant feedback that Regex Hero users have always enjoyed. I expect the first release to be ready in Q4 of 2019.

Optimizing Your Regular Expressions

Regular expressions will backtrack.  That's an unfortunate thing about them because backtracking can be slow.    And in certain (rare) cases the performance can become so awful that executing the regular expression against a relatively short string could take over a minute.  There's a good article about catastrophic backtracking over at regular-expressions.info . And today I created a video about all of this called  Regex Lesson 5: Optimization .  In the video I start with a very poorly written regular expression and make several improvements to it, using the benchmarking feature along the way.  By the end of the video I make the regular expression over 3 million times faster. In addition, today's update to Regex Hero provides a little message in the event that you encounter a regular expression that takes over 10 seconds to evaluate... And then last of all, I changed the benchmarking feature a bit.  In the past it would simply test your regular expression against

Installer for Desktop version of Regex Hero

As Firefox just dropped support for Silverlight I really needed a solution for Regex Hero. So I created an installer for it. It's still Silverlight, but by using the installer you can install it directly to your computer and never need to open it in a browser. If you visit regexhero.net/tester and don't have a browser that supports Silverlight (IE is the only one left), then you'll see download links for 64-bit and 32-bit versions of the installer. When you install it you'll see a link over on the right hand side that says "Activate Regex Hero". This process connects your desktop version of Regex Hero with your online account. So if you're a licensed user this is how you'll gain access to all of those licensed features. Note: Currently there will be a security warning when you try to run either one of these installers. I'm in the process of obtaining a code signing certificate and will update them as soon as possible. You may be wondering a