Skip to main content

Code Hinting for Regular Expressions : The Theory

I haven't heard much feedback on my code hinting post in January, perhaps because the first demo is not too impressive. And in fact I haven't been able to work on it as much as I wanted. But I'm still thinking about it and planning the user interface. There's still a lot to do to make this as well polished as Intellisense is in Visual Studio or Expression Blend. In fact I'm drawing more inspiration from the Expression Blend interface, but that's another story.

When I first started on this I knew that I needed to parse the regular expression on the fly and display the hints based on the text next to the caret. I begun by researching parsing theory. Now, I know what you're thinking. Why bother? Why not just use regular expressions? Well, I knew that regular expressions are ill-suited to parse HTML, and I thought that they may not be the best choice for parsing themselves, either. The reason I thought this, it turns out, was completely valid. Regular expressions are designed to parse regular languages. The regular expression syntax, however, is a context-sensitive language as listed in Chomsky's hierarchy. Beyond that I was somewhat lost when reading the complex labyrinth of articles on Wikipedia related to parsing theory, seemingly requiring a PhD to comprehend. But I was now equipped with some crucial knowledge with which I could write a parsing algorithm.

As I've often done, I wrote this algorithm completely from scratch so that I can understand it completely. After I wrote it and verified that it worked I determined that what I came up with is an embedded pushdown automaton capable of parsing mildly context sensitive languages. Perfect.

Essentially what this comes down to is that my algorithm is sophisticated enough to show the proper hints in the proper contexts. For example if you're starting a regular expression with a beginning parenthesis "(" then I should show a list of every possible construct that begins with a parenthesis: groups, lookarounds, comments, etc. But a beginning parenthesis "(" does not have the same meaning inside a bracket expression "[]". That's because a bracket expression follows its own rules. Secondly my algorithm is forward-looking and holds a memory as it parses from left-to-right, making it obscenely efficient for this task. Even with a huge 500+ character regular expression it'll know exactly what context the regular expression is in wherever the caret is at the time. And it'll do this in under 10 ms, faster than the human eye could even see it. This is almost too fast for your own good, but that's OK. I'm encouraged by how well it's working more than ever.

Stay tuned for updates.

UPDATE: Code hinting has been released!

Comments

Popular posts from this blog

Regex Hero for Windows 10 is Underway

Awhile back I began working on an HTML5 / JavaScript version of Regex Hero . However, it was a huge undertaking essentially requiring a complete rewrite of the entire application. I have not had enough time to dedicate to this lately. So I've begun again, this time rewriting Regex Hero to work in WPF. It'll be usable in Windows 10 and downloadable from the Microsoft Store. This is a much easier task that also has the advantage of running the .NET regex library from the application itself. This will allow for the same speedy experience of testing your regular expressions and getting instant feedback that Regex Hero users have always enjoyed. I expect the first release to be ready in Q4 of 2019.

Optimizing Your Regular Expressions

Regular expressions will backtrack.  That's an unfortunate thing about them because backtracking can be slow.    And in certain (rare) cases the performance can become so awful that executing the regular expression against a relatively short string could take over a minute.  There's a good article about catastrophic backtracking over at regular-expressions.info . And today I created a video about all of this called  Regex Lesson 5: Optimization .  In the video I start with a very poorly written regular expression and make several improvements to it, using the benchmarking feature along the way.  By the end of the video I make the regular expression over 3 million times faster. In addition, today's update to Regex Hero provides a little message in the event that you encounter a regular expression that takes over 10 seconds to evaluate... And then last of all, I changed the benchmarking feature a bit.  In the past it would simply test your regular expression against

Silverlight 4 Coming in April, or Maybe Sooner

The exact release date has not been announced. But Visual Studio 2010 RTM is coming out in April and I think it's safe to assume that Silverlight 4 will be released no later than that. Each release of Silverlight has brought massive improvements over the previous version. And once again, Silverlight 4 does not disappoint. There is a long list of improvements but the ones that I think that will affect Regex Hero are as follows: RichTextBox My plan is to use this in place of all 4 major textboxes in Regex Hero. The new RichTextBox has built-in multiple undos & redos, so I can ditch my home-brewed code. It should be nice to use for syntax highlighting for the regular expressions I intend to create. It also has a built-in API to determine the pixel position of the text. I should be able to use this API and build a new highlighting scheme based off of it. This should do a couple things. First, I should be able to finally fix the problem I had with the ScrollViewer and