Skip to main content

Code Hinting for Regular Expressions : The Theory

I haven't heard much feedback on my code hinting post in January, perhaps because the first demo is not too impressive. And in fact I haven't been able to work on it as much as I wanted. But I'm still thinking about it and planning the user interface. There's still a lot to do to make this as well polished as Intellisense is in Visual Studio or Expression Blend. In fact I'm drawing more inspiration from the Expression Blend interface, but that's another story.

When I first started on this I knew that I needed to parse the regular expression on the fly and display the hints based on the text next to the caret. I begun by researching parsing theory. Now, I know what you're thinking. Why bother? Why not just use regular expressions? Well, I knew that regular expressions are ill-suited to parse HTML, and I thought that they may not be the best choice for parsing themselves, either. The reason I thought this, it turns out, was completely valid. Regular expressions are designed to parse regular languages. The regular expression syntax, however, is a context-sensitive language as listed in Chomsky's hierarchy. Beyond that I was somewhat lost when reading the complex labyrinth of articles on Wikipedia related to parsing theory, seemingly requiring a PhD to comprehend. But I was now equipped with some crucial knowledge with which I could write a parsing algorithm.

As I've often done, I wrote this algorithm completely from scratch so that I can understand it completely. After I wrote it and verified that it worked I determined that what I came up with is an embedded pushdown automaton capable of parsing mildly context sensitive languages. Perfect.

Essentially what this comes down to is that my algorithm is sophisticated enough to show the proper hints in the proper contexts. For example if you're starting a regular expression with a beginning parenthesis "(" then I should show a list of every possible construct that begins with a parenthesis: groups, lookarounds, comments, etc. But a beginning parenthesis "(" does not have the same meaning inside a bracket expression "[]". That's because a bracket expression follows its own rules. Secondly my algorithm is forward-looking and holds a memory as it parses from left-to-right, making it obscenely efficient for this task. Even with a huge 500+ character regular expression it'll know exactly what context the regular expression is in wherever the caret is at the time. And it'll do this in under 10 ms, faster than the human eye could even see it. This is almost too fast for your own good, but that's OK. I'm encouraged by how well it's working more than ever.

Stay tuned for updates.

UPDATE: Code hinting has been released!

Comments

Popular posts from this blog

Regex Hero for Windows 10 is Underway

Awhile back I began working on an HTML5 / JavaScript version of Regex Hero . However, it was a huge undertaking essentially requiring a complete rewrite of the entire application. I have not had enough time to dedicate to this lately. So I've begun again, this time rewriting Regex Hero to work in WPF. It'll be usable in Windows 10 and downloadable from the Microsoft Store. This is a much easier task that also has the advantage of running the .NET regex library from the application itself. This will allow for the same speedy experience of testing your regular expressions and getting instant feedback that Regex Hero users have always enjoyed. I expect the first release to be ready in Q4 of 2019.

Silverlight 4 Coming in April, or Maybe Sooner

The exact release date has not been announced. But Visual Studio 2010 RTM is coming out in April and I think it's safe to assume that Silverlight 4 will be released no later than that. Each release of Silverlight has brought massive improvements over the previous version. And once again, Silverlight 4 does not disappoint. There is a long list of improvements but the ones that I think that will affect Regex Hero are as follows: RichTextBox My plan is to use this in place of all 4 major textboxes in Regex Hero. The new RichTextBox has built-in multiple undos & redos, so I can ditch my home-brewed code. It should be nice to use for syntax highlighting for the regular expressions I intend to create. It also has a built-in API to determine the pixel position of the text. I should be able to use this API and build a new highlighting scheme based off of it. This should do a couple things. First, I should be able to finally fix the problem I had with the ScrollViewer and

Installer for Desktop version of Regex Hero

As Firefox just dropped support for Silverlight I really needed a solution for Regex Hero. So I created an installer for it. It's still Silverlight, but by using the installer you can install it directly to your computer and never need to open it in a browser. If you visit regexhero.net/tester and don't have a browser that supports Silverlight (IE is the only one left), then you'll see download links for 64-bit and 32-bit versions of the installer. When you install it you'll see a link over on the right hand side that says "Activate Regex Hero". This process connects your desktop version of Regex Hero with your online account. So if you're a licensed user this is how you'll gain access to all of those licensed features. Note: Currently there will be a security warning when you try to run either one of these installers. I'm in the process of obtaining a code signing certificate and will update them as soon as possible. You may be wondering a