Code Hinting for Regular Expressions : The Theory

I haven't heard much feedback on my code hinting post in January, perhaps because the first demo is not too impressive. And in fact I haven't been able to work on it as much as I wanted. But I'm still thinking about it and planning the user interface. There's still a lot to do to make this as polished as IntelliSense is in Visual Studio or Expression Blend. In fact I'm drawing more inspiration from the Expression Blend interface, but that's another story.

When I first started on this I knew that I needed to parse the regular expression on the fly and display the hints based on the text next to the caret. I began by researching parsing theory. Now, I know what you're thinking. Why bother? Why not just use regular expressions? Well, I knew that regular expressions are ill-suited to parsing HTML, and I thought that they might not be the best choice for parsing themselves, either. My reasoning, it turns out, was completely valid. Regular expressions are designed to parse regular languages. The regular expression syntax, however, is not regular. Groups can nest to any depth, and matching each closing parenthesis to its opener takes more memory than a regular language allows. That puts the syntax higher up in Chomsky's hierarchy, among the context-sensitive languages. Beyond that I was somewhat lost in the labyrinth of Wikipedia articles on parsing theory, which seemingly require a PhD to comprehend. But I was now equipped with the crucial knowledge I needed to write a parsing algorithm.

As I've often done, I wrote this algorithm from scratch so that I could understand it completely. After I wrote it and verified that it worked, I determined that what I had come up with is an embedded pushdown automaton capable of parsing mildly context-sensitive languages. Perfect.
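To make the "pushdown" part concrete, here is a minimal sketch in Python. It is not the algorithm from this post, and it is a plain stack rather than an embedded pushdown automaton. The function name groups_balanced is made up for illustration, and it assumes a simplified syntax in which only backslash escapes are special (bracket expressions are ignored). It only shows why a stack is the natural memory for nested groups.

```python
def groups_balanced(pattern: str) -> bool:
    """Return True if every "(" in the pattern has a matching ")".

    Simplifying assumption: a backslash escapes the next character,
    and bracket expressions are not treated specially.
    """
    stack = []  # one entry per group that is still open
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character; it opens or closes nothing
            continue
        if ch == "(":
            stack.append(i)  # remember where this group opened
        elif ch == ")":
            if not stack:
                return False  # a close with nothing open
            stack.pop()
        i += 1
    return not stack  # anything left on the stack is an unclosed group
```

The stack can grow as deep as the nesting goes, which is exactly the kind of unbounded memory a regular language does not have.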

Essentially what this comes down to is that my algorithm is sophisticated enough to show the proper hints in the proper contexts. For example, if you're starting a regular expression with an opening parenthesis "(" then I should show a list of every possible construct that begins with a parenthesis: groups, lookarounds, comments, etc. But an opening parenthesis "(" does not have the same meaning inside a bracket expression "[]". That's because a bracket expression follows its own rules.

Secondly, my algorithm makes a single pass from left to right and remembers what it has seen along the way, making it obscenely efficient for this task. Even with a huge 500+ character regular expression it'll know exactly what context the regular expression is in wherever the caret is at the time. And it'll do this in under 10 ms, faster than the human eye could even see it. This is almost too fast for your own good, but that's OK. I'm more encouraged than ever by how well it's working.
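Here is a rough sketch of that idea in Python, not the actual Regex Hero code. The names context_at_caret, hints_for and GROUP_HINTS are hypothetical, the hint list is only a sample, and the syntax is simplified: a backslash escapes the next character, and a bracket expression ends at the first unescaped "]".

```python
from typing import List

# Hypothetical sample of constructs that begin with "(" (not the tool's real list).
GROUP_HINTS = ["(...)", "(?:...)", "(?=...)", "(?!...)", "(?<=...)", "(?<!...)", "(?#...)"]


def context_at_caret(pattern: str, caret: int) -> List[str]:
    """Scan left to right up to the caret and return the open constructs,
    outermost first, e.g. ["group", "bracket"]."""
    stack: List[str] = []
    i = 0
    while i < caret and i < len(pattern):
        ch = pattern[i]
        in_bracket = bool(stack) and stack[-1] == "bracket"
        if ch == "\\":
            i += 2  # skip the escaped character in any context
            continue
        if in_bracket:
            if ch == "]":
                stack.pop()  # only "]" is special inside a bracket expression
        elif ch == "[":
            stack.append("bracket")
        elif ch == "(":
            stack.append("group")
        elif ch == ")" and stack:
            stack.pop()
        i += 1
    return stack


def hints_for(pattern: str, caret: int) -> List[str]:
    """Offer the group constructs only if the character just before the
    caret actually opened a group."""
    if caret <= 0:
        return []
    before = context_at_caret(pattern, caret - 1)
    after = context_at_caret(pattern, caret)
    opened_group = len(after) == len(before) + 1 and after[-1] == "group"
    return GROUP_HINTS if opened_group else []
```

With this sketch, hints_for("(", 1) returns the group constructs, while hints_for("[(", 2) returns an empty list because the parenthesis sits inside a bracket expression. For simplicity it scans the pattern twice per lookup; a single pass that records the last transition would do the same job.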

Stay tuned for updates.

UPDATE: Code hinting has been released!
