Regular expressions are flexible enough to handle a surprising range of text-matching problems. This post looks at how they can be used to check whether a character has been escaped or not.
The other day, I was writing a LaTeX document using nano. I needed to put a percentage in, but, in LaTeX, the percent sign is used to signify the start of a comment for that line (think // in C/C++). The syntax highlighter was based on regular expressions, but just highlighted everything after the percent sign, even when I had escaped it. So began my quest for escaping characters!
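A note on the syntax in this post: all of the regular expressions are shown with double quotes around them. The outermost quotes are NOT part of the expression. The expressions use the POSIX-style syntax found in nano's syntax files, where a backslash inside square brackets is an ordinary character; in flavours such as PCRE or Python, [^\] has to be written [^\\].
Initially, I came up with a quick but fairly bad attempt at handling the escaped percent sign. The regex that was already there was "%.*". Thinking very simply, I changed it so that a comment starts at a % preceded by anything other than a backslash, which resulted in this: "[^\]%.*".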
Although this was simple, it had its drawbacks. A comment at the very beginning of a line was no longer highlighted, because there is no character before the % for [^\] to match. Also, if the backslash itself had been escaped (\\%), the % would still not be treated as the start of a comment. Although this isn't a problem in LaTeX, I wanted something that I could apply to any language.
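To make the drawbacks concrete, here is a minimal Python sketch of the first attempt. It is only an illustration: nano does not use Python, the helper name comment_part is made up for this example, and the bracket is written [^\\] because Python's re module requires the backslash to be escaped there.

```python
import re

# First attempt, translated to Python syntax:
# [^\\] means "any single character except a backslash".
first_attempt = re.compile(r"[^\\]%.*")

def comment_part(line):
    """Return the text the pattern would highlight, or None if nothing matches."""
    match = first_attempt.search(line)
    return match.group(0) if match else None

print(comment_part(r"50\% of the time"))      # None: the escaped percent is left alone
print(comment_part("text % a comment"))       # ' % a comment' (the preceding space is included)
print(comment_part("% whole-line comment"))   # None: nothing precedes the percent sign
print(comment_part(r"a \\% should comment"))  # None: the escaped backslash is not recognised
```

Note that the match also swallows the character before the percent sign, which is a side effect of using a consuming character class.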
The first change was to make it work when the % is the first character on the line. This was fairly simple: regular expressions have an anchor, ^, that matches the start of a line. Since I wanted either this or a character other than a backslash, I used the pipe (alternation) operator and came up with this: "(^|[^\])%.*". This worked perfectly for LaTeX, as you can't escape backslashes.
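The same idea can be tried out in Python. This is a rough sketch rather than what nano does internally; find_comment is a hypothetical helper, and the bracket is again written [^\\] to suit Python's re module. The group around the prefix is used to cut the preceding character off the reported comment.

```python
import re

# Start of line OR a non-backslash character, then the comment itself.
second_attempt = re.compile(r"(^|[^\\])%.*")

def find_comment(line):
    """Return the comment text starting at the percent sign, or None."""
    match = second_attempt.search(line)
    if match is None:
        return None
    # Group 1 is the prefix: empty at the line start, one character otherwise.
    return line[match.end(1):match.end()]

print(find_comment("% whole-line comment"))  # % whole-line comment
print(find_comment("text % trailing"))       # % trailing
print(find_comment(r"50\% done"))            # None
print(find_comment(r"50\% done % note"))     # % note
```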
A cautionary note: this is where the explanation gets a little complicated. If you just want the expression, scroll down.
Since I was in the mood for adapting this regex further, I decided to change it so it could handle escaped backslashes. I also changed the character being matched to a double quote mark ("), since that is more likely to need escaping.
A character is escaped if it is preceded by an odd number of backslashes. As far as I know, regular expressions have no way to say "an odd number of" directly. So, I had to be creative.
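Before turning the rule into a regex, it can help to see it written out as ordinary code. The following Python sketch counts the backslashes immediately before a position and checks whether the count is odd. The function name is_escaped is invented for this illustration.

```python
def is_escaped(text, index):
    """Return True if text[index] is preceded by an odd number of backslashes."""
    backslashes = 0
    position = index - 1
    # Walk left over the unbroken run of backslashes before the character.
    while position >= 0 and text[position] == "\\":
        backslashes += 1
        position -= 1
    return backslashes % 2 == 1

print(is_escaped('a"', 1))        # False: no backslashes
print(is_escaped('a\\"', 2))      # True: one backslash
print(is_escaped('a\\\\"', 3))    # False: two backslashes (an escaped backslash)
print(is_escaped('\\\\\\"', 3))   # True: three backslashes
```

This is the behaviour the regular expression has to reproduce.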
An odd number is an even number plus one, and an even number is always twice some integer. "Any number of times" is written with *, so an even number of backslashes is a pair of backslashes repeated any number of times (including zero). However, a literal backslash in a regular expression must itself be escaped (confusing, hey?). This means that a pair of literal backslashes is written \\\\, and an even number of backslashes is (\\\\)*. An odd number is therefore (\\\\)*\\. With the quote at the end, it becomes: "(\\\\)*\\"".
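A quick way to check the odd-number part is to test it against runs of backslashes of different lengths. The sketch below uses Python's re.fullmatch, which only succeeds if the whole string matches. The variable names are chosen for this example.

```python
import re

# (\\\\)* is any number of backslash pairs; the final \\ is one more backslash.
odd_backslashes = re.compile(r"(\\\\)*\\")

for count in range(6):
    run = "\\" * count  # a string of `count` literal backslashes
    print(count, bool(odd_backslashes.fullmatch(run)))
# 0 False
# 1 True
# 2 False
# 3 True
# 4 False
# 5 True
```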
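However, this still does not quite work. If the quote is preceded by an even number of backslashes, the match simply starts one backslash later, so everything but the first backslash is matched. This is not what we want. The run of backslashes has to be preceded by something other than a backslash (or by the start of the line). We can reuse part of an earlier expression, "(^|[^\])", which makes the regex "(^|[^\])(\\\\)*\\"". Because it requires an odd number of backslashes, this is the final expression to match an escaped quote mark. To match an unescaped one, require an even number instead by dropping the trailing \\: "(^|[^\])(\\\\)*"".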
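Trying the pattern in Python shows which quotes it picks out. With the trailing \\ it requires an odd number of backslashes, so it finds escaped quotes; leaving the trailing \\ off requires an even number and finds unescaped ones. This is a sketch under the assumption that Python's re module is an acceptable stand-in for nano's engine (the bracket is written [^\\] for that reason), and both helper names are hypothetical.

```python
import re

# Odd number of backslashes before the quote: the quote is escaped.
escaped_quote = re.compile(r'(^|[^\\])(\\\\)*\\"')
# Even number (possibly zero) of backslashes before the quote: unescaped.
unescaped_quote = re.compile(r'(^|[^\\])(\\\\)*"')

def has_escaped_quote(line):
    return escaped_quote.search(line) is not None

def has_unescaped_quote(line):
    return unescaped_quote.search(line) is not None

samples = ['say "hi', r'say \"hi', r'say \\"hi', r'say \\\"hi']
for sample in samples:
    print(sample, has_escaped_quote(sample), has_unescaped_quote(sample))
# say "hi False True
# say \"hi True False
# say \\"hi False True
# say \\\"hi True False
```

The third sample is the case the prefix was added for: without "(^|[^\\])" in front, the odd-number pattern would match starting from the second backslash.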
In summary, regular expressions can handle even fiddly jobs like this one. An escaped character can be found with (^|[^\])(\\\\)*\\ followed by the character, and an unescaped one with (^|[^\])(\\\\)* followed by the character.