A few weeks ago I came across an article describing “A regular expression to check for prime numbers“. I was intrigued because processes defined by formal regular languages have an inherently linear runtime while the process described is non-linear. This contradicts the definition of regular languages, specifically the Pumping Lemma.
So, how is this even possible? Well, the regular expression discussed in this article and the regex framework it’s used with makes clever use of backtracking for realizing this ‘magic’.
Quite simply, the languages used by common regular expression frameworks aren’t exactly regular but rather context-free through features such as backtracking and backreferencing. So, however nifty and in some cases useful such seemingly magical usages of regular expressions are you might still want to consider using another, more appropriate tool for solving problems that show non-linear behaviour – the most notorious one probably being parsing HTML.