Regular Expression Non-Capturing Groups (or, what’s "(?:" mean?)

Adam Compton October 10, 2012
Source

(originally from https://web.archive.org/web/20150422001421/http://ajcsystems.com/blog/blog/2012/10/10/regular-expression-non-capturing-groups-or-whats-mean/)

When I was working recently on changing poodledo‘s parser from using a lexer to using regular expressions, I found myself implementing some crazy logic to try and ignore the results from matched groups whose matches I didn’t actually want (since I was using parentheses for their “pick one of these two alternatives” function instead of their “pull out this specific chunk as a match” function).

For example, take the string “task title @context name #due date”. Starting with the at-sign, I want to match all words until some other symbol is reached; or, to avoid confusing the parser, match everything following the at-sign and between two square brackets. Here’s the regex I came up with:

Using this regex, though, gave me a lot of extraneous matching groups:

How do I determine programmatically which of these groups is the one I want?

Digging around in the Python regex documentation, I found a section about non-capturing groups, which seemed like it would save me a significant amount of sanity. Apparently, Perl-5-compatible regular expression parsers often implement a regex extension syntax by following an open parenthesis with a question mark, and then some other symbols that indicate which extension is in use. In this specific case, “(?:” indicates a group whose matches shouldn’t be captured in the output list. Eureka!

Now the only logic I need is “Ignore matching groups which are empty”, my sanity is preserved, and all is well with the world.

Discussion in the ATmosphere

Loading comments...