Regular Expressions

Regular expressions are a powerful tool for describing patterns in text. Several OBITools commands use them, including obigrep, obiannotate, and obiscript.

Single characters

Pattern         Description
.               any character, possibly including newline (flag s=true)
[xyz]           character class
[^xyz]          negated character class
[[:alpha:]]     ASCII character class
[[:^alpha:]]    negated ASCII character class
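
The single-character patterns can be tried out with a short program. This is a minimal sketch that assumes the patterns follow Go's `regexp` (RE2) syntax, which the table above matches. The helper `matches` and the sample strings are illustrative, and the `^` and `$` anchors are added only to force a whole-string match.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches somewhere in s.
// (Hypothetical helper, used only for this illustration.)
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Println(matches(`^[ACGT]$`, "G"))       // true: G is in the class
	fmt.Println(matches(`^[^ACGT]$`, "N"))      // true: N is outside the class
	fmt.Println(matches(`^[[:alpha:]]$`, "7"))  // false: 7 is not an ASCII letter
	fmt.Println(matches(`^[[:^alpha:]]$`, "7")) // true: negated ASCII class
	fmt.Println(matches(`^a.b$`, "a\nb"))       // false: without the s flag, . skips newline
	fmt.Println(matches(`(?s)^a.b$`, "a\nb"))   // true: with the s flag, . matches newline
}
```

The last two lines show why the table says "possibly including newline": the dot matches a newline only when the `s` flag is set.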

Composites

Pattern    Description
xy         x followed by y
x|y        x or y (prefer x)
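
The "prefer x" rule for alternation is easiest to see when one alternative is a prefix of the other. The following sketch assumes Go's `regexp` package as the engine. The helper `first` and the DNA-like strings are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// first returns the leftmost match of pattern in s, or "" if there is none.
// (Hypothetical helper, used only for this illustration.)
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	// Concatenation: TA followed by one character from the class [AG].
	fmt.Println(first(`TA[AG]`, "CCTAGCC")) // TAG

	// Alternation: when both sides can match at the same position,
	// the left alternative wins, even if the right one is longer.
	fmt.Println(first(`ACG|ACGT`, "ACGTT")) // ACG
	fmt.Println(first(`ACGT|ACG`, "ACGTT")) // ACGT
}
```

When a longer alternative should win, it has to be listed first.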

Repetitions

Pattern    Description
x*         zero or more x, prefer more
x+         one or more x, prefer more
x?         zero or one x, prefer one
x{n,m}     n or n+1 or … or m x, prefer more
x{n,}      n or more x, prefer more
x{n}       exactly n x
x*?        zero or more x, prefer fewer
x+?        one or more x, prefer fewer
x??        zero or one x, prefer zero
x{n,m}?    n or n+1 or … or m x, prefer fewer
x{n,}?     n or more x, prefer fewer
x{n}?      exactly n x
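
"Prefer more" and "prefer fewer" describe greedy and lazy matching. The sketch below contrasts the two on a short homopolymer run, assuming Go's `regexp` package as the engine. The helper `first` and the input string are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// first returns the leftmost match of pattern in s, or "" if there is none.
// (Hypothetical helper, used only for this illustration.)
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	s := "AAAAT"

	// Greedy forms take as many repetitions as the rest of the pattern allows.
	fmt.Printf("%q\n", first(`A+`, s))     // "AAAA"
	fmt.Printf("%q\n", first(`A{2,3}`, s)) // "AAA"
	fmt.Printf("%q\n", first(`A{2,}`, s))  // "AAAA"
	fmt.Printf("%q\n", first(`A?`, s))     // "A"

	// Lazy forms (trailing ?) take as few as possible.
	fmt.Printf("%q\n", first(`A+?`, s))     // "A"
	fmt.Printf("%q\n", first(`A{2,3}?`, s)) // "AA"
	fmt.Printf("%q\n", first(`A{2,}?`, s))  // "AA"
	fmt.Printf("%q\n", first(`A*?`, s))     // "": an empty match at position 0

	// With an exact count there is nothing to prefer.
	fmt.Printf("%q\n", first(`A{2}`, s))  // "AA"
	fmt.Printf("%q\n", first(`A{2}?`, s)) // "AA"
}
```

A lazy quantifier still extends as far as needed for the whole pattern to match: `A+?T` on the same input matches `AAAAT`, because the `T` can only be reached after all four `A`s.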

Grouping

Pattern         Description
(re)            numbered capturing group (submatch)
(?P<name>re)    named & numbered capturing group (submatch)
(?<name>re)     named & numbered capturing group (submatch)
(?:re)          non-capturing group
(?flags)        set flags within current group; non-capturing
(?flags:re)     set flags during re; non-capturing
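
The sketch below shows how capturing, non-capturing, and flag groups behave, assuming Go's `regexp` package as the engine. The group names `genus` and `species`, the helper `group`, and the sample strings are hypothetical. The `(?P<name>re)` spelling is used because the `(?<name>re)` form is only accepted by newer Go releases.

```go
package main

import (
	"fmt"
	"regexp"
)

// binomial captures two named groups; the names are illustrative only.
var binomial = regexp.MustCompile(`(?P<genus>[A-Z][a-z]+) (?P<species>[a-z]+)`)

// group returns the text captured by the named group,
// or "" if s does not match or the name is unknown.
func group(re *regexp.Regexp, name, s string) string {
	m := re.FindStringSubmatch(s)
	i := re.SubexpIndex(name)
	if m == nil || i < 0 {
		return ""
	}
	return m[i]
}

func main() {
	fmt.Println(group(binomial, "genus", "Homo sapiens"))   // Homo
	fmt.Println(group(binomial, "species", "Homo sapiens")) // sapiens

	// Named groups are also numbered: index 1 is genus, index 2 is species.
	fmt.Println(binomial.FindStringSubmatch("Homo sapiens")[2]) // sapiens

	// (?:re) groups without capturing, so only (GT) receives a number.
	re := regexp.MustCompile(`(?:AC)+(GT)`)
	fmt.Println(re.NumSubexp())                  // 1
	fmt.Println(re.FindStringSubmatch("ACACGT")) // [ACACGT GT]

	// (?i) applies to the rest of the group; (?i:re) applies only inside re.
	fmt.Println(regexp.MustCompile(`(?i)acgt`).MatchString("ACGT"))  // true
	fmt.Println(regexp.MustCompile(`(?i:ac)gt`).MatchString("ACGT")) // false: gt stays case-sensitive
	fmt.Println(regexp.MustCompile(`(?i:ac)gt`).MatchString("ACgt")) // true
}
```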

Character classes

Pattern        Description
[\d]           digits (== \d)
[^\d]          not digits (== \D)
[\D]           not digits (== \D)
[^\D]          not not digits (== \d)
[[:name:]]     named ASCII class inside character class (== [:name:])
[^[:name:]]    named ASCII class inside negated character class (== [:^name:])
[\p{Name}]     named Unicode property inside character class (== \p{Name})
[^\p{Name}]    named Unicode property inside negated character class (== \P{Name})
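
The equivalences in the table can be checked by deleting every match of a class from a string and looking at what remains. This is a sketch assuming Go's `regexp` package as the engine. The helper `strip` and both input strings are illustrative, and `Greek` stands in for any Unicode property name.

```go
package main

import (
	"fmt"
	"regexp"
)

// strip removes every match of pattern from s.
// (Hypothetical helper, used only for this illustration.)
func strip(pattern, s string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(s, "")
}

func main() {
	id := "seq_0042_len150"
	fmt.Println(strip(`[^\d]`, id)) // 0042150: only digits remain
	fmt.Println(strip(`\D`, id))    // 0042150: same result as [^\d]
	fmt.Println(strip(`[\d]`, id))  // seq__len: digits removed
	fmt.Println(strip(`[^\D]`, id)) // seq__len: same result as [\d]

	mixed := "αβγ-abc"
	fmt.Println(strip(`[\p{Greek}]`, mixed))  // -abc
	fmt.Println(strip(`[^\p{Greek}]`, mixed)) // αβγ
	fmt.Println(strip(`[[:alpha:]]`, mixed))  // αβγ-: the named class is ASCII-only
	fmt.Println(strip(`[^[:alpha:]]`, mixed)) // abc
}
```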

Named character classes

Pattern         Description
[[:alnum:]]     alphanumeric (== [0-9A-Za-z])
[[:alpha:]]     alphabetic (== [A-Za-z])
[[:ascii:]]     ASCII (== [\x00-\x7F])
[[:blank:]]     blank (== [\t ])
[[:cntrl:]]     control (== [\x00-\x1F\x7F])
[[:digit:]]     digits (== [0-9])
[[:graph:]]     graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]]     lower case (== [a-z])
[[:print:]]     printable (== [ -~] == [ [:graph:]])
[[:punct:]]     punctuation (== [!-/:-@[-`{-~])
[[:space:]]     whitespace (== [\t\n\v\f\r ])
[[:upper:]]     upper case (== [A-Z])
[[:word:]]      word characters (== [0-9A-Za-z_])
[[:xdigit:]]    hex digit (== [0-9A-Fa-f])
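
A few of the named classes in action, as a sketch assuming Go's `regexp` package as the engine. The helpers `matches` and `replace` and all sample strings are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches somewhere in s.
// (Hypothetical helper, used only for this illustration.)
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// replace substitutes repl for every match of pattern in s.
// (Hypothetical helper, used only for this illustration.)
func replace(pattern, s, repl string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(s, repl)
}

func main() {
	// Collapse any run of whitespace into a single space.
	fmt.Println(replace(`[[:space:]]+`, "seq1 \t\n ACGT", " ")) // seq1 ACGT

	// Drop punctuation; the underscore counts as punctuation here.
	fmt.Println(replace(`[[:punct:]]`, "a-b_c;d", "")) // abcd

	fmt.Println(matches(`^[[:xdigit:]]+$`, "1a2B")) // true
	fmt.Println(matches(`^[[:xdigit:]]+$`, "1g"))   // false: g is not a hex digit
	fmt.Println(matches(`^[[:upper:]]+$`, "ACGT"))  // true
	fmt.Println(matches(`^[[:word:]]+$`, "seq_42")) // true

	// The named classes are ASCII-only, so accented letters do not match.
	fmt.Println(matches(`[[:alpha:]]`, "é")) // false

	// [[:print:]] is [[:graph:]] plus the space character.
	fmt.Println(matches(`[[:print:]]`, " ")) // true
	fmt.Println(matches(`[[:graph:]]`, " ")) // false
}
```

For text that may contain non-ASCII letters, a Unicode property class such as `\p{L}` is the usual alternative to `[[:alpha:]]`.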