Thursday, 10 May 2012

Regex - regular expressions

A regular expression is a special string that describes a search pattern.

A regular expression (regex) is one or more non-empty branches, separated by '|'.
'|' - the expression matches anything that matches one of the branches
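
As a small illustration of branches, here is a sketch that assumes Python's built-in `re` module as the regex engine (the post itself does not name one). The pattern `cat|dog` and the helper name `matches_whole` are made up for this example.

```python
import re

# Two branches separated by '|': the regex matches if either branch matches.
pattern = re.compile(r"cat|dog")

def matches_whole(text):
    """Return True when the entire text matches one of the branches."""
    return pattern.fullmatch(text) is not None

alternation_results = [matches_whole(s) for s in ("cat", "dog", "cow")]
print(alternation_results)  # [True, True, False]
```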

A branch is one or more atoms, concatenated. It matches a match for the first atom, followed by a match for the second, and so on.

An atom may be followed by a '*', '+', '?', or a bound.
* - 0 or more occurrences of the atom
+ - 1 or more occurrences of the atom
? - 0 or 1 occurrence of the atom
a{1,3} - a bound: the atom a occurs between 1 and 3 times
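
The repetition operators can be tried out with a short sketch, again assuming Python's `re` module as the engine. The helper `full` and the sample strings are hypothetical, chosen only to show each operator.

```python
import re

def full(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ("a", "ab", "abbb")

star = [full(r"ab*", s) for s in samples]   # 'b' repeated 0 or more times
plus = [full(r"ab+", s) for s in samples]   # 'b' repeated 1 or more times
opt = [full(r"ab?", s) for s in samples]    # 'b' present 0 or 1 times
bound = [full(r"a{1,3}", s) for s in ("", "a", "aaa", "aaaa")]  # 1 to 3 'a's

print(star)   # [True, True, True]
print(plus)   # [False, True, True]
print(opt)    # [True, True, False]
print(bound)  # [False, True, True, False]
```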

An atom is one of the following:
a regular expression enclosed in '()' (matching a match for the regular expression)
a bracket expression (see below)
'.' (matching any single character)
'^' (matching the null string at the beginning of a line)
'$' (matching the null string at the end of a line)
a `\' followed by one of the characters `^.[$()|*+?{\' (matching that character taken as an ordinary character)
a single character with no other significance (matching that character)
There is one more type of atom, the back reference: `\' followed by a non-zero decimal digit d matches the same sequence of characters matched by the d-th parenthesized subexpression (numbering subexpressions by the positions of their opening parentheses, left to right), so that (e.g.) `([bc])\1' matches `bb' or `cc' but not `bc'.
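
The atom types above can be sketched as follows, assuming Python's `re` module rather than a POSIX engine. Two differences are worth flagging: in Python, '.' does not match a newline by default, and '^' and '$' only anchor at every line when `re.MULTILINE` is set. The helper `full` and the sample strings are made up for illustration.

```python
import re

def full(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# '.' matches any single character (Python's re excludes newline by default).
dot_results = [full(r"a.c", s) for s in ("abc", "a-c", "ac")]

# '^' and '$' anchor at line boundaries when re.MULTILINE is set.
whole_lines = re.findall(r"^[a-z]+$", "one\ntwo words\nthree", flags=re.MULTILINE)

# A backslash makes a special character ordinary.
escaped_plus = full(r"1\+1", "1+1")
unescaped_plus = full(r"1+1", "1+1")  # here '+' still means "1 or more"

# Back reference: \1 must repeat exactly what group 1 matched.
backref_results = [full(r"([bc])\1", s) for s in ("bb", "cc", "bc")]

print(dot_results)                   # [True, True, False]
print(whole_lines)                   # ['one', 'three']
print(escaped_plus, unescaped_plus)  # True False
print(backref_results)               # [True, True, False]
```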

[0-9] or [a-z] - specifies a character range
A bracket expression is a list of characters enclosed in '[]'. It normally matches any single character from the list.
If the list begins with '^', it matches any single character not from the rest of the list.
If two characters in the list are separated by `-', this is shorthand for the full range of characters between those two inclusive (e.g. '[0-9]' matches any decimal digit).
With the exception of ']', '^' and '-', all other special characters, including `\', lose their special significance within a bracket expression.
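
A brief sketch of bracket expressions, assuming Python's `re` module. One caveat: unlike the POSIX rule described above, Python keeps `\` as an escape character inside brackets, so the example sticks to '.', '*' and '+', which become ordinary in both. The helper `full` is hypothetical.

```python
import re

def full(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# A range: any single decimal digit.
digit = [full(r"[0-9]", s) for s in ("7", "x")]

# A leading '^' negates the list: any single character that is not a digit.
non_digit = [full(r"[^0-9]", s) for s in ("7", "x")]

# Inside brackets, '.', '*' and '+' are ordinary characters.
literal = [full(r"[.*+]", s) for s in (".", "*", "+", "a")]

print(digit)      # [True, False]
print(non_digit)  # [False, True]
print(literal)    # [True, True, True, False]
```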
