Parle pattern matching

Parle supports regex matching similar to flex. Also supported are the following POSIX character sets: [:alnum:], [:alpha:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], [:xdigit:].

The Unicode character classes are currently not enabled by default; pass --enable-parle-utf32 at compile time to make them available. A particular encoding can be mapped with a correctly constructed regex. For example, to match the euro symbol encoded in UTF-8, the regular expression [\xe2][\x82][\xac] can be used. The pattern for a UTF-8 encoded string could be [ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+.
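
The byte sequence in the euro example can be checked outside of Parle. The sketch below uses Python's `re` module on raw bytes, not Parle's engine, and only confirms that the euro sign encodes to the three bytes named in the pattern; the sample text is made up for illustration.

```python
import re

# UTF-8 encoding of the euro sign (U+20AC).
euro_bytes = "€".encode("utf-8")

# The same byte-wise pattern as in the text, compiled with Python's re.
# This is an illustration only; Parle itself is not involved.
euro_pattern = re.compile(rb"[\xe2][\x82][\xac]")

# Hypothetical sample input, encoded to bytes before matching.
match = euro_pattern.search("Price: 5 €".encode("utf-8"))
print(euro_bytes, match.span())
```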

**Character representations**
Sequence	Description
\a	Alert (bell).
\b	Backspace.
\e	ESC character, \x1b.
\n	Newline.
\r	Carriage return.
\f	Form feed, \x0c.
\t	Horizontal tab, \x09.
\v	Vertical tab, \x0b.
\oct	Character specified by a three-digit octal code.
\xhex	Character specified by a hex code.
\cchar	Named control character.
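
As a rough cross-check of the table above, the following sketch lists the code point behind each single-character escape. It relies on Python's own string escapes, which happen to agree with the table for these sequences; Python has no `\e`, so ESC is written as `\x1b`.

```python
# Escape sequences from the table and the characters they denote.
# Python has no "\e" escape, so ESC is spelled "\x1b" here.
escapes = {
    r"\a": "\a",
    r"\b": "\b",
    r"\e": "\x1b",
    r"\n": "\n",
    r"\r": "\r",
    r"\f": "\f",
    r"\t": "\t",
    r"\v": "\v",
}

# Render each character as a two-digit hex code, e.g. \x07.
code_points = {name: f"\\x{ord(char):02x}" for name, char in escapes.items()}
for name, code in code_points.items():
    print(name, "=", code)
```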

**Character classes**
Sequence	Description
[...]	A single character listed or contained within a listed range. Ranges can be combined with the `{+}` and `{-}` operators. For example `[a-z]{+}[0-9]` is the same as `[0-9a-z]` and `[a-z]{-}[aeiou]` is the same as `[b-df-hj-np-tv-z]`.
[^...]	A single character not listed and not contained within a listed range.
.	Any character, default `[^\n]`.
\d	Digit character, `[0-9]`.
\D	Non-digit character, `[^0-9]`.
\s	White space character, `[ \t\n\r\f\v]`.
\S	Non-white space character, `[^ \t\n\r\f\v]`.
\w	Word character, `[a-zA-Z0-9_]`.
\W	Non-word character, `[^a-zA-Z0-9_]`.
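
The `{+}` and `{-}` operators can be read as set union and set difference on character ranges. The sketch below models that reading with plain Python sets; the `expand` helper is hypothetical and exists only for this illustration, and no Parle code is involved.

```python
def expand(lo: str, hi: str) -> set[str]:
    """Return the set of characters in the inclusive range lo-hi."""
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}

lower = expand("a", "z")
digits = expand("0", "9")

# [a-z]{+}[0-9] read as a union, equivalent to [0-9a-z].
union = lower | digits

# [a-z]{-}[aeiou] read as a difference, equivalent to [b-df-hj-np-tv-z].
consonants = lower - set("aeiou")
expected = (expand("b", "d") | expand("f", "h") | expand("j", "n")
            | expand("p", "t") | expand("v", "z"))

print(consonants == expected)
```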

**Unicode character classes**
Sequence	Description
\p{C}	Other.
\p{Cc}	Other, control.
\p{Cf}	Other, format.
\p{Co}	Other, private use.
\p{Cs}	Other, surrogate.
\p{L}	Letter.
\p{LC}	Letter, cased.
\p{Ll}	Letter, lowercase.
\p{Lm}	Letter, modifier.
\p{Lo}	Letter, other.
\p{Lt}	Letter, titlecase.
\p{Lu}	Letter, uppercase.
\p{M}	Mark.
\p{Mc}	Mark, spacing combining.
\p{Me}	Mark, enclosing.
\p{Mn}	Mark, nonspacing.
\p{N}	Number.
\p{Nd}	Number, decimal digit.
\p{Nl}	Number, letter.
\p{No}	Number, other.
\p{P}	Punctuation.
\p{Pc}	Punctuation, connector.
\p{Pd}	Punctuation, dash.
\p{Pe}	Punctuation, close.
\p{Pf}	Punctuation, final quote.
\p{Pi}	Punctuation, initial quote.
\p{Po}	Punctuation, other.
\p{Ps}	Punctuation, open.
\p{S}	Symbol.
\p{Sc}	Symbol, currency.
\p{Sk}	Symbol, modifier.
\p{Sm}	Symbol, math.
\p{So}	Symbol, other.
\p{Z}	Separator.
\p{Zl}	Separator, line.
\p{Zp}	Separator, paragraph.
\p{Zs}	Separator, space.

These character classes are available only if the option --enable-parle-utf32 was passed at compile time.
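
The two-letter codes in the table are the Unicode general categories. To see which category a given character falls into, one option is Python's standard `unicodedata` module, as sketched below; this does not exercise Parle, and the sample characters are chosen only for illustration.

```python
import unicodedata

# Sample characters paired with the general category each is expected to have.
samples = {
    "A": "Lu", "a": "Ll", "5": "Nd",
    "_": "Pc", "-": "Pd", "(": "Ps", ")": "Pe",
    "+": "Sm", "€": "Sc",
    " ": "Zs", "\u2028": "Zl", "\u2029": "Zp",
}

# Look up the actual category of every sample character.
categories = {ch: unicodedata.category(ch) for ch in samples}
for ch, cat in categories.items():
    print(repr(ch), cat)
```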

**Alternation and repetition**
Sequence	Greedy	Description
...|...	-	Try sub-patterns in alternation.
*	yes	Match 0 or more times.
+	yes	Match 1 or more times.
?	yes	Match 0 or 1 times.
{n}	no	Match exactly n times.
{n,}	yes	Match at least n times.
{n,m}	yes	Match at least n times but no more than m times.
*?	no	Match 0 or more times.
+?	no	Match 1 or more times.
??	no	Match 0 or 1 times.
{n,}?	no	Match at least n times.
{n,m}?	no	Match at least n times but no more than m times.
{MACRO}	-	Include the regex MACRO in the current regex.
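
The difference between the greedy and non-greedy forms is easiest to see on a small input. The sketch below uses Python's `re` module, which shares this quantifier syntax; it is an analogy only, since Parle's lexer may resolve matches differently from a backtracking engine.

```python
import re

text = "<a><b>"

# Greedy: + consumes as much as possible, so the match runs to the last ">".
greedy = re.match(r"<.+>", text).group()

# Non-greedy: +? stops at the first position where the rest can match.
lazy = re.match(r"<.+?>", text).group()

# The same contrast for a bounded repetition.
greedy_bounded = re.match(r"a{2,3}", "aaaa").group()
lazy_bounded = re.match(r"a{2,3}?", "aaaa").group()

print(greedy, lazy, greedy_bounded, lazy_bounded)
```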

**Anchors**
Sequence	Description
^	Start of string or after a newline.
$	End of string or before a newline.
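
The anchor behaviour described above corresponds to what Python's `re` module does under `re.MULTILINE`. The following sketch uses that flag as a stand-in to show `^` matching after a newline and `$` matching before one; it is not Parle code, and the input string is made up.

```python
import re

text = "first line\nsecond line"

# With re.MULTILINE, ^ also matches right after a newline
# and $ also matches right before one.
starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(starts, ends)
```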

**Grouping**

Sequence	Description
(...)	Group a regular expression to override default operator precedence.
(?r-s:pattern)	Apply option r and omit option s while interpreting pattern. Options may be zero or more of the characters `i`, `s`, or `x`. `i` means case-insensitive; `-i` means case-sensitive. `s` makes `.` match any character whatsoever; `-s` makes `.` match any character except `\n`. `x` ignores comments and whitespace in patterns: whitespace is ignored unless it is backslash-escaped, contained within double quotes, or inside a character range. These options can also be applied globally at the rules level by passing a combination of the bit flags to the lexer.
(?# comment )	Omit everything within (). The first ) character encountered ends the pattern. It is not possible for the comment to contain a ) character. The comment may span lines.
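
Scoped options and inline comments can be tried out with Python's `re` module, which accepts a similar `(?flags-flags:...)` and `(?#...)` syntax. The sketch below is an analogy under that assumption and does not call Parle; option handling in Parle's lexer may differ in detail.

```python
import re

# (?i:...) makes only the enclosed part case-insensitive.
scoped_i = re.fullmatch(r"(?i:abc)def", "ABCdef") is not None
scoped_i_rest = re.fullmatch(r"(?i:abc)def", "ABCDEF") is not None

# (?s:.) lets . match a newline; (?-s:.) does not, even under re.DOTALL.
dot_all = re.fullmatch(r"(?s:.)", "\n") is not None
dot_default = re.fullmatch(r"(?-s:.)", "\n", flags=re.DOTALL) is not None

# (?#...) is skipped entirely when the pattern is interpreted.
commented = re.fullmatch(r"ab(?# ignored )c", "abc") is not None

print(scoped_i, scoped_i_rest, dot_all, dot_default, commented)
```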