History
In Apocalypse 5, Larry Wall enumerated 20 problems with "current regex culture". Among these were that Perl's regexes were "too compact and 'cute'", had "too much reliance on too few metacharacters", "little support for named captures", "little support for grammars", and "poor integration with [the] 'real' language".[2]
Between late 2004 and mid-2005, a compiler for Perl 6-style rules, called the Parrot Grammar Engine (PGE), was developed for the Parrot virtual machine; it was later renamed to the more generic Parser Grammar Engine. PGE combines a runtime and a compiler for Perl 6-style grammars, allowing any Parrot-based compiler to use these tools for parsing and to provide rules to its runtime.
Among other Perl 6 features, support for named captures was added to Perl 5.10 in 2007.[3]
Changes from Perl 5
There are only six unchanged features from Perl 5's regexes:
A few of the most powerful additions include:
The following changes greatly improve the readability of regexes:
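Implicit changes
Some features of Perl 5 regular expressions become more powerful in Perl 6 because they can encapsulate the expanded features of Perl 6 rules. For example, Perl 5 had the positive and negative lookahead operators (?=...) and (?!...). The same features exist in Perl 6 as <before ...> and <!before ...>, but because a lookahead can now encapsulate arbitrary rules, it can express a syntactic predicate in a grammar. The following parsing expression grammar, for instance, describes the classic non-context-free language { a^n b^n c^n : n ≥ 1 }:
S ← &(A !b) a+ B
A ← a A? b
B ← b B? c
In Perl 6 rules that would be: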
token S { <before <A> <!before b>> a+ <B> }
token A { a <A>? b }
token B { b <B>? c }
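To try the lookahead version, the three subrules can be wrapped in a grammar. This is a minimal sketch, assuming a current Rakudo compiler (Perl 6 has since been renamed Raku); the grammar name LookaheadABC is hypothetical, and S is renamed TOP because that is the rule Grammar.parse starts from. The subrules are declared with token, which disables backtracking and keeps whitespace in the pattern insignificant, close to PEG semantics.

```raku
# Hypothetical grammar: the PEG above, written with Perl 6 (Raku) tokens.
# `token` means no backtracking and no significant whitespace, as in a PEG.
grammar LookaheadABC {
    # &(A !b): check that a^n b^n is not followed by another b, consuming nothing
    token TOP { <before <A> <!before b>> a+ <B> }
    token A   { a <A>? b }    # matches a^n b^n
    token B   { b <B>? c }    # matches b^n c^n
}

for <abc aabbcc aaabbbccc aabbc abbcc aabbbccc> -> $s {
    # .parse must consume the whole string; .so turns Match/Nil into True/False
    say $s, ': ', LookaheadABC.parse($s).so;
}
```

The first three inputs should be accepted and the last three rejected: in abbcc and aabbbccc the lookahead fails because the a's and b's do not balance, and in aabbc the final B cannot find enough c's.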
Of course, given the ability to mix rules and regular code, that can be simplified even further:
token S { (a+) (b+) (c+) <?{ $0.chars == $1.chars == $2.chars }> }
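The assertion-based formulation can be exercised the same way. Again this is a minimal sketch assuming a current Rakudo compiler, with a hypothetical grammar name (PredicateABC); it uses the <?{ ... }> form of code assertion, which succeeds when the block returns a true value, and .chars to get the length of each capture.

```raku
# Hypothetical grammar: a^n b^n c^n via a semantic predicate.
grammar PredicateABC {
    # Capture the three runs, then let ordinary code compare their lengths.
    token TOP { (a+) (b+) (c+) <?{ $0.chars == $1.chars == $2.chars }> }
}

for <abc aabbcc aabbc aabbbccc> -> $s {
    say $s, ': ', PredicateABC.parse($s).so;
}
```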
However, this version makes use of an assertion, a concept that is only subtly different in Perl 6 rules but substantially different in parsing theory, making this a semantic rather than a syntactic predicate. The most important practical difference is performance: the rule engine cannot know what condition the assertion will check, so it cannot optimize that part of the match.
Integration with Perl
In many languages, regular expressions are entered as strings, which are then passed to library routines that parse and compile them into an internal state. In Perl 5, regular expressions shared some of the lexical analysis with Perl's scanner. This simplified many aspects of regular expression usage, though it added a great deal of complexity to the scanner. In Perl 6, rules are part of the grammar of the language, and no separate parser exists for rules as it did in Perl 5. Code embedded in rules is therefore parsed at the same time as the rule itself and its surrounding code. For example, it is possible to nest rules and code without re-invoking the parser:
rule ab {
(a.) # match "a" followed by any character
# Then check to see if that character was "b"
# If so, print a message.
{ $0 ~~ /b {say "found the b"}/ }
}
The above is a single block of Perl 6 code which contains an outer rule definition, an inner block of assertion code, and inside of that a regex that contains one more level of assertion.
Implementation
Keywords
There are several keywords used in conjunction with Perl 6 rules:
Here is an example of typical use:
token word { \w+ }
rule phrase { <word> [ \, <word> ]* \. }
if $string ~~ / <phrase> \n / {
...
}
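The main declarators differ in their default modifiers: token behaves like regex with :ratchet (no backtracking), and rule additionally turns on :sigspace. A minimal sketch, assuming a current Rakudo compiler; bt-word, rt-word and kv-pair are hypothetical names chosen for illustration.

```raku
# Hypothetical lexical regexes showing the three declarators.
my regex bt-word { \w+ }                      # regex: may be backtracked into
my token rt-word { \w+ }                      # token: :ratchet, never gives text back
my rule  kv-pair { <rt-word> '=' <rt-word> }  # rule: :ratchet plus :sigspace

# \w+ first takes "cats"; the regex can give back the "s" so the literal s matches.
say so 'cats' ~~ / ^ <bt-word> s $ /;
# The token keeps all of "cats", so the trailing s cannot match.
say so 'cats' ~~ / ^ <rt-word> s $ /;
# Under :sigspace, the spaces around '=' in the pattern match the spaces in the input.
say so 'a = b' ~~ / ^ <kv-pair> $ /;
```

This should print True, False and True.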
Modifiers
Modifiers may be placed after any of the regex keywords and before the delimiter; if a regex is named, the modifiers come after the name. Modifiers control how regexes are parsed and how they behave. They are always introduced with a leading colon (:).
Some of the more important modifiers include:
For example:
rule addition :ratchet :sigspace { <term> \+ <expr> }
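Modifiers can also be attached to the quoting forms, directly after the keyword and before the delimiter. A minimal sketch assuming a current Rakudo compiler; the variable names are illustrative, and only :sigspace and :i (ignore case) are shown.

```raku
my $plain  = rx/ a b /;           # no modifiers: whitespace in the pattern is ignored
my $sig    = rx:sigspace/ a b /;  # :sigspace: whitespace in the pattern matches <.ws>
my $nocase = rx:i/ perl /;        # :i: case-insensitive

say so 'ab'   ~~ $plain;    # True
say so 'a b'  ~~ $plain;    # False: the pattern means "ab"
say so 'a b'  ~~ $sig;      # True
say so 'PeRl' ~~ $nocase;   # True
```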
Grammars
A grammar may be defined using the grammar keyword. A grammar is essentially a namespace for rules:
grammar Str::SprintfFormat {
regex format_token { \%: <index>? <precision>? <modifier>? <directive> }
token index { \d+ \$ }
token precision { <flags>? <vector>? <precision_count> }
token flags { <[\ +0\#\-]>+ }
token precision_count { [ <[1-9]>\d* | \* ]? [ \. [ \d* | \* ] ]? }
token vector { \*? v }
token modifier { ll | <[lhmVqL]> }
token directive { <[\%csduoxefgXEGbpniDUOF]> }
}
This is the grammar used to define Perl's sprintf string-formatting notation. Outside of this namespace, you could use these rules like so:
if / <Str::SprintfFormat::format_token> / { ... }
A rule used in this way is equivalent to invoking a subroutine, with the added semantics and side effects of pattern matching (for example, rule invocations can be backtracked).
Examples
Here are some example rules in Perl 6:
rx { a [ b | c ] ( d | e ) f : g }
rx { ( ab* ) <?{ $0.chars % 2 == 0 }> }
The last of these is equivalent to:
rx { ( ab[bb]* ) }
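The equivalence can be spot-checked by running both patterns over a few inputs. A minimal sketch assuming a current Rakudo compiler: it writes the assertion as <?{ ... }> on $0 (Perl 6 numbers captures from zero) and uses %% ("is divisible by"); both patterns are anchored with ^ and $ here, an addition for the test, so that whole strings are compared.

```raku
# Both patterns anchored so whole strings are compared (variable names are illustrative).
my $with-assertion = rx/ ^ ( ab* ) <?{ $0.chars %% 2 }> $ /;
my $pure-regex     = rx/ ^ ( ab[bb]* ) $ /;

for <a ab abb abbb abbbb abbbbb> -> $s {
    say $s, ': ', ($s ~~ $with-assertion).so, ' ', ($s ~~ $pure-regex).so;
}
```

Both columns should agree: an even-length capture means "a" followed by an odd number of b's, which is exactly what ab[bb]* describes.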
References