<?php
include_once $_SERVER['DOCUMENT_ROOT'] . '/include/shared-manual.inc';
$TOC = array();
$TOC_DEPRECATED = array();
$PARENTS = array();
include_once dirname(__FILE__) ."/toc/reference.pcre.pattern.syntax.inc";
$setup = array (
  'home' => 
  array (
    0 => 'index.php',
    1 => 'PHP Manual',
  ),
  'head' => 
  array (
    0 => 'UTF-8',
    1 => 'de',
  ),
  'this' => 
  array (
    0 => 'regexp.reference.performance.php',
    1 => 'Performance',
    2 => 'Performance',
  ),
  'up' => 
  array (
    0 => 'reference.pcre.pattern.syntax.php',
    1 => 'PCRE regex syntax',
  ),
  'prev' => 
  array (
    0 => 'regexp.reference.recursive.php',
    1 => 'Recursive patterns',
  ),
  'next' => 
  array (
    0 => 'reference.pcre.pattern.modifiers.php',
    1 => 'Possible modifiers in regex patterns',
  ),
  'alternatives' => 
  array (
  ),
  'source' => 
  array (
    'lang' => 'en',
    'path' => 'reference/pcre/pattern.syntax.xml',
  ),
  'history' => 
  array (
  ),
);
$setup["toc"] = $TOC;
$setup["toc_deprecated"] = $TOC_DEPRECATED;
$setup["parents"] = $PARENTS;
manual_setup($setup);

contributors($setup);

?>
<div id="regexp.reference.performance" class="section">
  <h2 class="title">Performance</h2>
  <p class="para">
   Some items that may appear in patterns are more efficient
   than others. A character class such as [aeiou] is more
   efficient than a set of alternatives such as (a|e|i|o|u).
   The simplest construction that provides the required
   behaviour is usually the most efficient. Jeffrey Friedl&#039;s
   book <em>Mastering Regular Expressions</em> discusses at length
   how to optimize regular expressions for performance.
  </p>
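  <p class="para">
   The comparison between a character class and a set of alternatives
   can be tried with a minimal sketch like the one below. The helper
   names <code class="literal">countMatches</code> and
   <code class="literal">timePattern</code> are hypothetical and exist only
   for this illustration. It assumes PHP 7.3 or later for
   <code class="literal">hrtime()</code>. The measured difference depends on
   the PCRE version and on whether JIT compilation is enabled, so the
   timings are only indicative.
  </p>

```php
<?php
// Hypothetical helper: how many times does $pattern match in $subject?
function countMatches(string $pattern, string $subject): int
{
    $count = preg_match_all($pattern, $subject);
    return $count === false ? 0 : $count;
}

// Hypothetical helper: run the pattern $iterations times and
// return the elapsed time in nanoseconds.
function timePattern(string $pattern, string $subject, int $iterations = 1000): int
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_match_all($pattern, $subject);
    }
    return hrtime(true) - $start;
}

$subject = str_repeat('the quick brown fox ', 50);

// Both patterns find exactly the same vowels.
var_dump(countMatches('/[aeiou]/', $subject) === countMatches('/(a|e|i|o|u)/', $subject));

// Timings vary by machine and PCRE build; compare them on your own system.
printf("character class: %d ns\n", timePattern('/[aeiou]/', $subject));
printf("alternatives:    %d ns\n", timePattern('/(a|e|i|o|u)/', $subject));
```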
  <p class="para">
   When a pattern begins with .* and the <a href="reference.pcre.pattern.modifiers.php" class="link">PCRE_DOTALL</a> option is
   set, the pattern is implicitly anchored by PCRE, since it
   can match only at the start of a subject string. However, if
   <a href="reference.pcre.pattern.modifiers.php" class="link">PCRE_DOTALL</a>
   is not set, PCRE cannot make this optimization,
   because the . metacharacter does not then match a newline,
   and if the subject string contains newlines, the pattern may
   match from the character immediately following one of them
   instead of from the very start. For example, the pattern

   <code class="literal">(.*) second</code>

   matches the subject &quot;first\nand second&quot; (where \n stands for
   a newline character) with the first captured substring being
   &quot;and&quot;. In order to do this, PCRE has to retry the match
   starting after every newline in the subject.
  </p>
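  <p class="para">
   The behaviour described above can be reproduced with a short sketch.
   It assumes the <code class="literal">s</code> pattern modifier, which is
   how PHP exposes PCRE_DOTALL; the variable names are chosen only for
   this illustration.
  </p>

```php
<?php
$subject = "first\nand second";

// Without the s (PCRE_DOTALL) modifier, . does not match the newline,
// so the successful match starts just after it.
preg_match('/(.*) second/', $subject, $withoutDotall);

// With the s modifier, .* can run across the newline from the very start.
preg_match('/(.*) second/s', $subject, $withDotall);

var_dump($withoutDotall[1]); // string(3) "and"
var_dump($withDotall[1]);    // begins with "first" and includes the newline
```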
  <p class="para">
   If you are using such a pattern with subject strings that do
   not contain newlines, the best performance is obtained by
   setting <a href="reference.pcre.pattern.modifiers.php" class="link">PCRE_DOTALL</a>,
   or starting the pattern with ^.* to
   indicate explicit anchoring. That saves PCRE from having to
   scan along the subject looking for a newline to restart at.
  </p>
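  <p class="para">
   As a rough illustration of the two options, the sketch below applies
   both the <code class="literal">s</code> modifier and an explicit
   <code class="literal">^</code> anchor to a subject without newlines, where
   they capture the same text. It also shows that the explicit anchor
   changes the result once the subject does contain a newline, so it is
   only a safe substitute under the assumption stated above. The
   variable names are hypothetical.
  </p>

```php
<?php
$oneLine = 'first and second';

// Two ways to tell PCRE the match can only start at offset 0.
preg_match('/(.*) second/s', $oneLine, $viaDotall);
preg_match('/^(.*) second/', $oneLine, $viaAnchor);

// On a subject without newlines both capture the same substring.
var_dump($viaDotall[1] === $viaAnchor[1]);

// With a newline in the subject, ^ (without the m modifier) pins the
// match to the first line, and . cannot cross the newline, so the
// anchored pattern no longer matches at all.
$twoLines = "first\nand second";
$anchoredResult = preg_match('/^(.*) second/', $twoLines);
var_dump($anchoredResult);
```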
  <p class="para">
   Beware of patterns that contain nested indefinite repeats.
   These can take a long time to run when applied to a string
   that does not match. Consider the pattern fragment

   <code class="literal">(a+)*</code>
  </p>
  <p class="para">
   This can match &quot;aaaa&quot; in 16 different ways, and this number
   doubles with each additional character in the string. (The *
   repeat can match 0, 1, 2, 3, or 4 times, and for each of
   those cases other than 0, the + repeats can match different
   numbers of times.) When the remainder of the pattern is such
   that the entire match is going to fail, PCRE has in principle
   to try every possible variation, and this can take an
   extremely long time.
  </p>
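  <p class="para">
   The growth can be made concrete with a small counting sketch. It does
   not call PCRE at all: the hypothetical function
   <code class="literal">countWays</code> simply enumerates how the group
   could split a run of &quot;a&quot; characters, under the assumption
   that every distinct sequence of + iterations consuming some prefix of
   the subject (including zero iterations) counts as one way. The exact
   total depends on that counting convention; the rapid growth does not.
  </p>

```php
<?php
// Hypothetical helper: number of distinct ways (a+)* can consume a
// prefix of a subject made of $n "a" characters.
function countWays(int $n): int
{
    // One way: the * repeat stops here and consumes nothing more.
    $ways = 1;

    // Otherwise the next a+ iteration takes $len characters and the
    // remaining characters are split up recursively.
    for ($len = 1; $len <= $n; $len++) {
        $ways += countWays($n - $len);
    }

    return $ways;
}

foreach ([4, 8, 12] as $n) {
    printf("%2d characters: %d ways\n", $n, countWays($n));
}
```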
  <p class="para">
   An optimization catches some of the simpler cases, such
   as

   <code class="literal">(a+)*b</code>

   where a literal character follows. Before embarking on the
   standard matching procedure, PCRE checks that there is a &quot;b&quot;
   later in the subject string, and if there is not, it fails
   the match immediately. However, when there is no following
   literal, this optimization cannot be used. You can see the
   difference by comparing the behaviour of

   <code class="literal">(a+)*\d</code>

   with the pattern above. <code class="literal">(a+)*b</code> fails almost
   instantly when applied to a whole line of &quot;a&quot; characters,
   whereas <code class="literal">(a+)*\d</code> takes an appreciable time with
   strings longer than about 20 characters.
  </p>
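  <p class="para">
   The difference between the two patterns can be observed safely with a
   sketch along the following lines. It makes two assumptions that are
   not part of the text above: JIT compilation is switched off through
   the <code class="literal">pcre.jit</code> setting, and
   <code class="literal">pcre.backtrack_limit</code> is lowered so that the
   slow pattern is abandoned quickly instead of running for a long time.
   The helper name <code class="literal">tryPattern</code> is hypothetical,
   and the exact error reported may vary between PCRE builds.
  </p>

```php
<?php
// Assumptions for this sketch: no JIT, and a deliberately low limit
// so that runaway backtracking is cut off early.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '10000');

// Hypothetical helper: run preg_match() and report its return value
// together with the PCRE error code, if any.
function tryPattern(string $pattern, string $subject): array
{
    $result = preg_match($pattern, $subject);
    return ['result' => $result, 'error' => preg_last_error()];
}

$subject = str_repeat('a', 25);

// A literal "b" must follow, so PCRE can give up before matching starts.
$literal = tryPattern('/(a+)*b/', $subject);

// No following literal: PCRE has to explore the nested repeats and is
// expected to hit the backtrack limit instead of finishing.
$digit = tryPattern('/(a+)*\d/', $subject);

var_dump($literal['result']);                                // no match, no error
var_dump($digit['result']);                                  // false when the limit was hit
var_dump($digit['error'] === PREG_BACKTRACK_LIMIT_ERROR);
```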
 </div><?php manual_footer($setup); ?>