9083 replace regex implementation with tre


  40    regcomp()
  41      The regcomp() function compiles the regular expression contained in the
  42      pattern string, subject to the flags in cflags, and places the results in
  43      the regex_t structure pointed to by preg.  The cflags argument is the
  44      bitwise OR of zero or more of the following flags:
  45 
  46      REG_EXTENDED  Compile extended regular expressions (EREs), rather than
  47                    the basic regular expressions (BREs) that are the default.
  48 
  49      REG_BASIC     This is a synonym for 0, provided as a counterpart to
  50                    REG_EXTENDED to improve readability.
  51 
  52      REG_NOSPEC    Compile with recognition of all special characters turned
  53                    off.  All characters are thus considered ordinary, so the
  54                    RE is a literal string.  This is an extension, compatible
  55                    with but not specified by IEEE Std 1003.2 ("POSIX.2"), and
  56                    should be used with caution in software intended to be
  57                    portable to other systems.  REG_EXTENDED and REG_NOSPEC may
  58                    not be used in the same call to regcomp().
  59 


  60      REG_ICASE     Compile for matching that ignores upper/lower case
  61                    distinctions.  See regex(5).
  62 
  63      REG_NOSUB     Compile for matching that need only report success or
  64                    failure, not what was matched.
  65 
  66      REG_NEWLINE   Compile for newline-sensitive matching.  By default,
  67                    newline is a completely ordinary character with no special
  68                    meaning in either REs or strings.  With this flag, "[^"
  69                    bracket expressions and "." never match newline, a "^"
  70                    anchor matches the null string after any newline in the
  71                    string in addition to its normal function, and the "$"
  72                    anchor matches the null string before any newline in the
  73                    string in addition to its normal function.
  74 
  75      REG_PEND      The regular expression ends, not at the first NUL, but just
  76                    before the character pointed to by the re_endp member of
  77                    the structure pointed to by preg.  The re_endp member is of
  78                    type const char *.  This flag permits inclusion of NULs in
  79                    the RE; they are considered ordinary characters.  This is


 168      input offsets for REG_STARTEND.  Use for output is still entirely
 169      controlled by nmatch; if nmatch is 0 or REG_NOSUB was specified, the
 170      value of pmatch[0] will not be changed by a successful regexec().
 171 
 172    regerror()
 173      The regerror() function maps a non-zero errcode from either regcomp() or
 174      regexec() to a human-readable, printable message.  If preg is non-NULL,
 175      the error code should have arisen from use of the regex_t pointed to by
 176      preg, and if the error code came from regcomp(), it should have been the
 177      result from the most recent regcomp() using that regex_t.
 178      (The regerror() function may be able to supply a more detailed message
 179      using information from the regex_t.)  The regerror() function places the
 180      NUL-terminated message into the buffer pointed to by errbuf, limiting the
 181      length (including the NUL) to at most errbuf_size bytes.  If the whole
 182      message will not fit, as much of it as will fit before the terminating
 183      NUL is supplied.  In any case, the returned value is the size of buffer
 184      needed to hold the whole message (including terminating NUL).  If
 185      errbuf_size is 0, errbuf is ignored but the return value is still
 186      correct.
 187 
 188      If the errcode given to regerror() is first ORed with REG_ITOA, the
 189      "message" that results is the printable name of the error code, e.g.
 190      "REG_NOMATCH", rather than an explanation thereof.  If errcode is
 191      REG_ATOI, then preg shall be non-NULL and the re_endp member of the
 192      structure it points to must point to the printable name of an error code;
 193      in this case, the result in errbuf is the decimal digits of the numeric
 194      value of the error code (0 if the name is not recognized).  REG_ITOA and
 195      REG_ATOI are intended primarily as debugging facilities; they are
 196      extensions, compatible with but not specified by IEEE Std 1003.2
 197      ("POSIX.2"), and should be used with caution in software intended to be
 198      portable to other systems.
 199 
 200    regfree()
 201      The regfree() function frees any dynamically-allocated storage associated
 202      with the compiled RE pointed to by preg.  The remaining regex_t is no
 203      longer a valid compiled RE and the effect of supplying it to regexec() or
 204      regerror() is undefined.
 205 
 206 IMPLEMENTATION NOTES
 207      There are a number of decisions that IEEE Std 1003.2 ("POSIX.2") leaves
 208      up to the implementor, either by explicitly saying "undefined" or by
 209      virtue of them being forbidden by the RE grammar.  This implementation
 210      treats them as follows.
 211 
 212      There is no particular limit on the length of REs, except insofar as
 213      memory is limited.  Memory usage is approximately linear in RE size, and
 214      largely insensitive to RE complexity, except for bounded repetitions.
 215 
 216      A backslashed character other than one specifically given a magic meaning
 217      by IEEE Std 1003.2 ("POSIX.2") (such magic meanings occur only in BREs)
 218      is taken as an ordinary character.
 219 
 220      Any unmatched "[" is a REG_EBRACK error.
 221 
 222      Equivalence classes cannot begin or end bracket-expression ranges.  The
 223      endpoint of one range cannot begin another.
 224 
 225      RE_DUP_MAX, the limit on repetition counts in bounded repetitions, is
 226      255.
 227 
 228      A repetition operator ("?", "*", "+", or bounds) cannot follow another
 229      repetition operator.  A repetition operator cannot begin an expression or
 230      subexpression or follow "^" or "|".
 231 
 232      "|" cannot appear first or last in a (sub)expression or after another
 233      "|", i.e., an operand of "|" cannot be an empty subexpression.  An empty
 234      parenthesized subexpression, "()", is legal and matches an empty
 235      (sub)string.  An empty string is not a legal RE.
 236 
 237      A "{" followed by a digit is considered the beginning of bounds for a
 238      bounded repetition, which must then follow the syntax for bounds.  A "{"
 239      not followed by a digit is considered an ordinary character.
 240 
 241      "^" and "$" beginning and ending subexpressions in BREs are anchors, not
 242      ordinary characters.
 243 
 244 RETURN VALUES
 245      On successful completion, the regcomp() function returns 0.  Otherwise,
 246      it returns an integer value indicating an error as described in
 247      <regex.h>, and the content of preg is undefined.
 248 
 249      On successful completion, the regexec() function returns 0.  Otherwise it
 250      returns REG_NOMATCH to indicate no match, or REG_ENOSYS to indicate that
 251      the function is not supported.
 252 
 253      Upon successful completion, the regerror() function returns the number of
 254      bytes needed to hold the entire generated string.  Otherwise, it returns
 255      0 to indicate that the function is not implemented.
 256 
 257      The regfree() function returns no value.
 258 
 259      The following constants are defined as error return values:
 260 
 261      REG_NOMATCH   The regexec() function failed to match.
 262      REG_BADPAT    Invalid regular expression.
 263      REG_ECOLLATE  Invalid collating element referenced.
 264      REG_ECTYPE    Invalid character class type referenced.
 265      REG_EESCAPE   Trailing "\" in pattern.
 266      REG_ESUBREG   Number in "\digit" invalid or in error.
 267      REG_EBRACK    "[]" imbalance.
 268      REG_ENOSYS    The function is not supported.
 269      REG_EPAREN    "\(\)" or "()" imbalance.
 270      REG_EBRACE    "\{\}" imbalance.
 271      REG_BADBR     Content of "\{\}" invalid: not a number, number too large,
 272                    more than two numbers, first larger than second.
 273      REG_ERANGE    Invalid endpoint in range expression.
 274      REG_ESPACE    Out of memory.
 275      REG_BADRPT    "?", "*" or "+" not preceded by valid regular expression.


 276 
 277 USAGE
 278      An application could use:
 279 
 280            regerror(code, preg, (char *)NULL, (size_t)0)
 281 
 282      to find out how big a buffer is needed for the generated string, malloc()
 283      a buffer to hold the string, and then call regerror() again to get the
 284      string (see malloc(3C)).  Alternately, it could allocate a fixed, static
 285      buffer that is big enough to hold most strings, and then use malloc()
 286      to allocate a larger buffer if it finds that this is too small.
 287 
 288 EXAMPLES
 289      Matching string against the extended regular expression in pattern.
 290 
 291            #include <regex.h>
 292 
 293            /*
 294            * Match string against the extended regular expression in
 295            * pattern, treating errors as no match.


 331      No errors are defined.
 332 
 333 CODE SET INDEPENDENCE
 334      Enabled
 335 
 336 INTERFACE STABILITY
 337      Standard
 338 
 339 MT-LEVEL
 340      MT-Safe with exceptions
 341 
 342      The regcomp() function can be used safely in a multithreaded application
 343      as long as setlocale(3C) is not being called to change the locale.
 344 
 345 SEE ALSO
 346      attributes(5), regex(5), standards(5)
 347 
 348      IEEE Std 1003.2 ("POSIX.2"), sections 2.8 (Regular Expression Notation)
 349      and B.5 (C Binding for Regular Expression Matching).
 350 
 351 illumos                          June 14, 2017                         illumos


  40    regcomp()
  41      The regcomp() function compiles the regular expression contained in the
  42      pattern string, subject to the flags in cflags, and places the results in
  43      the regex_t structure pointed to by preg.  The cflags argument is the
  44      bitwise OR of zero or more of the following flags:
  45 
  46      REG_EXTENDED  Compile extended regular expressions (EREs), rather than
  47                    the basic regular expressions (BREs) that are the default.
  48 
  49      REG_BASIC     This is a synonym for 0, provided as a counterpart to
  50                    REG_EXTENDED to improve readability.
  51 
  52      REG_NOSPEC    Compile with recognition of all special characters turned
  53                    off.  All characters are thus considered ordinary, so the
  54                    RE is a literal string.  This is an extension, compatible
  55                    with but not specified by IEEE Std 1003.2 ("POSIX.2"), and
  56                    should be used with caution in software intended to be
  57                    portable to other systems.  REG_EXTENDED and REG_NOSPEC may
  58                    not be used in the same call to regcomp().
  59 
  60      REG_LITERAL   An alias of REG_NOSPEC.
  61 
  62      REG_ICASE     Compile for matching that ignores upper/lower case
  63                    distinctions.  See regex(5).
  64 
  65      REG_NOSUB     Compile for matching that need only report success or
  66                    failure, not what was matched.
  67 
  68      REG_NEWLINE   Compile for newline-sensitive matching.  By default,
  69                    newline is a completely ordinary character with no special
  70                    meaning in either REs or strings.  With this flag, "[^"
  71                    bracket expressions and "." never match newline, a "^"
  72                    anchor matches the null string after any newline in the
  73                    string in addition to its normal function, and the "$"
  74                    anchor matches the null string before any newline in the
  75                    string in addition to its normal function.
  76 
  77      REG_PEND      The regular expression ends, not at the first NUL, but just
  78                    before the character pointed to by the re_endp member of
  79                    the structure pointed to by preg.  The re_endp member is of
  80                    type const char *.  This flag permits inclusion of NULs in
  81                    the RE; they are considered ordinary characters.  This is


 170      input offsets for REG_STARTEND.  Use for output is still entirely
 171      controlled by nmatch; if nmatch is 0 or REG_NOSUB was specified, the
 172      value of pmatch[0] will not be changed by a successful regexec().
 173 
 174    regerror()
 175      The regerror() function maps a non-zero errcode from either regcomp() or
 176      regexec() to a human-readable, printable message.  If preg is non-NULL,
 177      the error code should have arisen from use of the regex_t pointed to by
 178      preg, and if the error code came from regcomp(), it should have been the
 179      result from the most recent regcomp() using that regex_t.
 180      (The regerror() function may be able to supply a more detailed message
 181      using information from the regex_t.)  The regerror() function places the
 182      NUL-terminated message into the buffer pointed to by errbuf, limiting the
 183      length (including the NUL) to at most errbuf_size bytes.  If the whole
 184      message will not fit, as much of it as will fit before the terminating
 185      NUL is supplied.  In any case, the returned value is the size of buffer
 186      needed to hold the whole message (including terminating NUL).  If
 187      errbuf_size is 0, errbuf is ignored but the return value is still
 188      correct.
 189 
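The truncation rule described above can be checked by comparing the value returned by regerror() with the buffer size. The following is a minimal sketch, not part of the manual page; the helper name error_message_truncated() is hypothetical and exists only for illustration.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of <regex.h>): copy the message for
 * `code` into buf, which holds `cap` bytes, and report whether the
 * message had to be truncated.  Returns 1 if truncated, 0 otherwise.
 */
static int
error_message_truncated(int code, const regex_t *preg, char *buf, size_t cap)
{
	/*
	 * regerror() returns the size needed for the whole message,
	 * including the terminating NUL, regardless of how much fit.
	 */
	size_t needed = regerror(code, preg, buf, cap);

	return (needed > cap);
}
```

A caller that sees a return value of 1 still has a NUL-terminated prefix of the message in buf and can retry with a larger buffer if the full text matters.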












 190    regfree()
 191      The regfree() function frees any dynamically-allocated storage associated
 192      with the compiled RE pointed to by preg.  The remaining regex_t is no
 193      longer a valid compiled RE and the effect of supplying it to regexec() or
 194      regerror() is undefined.
 195 






































 196 RETURN VALUES
 197      On successful completion, the regcomp() function returns 0.  Otherwise,
 198      it returns an integer value indicating an error as described in
 199      <regex.h>, and the content of preg is undefined.
 200 
 201      On successful completion, the regexec() function returns 0.  Otherwise it
 202      returns REG_NOMATCH to indicate no match.

 203 
 204      Upon successful completion, the regerror() function returns the number of
 205      bytes needed to hold the entire generated string.

 206 
 207      The regfree() function returns no value.
 208 
 209      The following constants are defined as error return values:
 210 
 211      REG_NOMATCH   The regexec() function failed to match.
 212      REG_BADPAT    Invalid regular expression.
 213      REG_ECOLLATE  Invalid collating element referenced.
 214      REG_ECTYPE    Invalid character class type referenced.
 215      REG_EESCAPE   Trailing "\" in pattern.
 216      REG_ESUBREG   Number in "\digit" invalid or in error.
 217      REG_EBRACK    "[]" imbalance.
 218      REG_ENOSYS    The function is not supported.
 219      REG_EPAREN    "\(\)" or "()" imbalance.
 220      REG_EBRACE    "\{\}" imbalance.
 221      REG_BADBR     Content of "\{\}" invalid: not a number, number too large,
 222                    more than two numbers, first larger than second.
 223      REG_ERANGE    Invalid endpoint in range expression.
 224      REG_ESPACE    Out of memory.
 225      REG_BADRPT    "?", "*" or "+" not preceded by valid regular expression.
 226      REG_EMPTY     Empty (sub)expression.
 227      REG_INVARG    Invalid argument, e.g. negative-length string.
 228 
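As an illustration of how the return values above might be told apart, here is a small sketch that separates "matched", "no match", and "error". The function name ere_matches() and its three-way return convention are assumptions made for this example, not something the manual page defines.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: match `string` against the ERE in `pattern`.
 * Returns 1 on a match, 0 on REG_NOMATCH, and -1 if the pattern does
 * not compile or regexec() reports any other error.
 */
static int
ere_matches(const char *pattern, const char *string)
{
	regex_t re;
	int rc;

	/* REG_NOSUB: only success or failure is needed, not offsets. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
		/* The content of re is undefined here; it is not freed. */
		return (-1);
	}

	rc = regexec(&re, string, 0, NULL, 0);
	regfree(&re);

	if (rc == 0)
		return (1);
	if (rc == REG_NOMATCH)
		return (0);
	return (-1);
}
```

Keeping the error case distinct from REG_NOMATCH lets the caller decide whether a bad pattern should be reported or silently treated as a non-match.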
 229 USAGE
 230      An application could use:
 231 
 232            regerror(code, preg, (char *)NULL, (size_t)0)
 233 
 234      to find out how big a buffer is needed for the generated string, malloc()
 235      a buffer to hold the string, and then call regerror() again to get the
 236      string (see malloc(3C)).  Alternately, it could allocate a fixed, static
 237      buffer that is big enough to hold most strings, and then use malloc()
 238      to allocate a larger buffer if it finds that this is too small.
 239 
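The two-call approach described in USAGE could look roughly like the sketch below. The helper name regerror_alloc() is hypothetical; the caller is assumed to free() the returned string.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the message for
 * `code`, or NULL if no size is reported or the allocation fails.
 * The caller owns the returned buffer and must free() it.
 */
static char *
regerror_alloc(int code, const regex_t *preg)
{
	/* First call: errbuf_size is 0, so only the needed size is returned. */
	size_t needed = regerror(code, preg, (char *)NULL, (size_t)0);
	char *buf;

	if (needed == 0)
		return (NULL);

	buf = malloc(needed);
	if (buf == NULL)
		return (NULL);

	/* Second call: the buffer is now large enough for the whole message. */
	(void) regerror(code, preg, buf, needed);
	return (buf);
}
```

This trades an extra regerror() call for never truncating the message; the fixed-buffer alternative mentioned above avoids the allocation in the common case.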
 240 EXAMPLES
 241      Matching string against the extended regular expression in pattern.
 242 
 243            #include <regex.h>
 244 
 245            /*
 246            * Match string against the extended regular expression in
 247            * pattern, treating errors as no match.


 283      No errors are defined.
 284 
 285 CODE SET INDEPENDENCE
 286      Enabled
 287 
 288 INTERFACE STABILITY
 289      Standard
 290 
 291 MT-LEVEL
 292      MT-Safe with exceptions
 293 
 294      The regcomp() function can be used safely in a multithreaded application
 295      as long as setlocale(3C) is not being called to change the locale.
 296 
 297 SEE ALSO
 298      attributes(5), regex(5), standards(5)
 299 
 300      IEEE Std 1003.2 ("POSIX.2"), sections 2.8 (Regular Expression Notation)
 301      and B.5 (C Binding for Regular Expression Matching).
 302 
 303 illumos                        February 3, 2018                        illumos