9083 replace regex implementation with tre


  57 .\" This notice shall appear on any product containing this material.
  58 .\"
  59 .\" The contents of this file are subject to the terms of the
  60 .\" Common Development and Distribution License (the "License").
  61 .\" You may not use this file except in compliance with the License.
  62 .\"
  63 .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  64 .\" or http://www.opensolaris.org/os/licensing.
  65 .\" See the License for the specific language governing permissions
  66 .\" and limitations under the License.
  67 .\"
  68 .\" When distributing Covered Code, include this CDDL HEADER in each
  69 .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  70 .\" If applicable, add the following below this CDDL HEADER, with the
  71 .\" fields enclosed by brackets "[]" replaced with your own identifying
  72 .\" information: Portions Copyright [yyyy] [name of copyright owner]
  73 .\"
  74 .\"
  75 .\" Copyright (c) 1992, X/Open Company Limited. All Rights Reserved.
  76 .\" Portions Copyright (c) 2003, Sun Microsystems, Inc.  All Rights Reserved.
  77 .\" Copyright 2017 Nexenta Systems, Inc.
  78 .\"
  79 .Dd June 14, 2017
  80 .Dt REGCOMP 3C
  81 .Os
  82 .Sh NAME
  83 .Nm regcomp ,
  84 .Nm regexec ,
  85 .Nm regerror ,
  86 .Nm regfree
  87 .Nd regular-expression library
  88 .Sh LIBRARY
  89 .Lb libc
  90 .Sh SYNOPSIS
  91 .In regex.h
  92 .Ft int
  93 .Fo regcomp
  94 .Fa "regex_t *restrict preg" "const char *restrict pattern" "int cflags"
  95 .Fc
  96 .Ft int
  97 .Fo regexec
  98 .Fa "const regex_t *restrict preg" "const char *restrict string"
  99 .Fa "size_t nmatch" "regmatch_t pmatch[restrict]" "int eflags"


 153 .Pq EREs ,
 154 rather than the basic regular expressions
 155 .Pq BREs
 156 that are the default.
 157 .It Dv REG_BASIC
 158 This is a synonym for 0, provided as a counterpart to
 159 .Dv REG_EXTENDED
 160 to improve readability.
 161 .It Dv REG_NOSPEC
 162 Compile with recognition of all special characters turned off.
 163 All characters are thus considered ordinary, so the RE is a literal string.
 164 This is an extension, compatible with but not specified by
 165 .St -p1003.2 ,
 166 and should be used with caution in software intended to be portable to other
 167 systems.
 168 .Dv REG_EXTENDED
 169 and
 170 .Dv REG_NOSPEC
 171 may not be used in the same call to
 172 .Fn regcomp .
 173 .It Dv REG_ICASE
 174 Compile for matching that ignores upper/lower case distinctions.
 175 See
 176 .Xr regex 5 .
 177 .It Dv REG_NOSUB
 178 Compile for matching that need only report success or failure,
 179 not what was matched.
 180 .It Dv REG_NEWLINE
 181 Compile for newline-sensitive matching.
 182 By default, newline is a completely ordinary character with no special
 183 meaning in either REs or strings.
 184 With this flag,
 185 .Qq [^
 186 bracket expressions and
 187 .Qq \&.
 188 never match newline,
 189 a
 190 .Qq \&^
 191 anchor matches the null string after any newline in the string in addition to
 192 its normal function, and the


 465 .Pc
 466 The
 467 .Fn regerror
 468 function places the NUL-terminated message into the buffer pointed to by
 469 .Fa errbuf ,
 470 limiting the length
 471 .Pq including the NUL
 472 to at most
 473 .Fa errbuf_size
 474 bytes.
 475 If the whole message will not fit, as much of it as will fit before the
 476 terminating NUL is supplied.
 477 In any case, the returned value is the size of buffer needed to hold the whole
 478 message
 479 .Pq including terminating NUL .
 480 If
 481 .Fa errbuf_size
 482 is 0,
 483 .Fa errbuf
 484 is ignored but the return value is still correct.
 485 .Pp
 486 If the
 487 .Fa errcode
 488 given to
 489 .Fn regerror
 490 is first ORed with
 491 .Dv REG_ITOA ,
 492 the
 493 .Qq message
 494 that results is the printable name of the error code, e.g.
 495 .Qq Dv REG_NOMATCH ,
 496 rather than an explanation thereof.
 497 If
 498 .Fa errcode
 499 is
 500 .Dv REG_ATOI ,
 501 then
 502 .Fa preg
 503 shall be non-NULL and the
 504 .Va re_endp
 505 member of the structure it points to must point to the printable name of an
 506 error code; in this case, the result in
 507 .Fa errbuf
 508 is the decimal digits of the numeric value of the error code
 509 .Pq 0 if the name is not recognized .
 510 .Dv REG_ITOA
 511 and
 512 .Dv REG_ATOI
 513 are intended primarily as debugging facilities; they are extensions,
 514 compatible with but not specified by
 515 .St -p1003.2 ,
 516 and should be used with caution in software intended to be portable to other
 517 systems.
 518 .Ss Fn regfree
 519 The
 520 .Fn regfree
 521 function frees any dynamically-allocated storage associated with the compiled RE
 522 pointed to by
 523 .Fa preg .
 524 The remaining
 525 .Ft regex_t
 526 is no longer a valid compiled RE and the effect of supplying it to
 527 .Fn regexec
 528 or
 529 .Fn regerror
 530 is undefined.
 531 .Sh IMPLEMENTATION NOTES
 532 There are a number of decisions that
 533 .St -p1003.2
 534 leaves up to the implementor,
 535 either by explicitly saying
 536 .Qq undefined
 537 or by virtue of them being forbidden by the RE grammar.
 538 This implementation treats them as follows.
 539 .Pp
 540 There is no particular limit on the length of REs, except insofar as memory is
 541 limited.
 542 Memory usage is approximately linear in RE size, and largely insensitive
 543 to RE complexity, except for bounded repetitions.
 544 .Pp
 545 A backslashed character other than one specifically given a magic meaning by
 546 .St -p1003.2
 547 .Pq such magic meanings occur only in BREs
 548 is taken as an ordinary character.
 549 .Pp
 550 Any unmatched
 551 .Qq \&[
 552 is a
 553 .Dv REG_EBRACK
 554 error.
 555 .Pp
 556 Equivalence classes cannot begin or end bracket-expression ranges.
 557 The endpoint of one range cannot begin another.
 558 .Pp
 559 .Dv RE_DUP_MAX ,
 560 the limit on repetition counts in bounded repetitions, is 255.
 561 .Pp
 562 A repetition operator
 563 .Po
 564 .Qq \&? ,
 565 .Qq \&* ,
 566 .Qq \&+ ,
 567 or bounds
 568 .Pc
 569 cannot follow another repetition operator.
 570 A repetition operator cannot begin an expression or subexpression
 571 or follow
 572 .Qq \&^
 573 or
 574 .Qq \&| .
 575 .Pp
 576 .Qq \&|
 577 cannot appear first or last in a (sub)expression or after another
 578 .Qq \&| ,
 579 i.e., an operand of
 580 .Qq \&|
 581 cannot be an empty subexpression.
 582 An empty parenthesized subexpression,
 583 .Qq () ,
 584 is legal and matches an empty (sub)string.
 585 An empty string is not a legal RE.
 586 .Pp
 587 A
 588 .Qq \&{
 589 followed by a digit is considered the beginning of bounds for a bounded
 590 repetition, which must then follow the syntax for bounds.
 591 A
 592 .Qq \&{
 593 .Em not
 594 followed by a digit is considered an ordinary character.
 595 .Pp
 596 .Qq \&^
 597 and
 598 .Qq \&$
 599 beginning and ending subexpressions in BREs are anchors, not ordinary
 600 characters.
 601 .Sh RETURN VALUES
 602 On successful completion, the
 603 .Fn regcomp
 604 function returns 0.
 605 Otherwise, it returns an integer value indicating an error as described in
 606 .In regex.h ,
 607 and the content of preg is undefined.
 608 .Pp
 609 On successful completion, the
 610 .Fn regexec
 611 function returns 0.
 612 Otherwise it returns
 613 .Dv REG_NOMATCH
 614 to indicate no match, or
 615 .Dv REG_ENOSYS
 616 to indicate that the function is not supported.
 617 .Pp
 618 Upon successful completion, the
 619 .Fn regerror
 620 function returns the number of bytes needed to hold the entire generated string.
 621 Otherwise, it returns 0 to indicate that the function is not implemented.
 622 .Pp
 623 The
 624 .Fn regfree
 625 function returns no value.
 626 .Pp
 627 The following constants are defined as error return values:
 628 .Pp
 629 .Bl -tag -width "REG_ECOLLATE" -compact
 630 .It Dv REG_NOMATCH
 631 The
 632 .Fn regexec
 633 function failed to match.
 634 .It Dv REG_BADPAT
 635 Invalid regular expression.
 636 .It Dv REG_ECOLLATE
 637 Invalid collating element referenced.
 638 .It Dv REG_ECTYPE
 639 Invalid character class type referenced.
 640 .It Dv REG_EESCAPE
 641 Trailing


 656 .Qq ()
 657 imbalance.
 658 .It Dv REG_EBRACE
 659 .Qq \e{\e}
 660 imbalance.
 661 .It Dv REG_BADBR
 662 Content of
 663 .Qq \e{\e}
 664 invalid: not a number, number too large, more than two
 665 numbers, first larger than second.
 666 .It Dv REG_ERANGE
 667 Invalid endpoint in range expression.
 668 .It Dv REG_ESPACE
 669 Out of memory.
 670 .It Dv REG_BADRPT
 671 .Qq \&? ,
 672 .Qq *
 673 or
 674 .Qq +
 675 not preceded by valid regular expression.
 676 .El
 677 .Sh USAGE
 678 An application could use:
 679 .Bd -literal -offset Ds
 680 regerror(code, preg, (char *)NULL, (size_t)0)
 681 .Ed
 682 .Pp
 683 to find out how big a buffer is needed for the generated string,
 684 .Fn malloc
 685 a buffer to hold the string, and then call
 686 .Fn regerror
 687 again to get the string
 688 .Po see
 689 .Xr malloc 3C
 690 .Pc .
 691 Alternatively, it could allocate a fixed, static buffer that is big enough to
 692 hold most strings, and then use
 693 .Fn malloc
 694 to allocate a larger buffer if it finds that this is too small.
 695 .Sh EXAMPLES




  57 .\" This notice shall appear on any product containing this material.
  58 .\"
  59 .\" The contents of this file are subject to the terms of the
  60 .\" Common Development and Distribution License (the "License").
  61 .\" You may not use this file except in compliance with the License.
  62 .\"
  63 .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  64 .\" or http://www.opensolaris.org/os/licensing.
  65 .\" See the License for the specific language governing permissions
  66 .\" and limitations under the License.
  67 .\"
  68 .\" When distributing Covered Code, include this CDDL HEADER in each
  69 .\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  70 .\" If applicable, add the following below this CDDL HEADER, with the
  71 .\" fields enclosed by brackets "[]" replaced with your own identifying
  72 .\" information: Portions Copyright [yyyy] [name of copyright owner]
  73 .\"
  74 .\"
  75 .\" Copyright (c) 1992, X/Open Company Limited. All Rights Reserved.
  76 .\" Portions Copyright (c) 2003, Sun Microsystems, Inc.  All Rights Reserved.
  77 .\" Copyright 2018 Nexenta Systems, Inc.
  78 .\"
  79 .Dd February 3, 2018
  80 .Dt REGCOMP 3C
  81 .Os
  82 .Sh NAME
  83 .Nm regcomp ,
  84 .Nm regexec ,
  85 .Nm regerror ,
  86 .Nm regfree
  87 .Nd regular-expression library
  88 .Sh LIBRARY
  89 .Lb libc
  90 .Sh SYNOPSIS
  91 .In regex.h
  92 .Ft int
  93 .Fo regcomp
  94 .Fa "regex_t *restrict preg" "const char *restrict pattern" "int cflags"
  95 .Fc
  96 .Ft int
  97 .Fo regexec
  98 .Fa "const regex_t *restrict preg" "const char *restrict string"
  99 .Fa "size_t nmatch" "regmatch_t pmatch[restrict]" "int eflags"


 153 .Pq EREs ,
 154 rather than the basic regular expressions
 155 .Pq BREs
 156 that are the default.
 157 .It Dv REG_BASIC
 158 This is a synonym for 0, provided as a counterpart to
 159 .Dv REG_EXTENDED
 160 to improve readability.
 161 .It Dv REG_NOSPEC
 162 Compile with recognition of all special characters turned off.
 163 All characters are thus considered ordinary, so the RE is a literal string.
 164 This is an extension, compatible with but not specified by
 165 .St -p1003.2 ,
 166 and should be used with caution in software intended to be portable to other
 167 systems.
 168 .Dv REG_EXTENDED
 169 and
 170 .Dv REG_NOSPEC
 171 may not be used in the same call to
 172 .Fn regcomp .
 173 .It Dv REG_LITERAL
 174 An alias of
 175 .Dv REG_NOSPEC .
 176 .It Dv REG_ICASE
 177 Compile for matching that ignores upper/lower case distinctions.
 178 See
 179 .Xr regex 5 .
 180 .It Dv REG_NOSUB
 181 Compile for matching that need only report success or failure,
 182 not what was matched.
 183 .It Dv REG_NEWLINE
 184 Compile for newline-sensitive matching.
 185 By default, newline is a completely ordinary character with no special
 186 meaning in either REs or strings.
 187 With this flag,
 188 .Qq [^
 189 bracket expressions and
 190 .Qq \&.
 191 never match newline,
 192 a
 193 .Qq \&^
 194 anchor matches the null string after any newline in the string in addition to
 195 its normal function, and the


 468 .Pc
 469 The
 470 .Fn regerror
 471 function places the NUL-terminated message into the buffer pointed to by
 472 .Fa errbuf ,
 473 limiting the length
 474 .Pq including the NUL
 475 to at most
 476 .Fa errbuf_size
 477 bytes.
 478 If the whole message will not fit, as much of it as will fit before the
 479 terminating NUL is supplied.
 480 In any case, the returned value is the size of buffer needed to hold the whole
 481 message
 482 .Pq including terminating NUL .
 483 If
 484 .Fa errbuf_size
 485 is 0,
 486 .Fa errbuf
 487 is ignored but the return value is still correct.
 488 .Ss Fn regfree
 489 The
 490 .Fn regfree
 491 function frees any dynamically-allocated storage associated with the compiled RE
 492 pointed to by
 493 .Fa preg .
 494 The remaining
 495 .Ft regex_t
 496 is no longer a valid compiled RE and the effect of supplying it to
 497 .Fn regexec
 498 or
 499 .Fn regerror
 500 is undefined.
 501 .Sh RETURN VALUES
 502 On successful completion, the
 503 .Fn regcomp
 504 function returns 0.
 505 Otherwise, it returns an integer value indicating an error as described in
 506 .In regex.h ,
 507 and the content of preg is undefined.
 508 .Pp
 509 On successful completion, the
 510 .Fn regexec
 511 function returns 0.
 512 Otherwise it returns
 513 .Dv REG_NOMATCH
 514 to indicate no match.
 515 .Pp
 516 Upon successful completion, the
 517 .Fn regerror
 518 function returns the number of bytes needed to hold the entire generated string.
 519 .Pp
 520 The
 521 .Fn regfree
 522 function returns no value.
 523 .Pp
 524 The following constants are defined as error return values:
 525 .Pp
 526 .Bl -tag -width "REG_ECOLLATE" -compact
 527 .It Dv REG_NOMATCH
 528 The
 529 .Fn regexec
 530 function failed to match.
 531 .It Dv REG_BADPAT
 532 Invalid regular expression.
 533 .It Dv REG_ECOLLATE
 534 Invalid collating element referenced.
 535 .It Dv REG_ECTYPE
 536 Invalid character class type referenced.
 537 .It Dv REG_EESCAPE
 538 Trailing


 553 .Qq ()
 554 imbalance.
 555 .It Dv REG_EBRACE
 556 .Qq \e{\e}
 557 imbalance.
 558 .It Dv REG_BADBR
 559 Content of
 560 .Qq \e{\e}
 561 invalid: not a number, number too large, more than two
 562 numbers, first larger than second.
 563 .It Dv REG_ERANGE
 564 Invalid endpoint in range expression.
 565 .It Dv REG_ESPACE
 566 Out of memory.
 567 .It Dv REG_BADRPT
 568 .Qq \&? ,
 569 .Qq *
 570 or
 571 .Qq +
 572 not preceded by valid regular expression.
 573 .It Dv REG_EMPTY
 574 Empty (sub)expression.
 575 .It Dv REG_INVARG
 576 Invalid argument, e.g. negative-length string.
 577 .El
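
Review note: the return-value contract above can be illustrated with a short C sketch. The helper name `match_status` is hypothetical and not part of libc. The sketch assumes a POSIX `<regex.h>` and relies only on the documented behaviour: `regcomp()` returns 0 or an error code, and `regexec()` returns 0 or `REG_NOMATCH`.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * match_status() is a hypothetical helper, not a libc interface.
 * It returns 0 on a match, REG_NOMATCH when the string does not match,
 * or the regcomp() error code when the pattern does not compile.
 */
int
match_status(const char *pattern, const char *string)
{
	regex_t re;
	int code;

	assert(pattern != NULL && string != NULL);

	/* REG_NOSUB: only success or failure is needed, not match offsets. */
	code = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	if (code != 0)
		return (code);	/* preg is undefined here, so no regfree() */

	code = regexec(&re, string, 0, NULL, 0);
	regfree(&re);
	return (code);
}
```

A caller would compare the result against 0 and `REG_NOMATCH` first and treat any other value as a compile error to pass to `regerror()`.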
 578 .Sh USAGE
 579 An application could use:
 580 .Bd -literal -offset Ds
 581 regerror(code, preg, (char *)NULL, (size_t)0)
 582 .Ed
 583 .Pp
 584 to find out how big a buffer is needed for the generated string,
 585 .Fn malloc
 586 a buffer to hold the string, and then call
 587 .Fn regerror
 588 again to get the string
 589 .Po see
 590 .Xr malloc 3C
 591 .Pc .
 592 Alternatively, it could allocate a fixed, static buffer that is big enough to
 593 hold most strings, and then use
 594 .Fn malloc
 595 to allocate a larger buffer if it finds that this is too small.
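
Review note: the two-call sizing pattern described in USAGE might look like the following C sketch. The helper name `compile_error_message` is hypothetical and not part of libc. The sketch assumes only the documented behaviour that `regerror()` with a zero-sized buffer returns the size needed, including the terminating NUL.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/*
 * compile_error_message() is a hypothetical helper, not a libc interface.
 * It compiles `pattern` and returns NULL when regcomp() succeeds (or when
 * malloc() fails); otherwise it returns a malloc'ed error message that the
 * caller must free().
 */
char *
compile_error_message(const char *pattern, int cflags)
{
	regex_t re;
	size_t need;
	char *msg;
	int code;

	assert(pattern != NULL);

	code = regcomp(&re, pattern, cflags);
	if (code == 0) {
		regfree(&re);
		return (NULL);
	}

	/* First call: a zero-sized buffer only reports the required size. */
	need = regerror(code, &re, NULL, 0);
	if (need == 0 || (msg = malloc(need)) == NULL)
		return (NULL);

	/* Second call: the buffer now holds the whole message and its NUL. */
	(void) regerror(code, &re, msg, need);
	return (msg);
}
```

Sizing the buffer from the first call avoids the truncation that a fixed buffer can cause, at the cost of one extra `regerror()` call per error.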
 596 .Sh EXAMPLES