REGCOMP(3C)              Standard C Library Functions              REGCOMP(3C)

NAME
     regcomp, regexec, regerror, regfree - regular-expression library

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <regex.h>

     int
     regcomp(regex_t *restrict preg, const char *restrict pattern,
         int cflags);

     int
     regexec(const regex_t *restrict preg, const char *restrict string,
         size_t nmatch, regmatch_t pmatch[restrict], int eflags);

     size_t
     regerror(int errcode, const regex_t *restrict preg,
         char *restrict errbuf, size_t errbuf_size);

     void
     regfree(regex_t *preg);

DESCRIPTION
     These routines implement IEEE Std 1003.2 ("POSIX.2") regular expressions;
     see regex(5).  The regcomp() function compiles an RE written as a string
     into an internal form, regexec() matches that internal form against a
     string and reports results, regerror() transforms error codes from either
     into human-readable messages, and regfree() frees any dynamically-
     allocated storage used by the internal form of an RE.

     The header <regex.h> declares two structure types, regex_t and
     regmatch_t, the former for compiled internal forms and the latter for
     match reporting.  It also declares the four functions, a type regoff_t,
     and a number of constants with names starting with "REG_".

   regcomp()
     The regcomp() function compiles the regular expression contained in the
     pattern string, subject to the flags in cflags, and places the results in
     the regex_t structure pointed to by preg.  The cflags argument is the
     bitwise OR of zero or more of the following flags:

     REG_EXTENDED  Compile extended regular expressions (EREs), rather than
                   the basic regular expressions (BREs) that are the default.

     REG_BASIC     This is a synonym for 0, provided as a counterpart to
                   REG_EXTENDED to improve readability.

     REG_NOSPEC    Compile with recognition of all special characters turned
                   off.  All characters are thus considered ordinary, so the
                   RE is a literal string.  This is an extension, compatible
                   with but not specified by IEEE Std 1003.2 ("POSIX.2"), and
                   should be used with caution in software intended to be
                   portable to other systems.  REG_EXTENDED and REG_NOSPEC may
                   not be used in the same call to regcomp().

     REG_LITERAL   An alias of REG_NOSPEC.

     REG_ICASE     Compile for matching that ignores upper/lower case
                   distinctions.  See regex(5).

     REG_NOSUB     Compile for matching that need only report success or
                   failure, not what was matched.

     REG_NEWLINE   Compile for newline-sensitive matching.  By default,
                   newline is a completely ordinary character with no special
                   meaning in either REs or strings.  With this flag, "[^"
                   bracket expressions and "." never match newline, a "^"
                   anchor matches the null string after any newline in the
                   string in addition to its normal function, and the "$"
                   anchor matches the null string before any newline in the
                   string in addition to its normal function.

     REG_PEND      The regular expression ends, not at the first NUL, but just
                   before the character pointed to by the re_endp member of
                   the structure pointed to by preg.  The re_endp member is of
                   type const char *.  This flag permits inclusion of NULs in
                   the RE; they are considered ordinary characters.  This is
                   an extension, compatible with but not specified by IEEE Std
                   1003.2 ("POSIX.2"), and should be used with caution in
                   software intended to be portable to other systems.
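     As an illustration of the REG_NEWLINE description above, the following
     sketch compiles the same pattern with and without that flag.  The helper
     name matches_with_flags() and its parameters are hypothetical and exist
     only for this example; only POSIX-specified flags are used.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if pattern (an ERE) matches text, 0 if it
 * does not, and -1 if the pattern fails to compile.  extra_cflags is OR-ed
 * into REG_EXTENDED|REG_NOSUB so the caller can toggle REG_NEWLINE.
 */
int
matches_with_flags(const char *pattern, const char *text, int extra_cflags)
{
        regex_t re;
        int status;

        if (regcomp(&re, pattern,
            REG_EXTENDED | REG_NOSUB | extra_cflags) != 0)
                return (-1);
        status = regexec(&re, text, 0, NULL, 0);
        regfree(&re);
        return (status == 0 ? 1 : 0);
}
```

     With the text "hello\nworld", the pattern "^world" is expected to match
     only when REG_NEWLINE is passed, while "hello.world" is expected to
     match only when it is not, because "." then no longer matches the
     newline.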

     When successful, regcomp() returns 0 and fills in the structure pointed
     to by preg.  One member of that structure (other than re_endp) is
     publicized: re_nsub, of type size_t, contains the number of parenthesized
     subexpressions within the RE (except that the value of this member is
     undefined if the REG_NOSUB flag was used).
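     A minimal sketch of reading re_nsub after a successful compilation
     follows.  The helper name count_subexpressions() is hypothetical.
     REG_NOSUB is deliberately not used, since re_nsub is undefined with it.

```c
#include <regex.h>

/*
 * Hypothetical helper: return the number of parenthesized subexpressions
 * in an ERE, or -1 if the pattern does not compile.
 */
long
count_subexpressions(const char *pattern)
{
        regex_t re;
        long n;

        if (regcomp(&re, pattern, REG_EXTENDED) != 0)
                return (-1);
        n = (long)re.re_nsub;   /* valid because REG_NOSUB was not given */
        regfree(&re);
        return (n);
}
```

     A caller that wants every subexpression reported by regexec() could size
     its pmatch array as re_nsub + 1, the extra slot being for the whole
     match.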

   regexec()
     The regexec() function matches the compiled RE pointed to by preg against
     the string, subject to the flags in eflags, and reports results using
     nmatch, pmatch, and the returned value.  The RE must have been compiled
     by a previous invocation of regcomp().  The compiled form is not altered
     during execution of regexec(), so a single compiled RE can be used
     simultaneously by multiple threads.

     By default, the NUL-terminated string pointed to by string is considered
     to be the text of an entire line, minus any terminating newline.  The
     eflags argument is the bitwise OR of zero or more of the following flags:

     REG_NOTBOL    The first character of the string is treated as the
                   continuation of a line.  This means that the anchors "^",
                   "[[:<:]]", and "\<" do not match before it; but see
                   REG_STARTEND below.  This does not affect the behavior of
                   newlines under REG_NEWLINE.

     REG_NOTEOL    The NUL terminating the string does not end a line, so the
                   "$" anchor does not match before it.  This does not affect
                   the behavior of newlines under REG_NEWLINE.

     REG_STARTEND  The string is considered to start at string +
                   pmatch[0].rm_so and to end before the byte located at
                   string + pmatch[0].rm_eo, regardless of the value of
                   nmatch.  See below for the definition of pmatch and nmatch.
                   This is an extension, compatible with but not specified by
                   IEEE Std 1003.2 ("POSIX.2"), and should be used with
                   caution in software intended to be portable to other
                   systems.

                   Without REG_NOTBOL, the position rm_so is considered the
                   beginning of a line, such that "^" matches before it, and
                   the beginning of a word if there is a word character at
                   this position, such that "[[:<:]]" and "\<" match before
                   it.

                   With REG_NOTBOL, the character at position rm_so is treated
                   as the continuation of a line, and if rm_so is greater than
                   0, the preceding character is taken into consideration.  If
                   the preceding character is a newline and the regular
                   expression was compiled with REG_NEWLINE, "^" matches
                   before the string; if the preceding character is not a word
                   character but the string starts with a word character,
                   "[[:<:]]" and "\<" match before the string.
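     The REG_STARTEND input convention described above can be sketched as
     follows.  The helper name match_in_range() and its parameters are
     hypothetical.  Because REG_STARTEND is an extension, the sketch guards
     its use with #ifdef and reports -1 where the flag is not available.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: search only the bytes [start, end) of s.
 * Returns 0 on a match (offsets stored in *so and *eo, measured from the
 * beginning of s), REG_NOMATCH if nothing matched, or -1 if this system
 * does not provide REG_STARTEND.
 */
int
match_in_range(const regex_t *re, const char *s, size_t start, size_t end,
    size_t *so, size_t *eo)
{
#ifdef REG_STARTEND
        regmatch_t pm;
        int status;

        pm.rm_so = (regoff_t)start;     /* input: first byte to consider */
        pm.rm_eo = (regoff_t)end;       /* input: one past the last byte */
        status = regexec(re, s, 1, &pm, REG_STARTEND);
        if (status == 0) {
                *so = (size_t)pm.rm_so; /* output: start of the match */
                *eo = (size_t)pm.rm_eo; /* output: one past its end */
        }
        return (status);
#else
        (void)re; (void)s; (void)start; (void)end; (void)so; (void)eo;
        return (-1);
#endif
}
```

     The same pmatch[0] element carries the range in and the match out, which
     is why the sketch copies the result into separate variables before
     returning.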

     See regex(5) for a discussion of what is matched in situations where an
     RE or a portion thereof could match any of several substrings of string.

     If REG_NOSUB was specified in the compilation of the RE, or if nmatch is
     0, regexec() ignores the pmatch argument (but see below for the case
     where REG_STARTEND is specified).  Otherwise, pmatch points to an array
     of nmatch structures of type regmatch_t.  Such a structure has at least
     the members rm_so and rm_eo, both of type regoff_t (a signed arithmetic
     type at least as large as an off_t and a ssize_t), containing
     respectively the offset of the first character of a substring and the
     offset of the first character after the end of the substring.  Offsets
     are measured from the beginning of the string argument given to
     regexec().  An empty substring is denoted by equal offsets, both
     indicating the character following the empty substring.

     The 0th member of the pmatch array is filled in to indicate what
     substring of string was matched by the entire RE.  Remaining members
     report what substring was matched by parenthesized subexpressions within
     the RE; member i reports subexpression i, with subexpressions counted
     (starting at 1) by the order of their opening parentheses in the RE, left
     to right.  Unused entries in the array (corresponding either to
     subexpressions that did not participate in the match at all, or to
     subexpressions that do not exist in the RE (that is, i > preg->re_nsub))
     have both rm_so and rm_eo set to -1.  If a subexpression participated in
     the match several times, the reported substring is the last one it
     matched.  (Note, as an example in particular, that when the RE "(b*)+"
     matches "bbb", the parenthesized subexpression matches each of the three
     "b"s and then an infinite number of empty strings following the last "b",
     so the reported substring is one of the empties.)

     If REG_STARTEND is specified, pmatch must point to at least one
     regmatch_t (even if nmatch is 0 or REG_NOSUB was specified), to hold the
     input offsets for REG_STARTEND.  Use for output is still entirely
     controlled by nmatch; if nmatch is 0 or REG_NOSUB was specified, the
     value of pmatch[0] will not be changed by a successful regexec().
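     The pmatch conventions above (entry 0 for the whole match, entry i for
     subexpression i, and -1 offsets for unused entries) can be sketched as
     follows.  The helper name copy_group() and the MAX_GROUPS limit are
     hypothetical choices made for this example only.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS      8       /* illustrative limit, not part of the API */

/*
 * Hypothetical helper: copy the text matched by subexpression idx of
 * pattern (an ERE) within s into out.  Returns the length copied, or -1
 * if the pattern does not compile, nothing matches, the subexpression
 * did not participate in the match, or out is too small.
 */
int
copy_group(const char *pattern, const char *s, size_t idx, char *out,
    size_t outsz)
{
        regex_t re;
        regmatch_t pm[MAX_GROUPS];
        size_t len;
        int rc = -1;

        if (idx >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
                return (-1);
        if (regexec(&re, s, MAX_GROUPS, pm, 0) == 0 &&
            pm[idx].rm_so != -1) {
                /* rm_eo is the offset just past the matched substring */
                len = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);
                if (len < outsz) {
                        memcpy(out, s + pm[idx].rm_so, len);
                        out[len] = '\0';
                        rc = (int)len;
                }
        }
        regfree(&re);
        return (rc);
}
```

     For the pattern "([0-9]+)-([0-9]+)" and the string "tel 555-1234", index
     1 would yield "555" and index 2 would yield "1234".  For "(a)|(b)"
     matched against "b", index 1 reports -1 because that subexpression did
     not participate in the match.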

   regerror()
     The regerror() function maps a non-zero errcode from either regcomp() or
     regexec() to a human-readable, printable message.  If preg is non-NULL,
     the error code should have arisen from use of the regex_t pointed to by
     preg, and if the error code came from regcomp(), it should have been the
     result from the most recent regcomp() using that regex_t.  (The
     regerror() function may be able to supply a more detailed message using
     information from the regex_t.)  The regerror() function places the NUL-
     terminated message into the buffer pointed to by errbuf, limiting the
     length (including the NUL) to at most errbuf_size bytes.  If the whole
     message will not fit, as much of it as will fit before the terminating
     NUL is supplied.  In any case, the returned value is the size of buffer
     needed to hold the whole message (including terminating NUL).  If
     errbuf_size is 0, errbuf is ignored but the return value is still
     correct.

   regfree()
     The regfree() function frees any dynamically-allocated storage associated
     with the compiled RE pointed to by preg.  The remaining regex_t is no
     longer a valid compiled RE and the effect of supplying it to regexec() or
     regerror() is undefined.

RETURN VALUES
     On successful completion, the regcomp() function returns 0.  Otherwise,
     it returns an integer value indicating an error as described in
     <regex.h>, and the content of preg is undefined.

     On successful completion, the regexec() function returns 0.  Otherwise it
     returns REG_NOMATCH to indicate no match.

     Upon successful completion, the regerror() function returns the number of
     bytes needed to hold the entire generated string.

     The regfree() function returns no value.

     The following constants are defined as error return values:

     REG_NOMATCH   The regexec() function failed to match.
     REG_BADPAT    Invalid regular expression.
     REG_ECOLLATE  Invalid collating element referenced.
     REG_ECTYPE    Invalid character class type referenced.
     REG_EESCAPE   Trailing "\" in pattern.
     REG_ESUBREG   Number in "\digit" invalid or in error.
     REG_EBRACK    "[]" imbalance.
     REG_ENOSYS    The function is not supported.
     REG_EPAREN    "\(\)" or "()" imbalance.
     REG_EBRACE    "\{\}" imbalance.
     REG_BADBR     Content of "\{\}" invalid: not a number, number too large,
                   more than two numbers, first larger than second.
     REG_ERANGE    Invalid endpoint in range expression.
     REG_ESPACE    Out of memory.
     REG_BADRPT    "?", "*" or "+" not preceded by valid regular expression.
     REG_EMPTY     Empty (sub)expression.
     REG_INVARG    Invalid argument, e.g. negative-length string.

USAGE
     An application could use:

           regerror(code, preg, (char *)NULL, (size_t)0)

     to find out how big a buffer is needed for the generated string, malloc()
     a buffer to hold the string, and then call regerror() again to get the
     string (see malloc(3C)).  Alternatively, it could allocate a fixed, static
     buffer that is big enough to hold most strings, and then use malloc() to
     allocate a larger buffer if it finds that this is too small.
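     The two-call approach described above might look like the following
     sketch.  The helper name regex_message() is hypothetical, and the caller
     is assumed to free() the returned buffer.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the message for code,
 * or NULL if memory cannot be allocated.  The caller frees the result.
 */
char *
regex_message(int code, const regex_t *preg)
{
        size_t need;
        char *msg;

        /* First call: errbuf_size of 0 only reports the size required. */
        need = regerror(code, preg, (char *)NULL, (size_t)0);
        msg = malloc(need);
        if (msg != NULL) {
                /* Second call: the buffer now holds the whole message. */
                (void) regerror(code, preg, msg, need);
        }
        return (msg);
}
```

     The size returned by the first call already includes the terminating
     NUL, so no extra byte is added before calling malloc().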

EXAMPLES
     Matching string against the extended regular expression in pattern.

           #include <regex.h>

           /*
            * Match string against the extended regular expression in
            * pattern, treating errors as no match.
            *
            * return 1 for match, 0 for no match
            */
           int
           match(const char *string, const char *pattern)
           {
                   int status;
                   regex_t re;

                   if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
                           return(0);      /* compile error: no match */
                   }
                   status = regexec(&re, string, (size_t) 0, NULL, 0);
                   regfree(&re);
                   if (status != 0) {
                           return(0);      /* no match, or an error */
                   }
                   return(1);
           }

     The following demonstrates how the REG_NOTBOL flag could be used with
     regexec() to find all substrings in a line that match a pattern supplied
     by a user.  (For simplicity of the example, very little error checking is
     done.)  Because reported offsets are measured from the string passed to
     each call, the example keeps a pointer to the start of the unsearched
     text.

           const char *p = buffer;         /* start of unsearched text */

           (void) regcomp(&re, pattern, 0);
           /* this call to regexec() finds the first match on the line */
           error = regexec(&re, p, 1, &pm, 0);
           while (error == 0) {    /* while matches found */
                   /* substring found between p + pm.rm_so and p + pm.rm_eo */
                   p += pm.rm_eo;  /* offsets are relative to p */
                   /* This call to regexec() finds the next match */
                   error = regexec(&re, p, 1, &pm, REG_NOTBOL);
           }
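     A self-contained variant of the loop above is sketched below.  The
     helper name count_matches() is hypothetical.  Unlike the fragment, it
     also steps over one byte after an empty match, an added safeguard
     (not described above) so that patterns such as "x*" cannot stall the
     loop.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the non-overlapping matches of a basic RE
 * in line.  Returns -1 if the pattern does not compile.
 */
int
count_matches(const char *pattern, const char *line)
{
        regex_t re;
        regmatch_t pm;
        const char *p = line;   /* start of the unsearched text */
        int eflags = 0;
        int n = 0;

        if (regcomp(&re, pattern, 0) != 0)
                return (-1);
        while (regexec(&re, p, 1, &pm, eflags) == 0) {
                n++;
                p += pm.rm_eo;          /* offsets are relative to p */
                if (pm.rm_so == pm.rm_eo) {
                        /* empty match: stop at the end, else skip a byte */
                        if (*p == '\0')
                                break;
                        p++;
                }
                eflags = REG_NOTBOL;    /* p no longer begins the line */
        }
        regfree(&re);
        return (n);
}
```

     For example, counting "ab" in "abxxabab" would report three matches,
     while "^a" in "aaa" would report one, since REG_NOTBOL keeps "^" from
     matching at the later starting points.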

ERRORS
     No errors are defined.

CODE SET INDEPENDENCE
     Enabled

INTERFACE STABILITY
     Standard

MT-LEVEL
     MT-Safe with exceptions

     The regcomp() function can be used safely in a multithreaded application
     as long as setlocale(3C) is not being called to change the locale.

SEE ALSO
     attributes(5), regex(5), standards(5)

     IEEE Std 1003.2 ("POSIX.2"), sections 2.8 (Regular Expression Notation)
     and B.5 (C Binding for Regular Expression Matching).

illumos                        February 3, 2018                        illumos