REGCOMP(3C)              Standard C Library Functions              REGCOMP(3C)

NAME
     regcomp, regexec, regerror, regfree - regular-expression library

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <regex.h>

     int
     regcomp(regex_t *restrict preg, const char *restrict pattern,
         int cflags);

     int
     regexec(const regex_t *restrict preg, const char *restrict string,
         size_t nmatch, regmatch_t pmatch[restrict], int eflags);

     size_t
     regerror(int errcode, const regex_t *restrict preg,
         char *restrict errbuf, size_t errbuf_size);

     void
     regfree(regex_t *preg);

DESCRIPTION
     These routines implement IEEE Std 1003.2 ("POSIX.2") regular expressions;
     see regex(5). The regcomp() function compiles an RE written as a string
     into an internal form, regexec() matches that internal form against a
     string and reports results, regerror() transforms error codes from either
     into human-readable messages, and regfree() frees any
     dynamically-allocated storage used by the internal form of an RE.

     The header <regex.h> declares two structure types, regex_t and
     regmatch_t, the former for compiled internal forms and the latter for
     match reporting. It also declares the four functions, a type regoff_t,
     and a number of constants with names starting with "REG_".

   regcomp()
     The regcomp() function compiles the regular expression contained in the
     pattern string, subject to the flags in cflags, and places the results in
     the regex_t structure pointed to by preg. The cflags argument is the
     bitwise OR of zero or more of the following flags:

     REG_EXTENDED  Compile extended regular expressions (EREs), rather than
                   the basic regular expressions (BREs) that are the default.

     REG_BASIC     This is a synonym for 0, provided as a counterpart to
                   REG_EXTENDED to improve readability.

     REG_NOSPEC    Compile with recognition of all special characters turned
                   off.
                   All characters are thus considered ordinary, so the
                   RE is a literal string. This is an extension, compatible
                   with but not specified by IEEE Std 1003.2 ("POSIX.2"), and
                   should be used with caution in software intended to be
                   portable to other systems. REG_EXTENDED and REG_NOSPEC may
                   not be used in the same call to regcomp().

     REG_LITERAL   An alias of REG_NOSPEC.

     REG_ICASE     Compile for matching that ignores upper/lower case
                   distinctions. See regex(5).

     REG_NOSUB     Compile for matching that need only report success or
                   failure, not what was matched.

     REG_NEWLINE   Compile for newline-sensitive matching. By default,
                   newline is a completely ordinary character with no special
                   meaning in either REs or strings. With this flag, "[^"
                   bracket expressions and "." never match newline, a "^"
                   anchor matches the null string after any newline in the
                   string in addition to its normal function, and the "$"
                   anchor matches the null string before any newline in the
                   string in addition to its normal function.

     REG_PEND      The regular expression ends, not at the first NUL, but just
                   before the character pointed to by the re_endp member of
                   the structure pointed to by preg. The re_endp member is of
                   type const char *. This flag permits inclusion of NULs in
                   the RE; they are considered ordinary characters. This is
                   an extension, compatible with but not specified by IEEE Std
                   1003.2 ("POSIX.2"), and should be used with caution in
                   software intended to be portable to other systems.

     When successful, regcomp() returns 0 and fills in the structure pointed
     to by preg. One member of that structure (other than re_endp) is
     publicized: re_nsub, of type size_t, contains the number of parenthesized
     subexpressions within the RE (except that the value of this member is
     undefined if the REG_NOSUB flag was used).
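
     As an illustration of the compile step and the re_nsub member described
     above, the following is a minimal sketch, not a definitive
     implementation. The helper name count_subexpressions() is hypothetical
     and is not part of the library. It omits REG_NOSUB because re_nsub is
     undefined when that flag is used.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): compile pattern as an ERE and
 * return the number of parenthesized subexpressions, or -1 if regcomp()
 * rejects the pattern.
 */
long
count_subexpressions(const char *pattern)
{
	regex_t re;
	long nsub;

	/* REG_NOSUB is deliberately not set: re_nsub is undefined with it. */
	if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
		/* The content of re is undefined on failure; do not free it. */
		return (-1);
	}

	nsub = (long)re.re_nsub;
	regfree(&re);
	return (nsub);
}
```

     Under these assumptions, a pattern such as "(a)(b(c))" is expected to
     report three subexpressions, and an unbalanced pattern such as "(a" is
     expected to fail compilation.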

   regexec()
     The regexec() function matches the compiled RE pointed to by preg against
     the string, subject to the flags in eflags, and reports results using
     nmatch, pmatch, and the returned value. The RE must have been compiled
     by a previous invocation of regcomp(). The compiled form is not altered
     during execution of regexec(), so a single compiled RE can be used
     simultaneously by multiple threads.

     By default, the NUL-terminated string pointed to by string is considered
     to be the text of an entire line, minus any terminating newline. The
     eflags argument is the bitwise OR of zero or more of the following flags:

     REG_NOTBOL    The first character of the string is treated as the
                   continuation of a line. This means that the anchors "^",
                   "[[:<:]]", and "\<" do not match before it; but see
                   REG_STARTEND below. This does not affect the behavior of
                   newlines under REG_NEWLINE.

     REG_NOTEOL    The NUL terminating the string does not end a line, so the
                   "$" anchor does not match before it. This does not affect
                   the behavior of newlines under REG_NEWLINE.

     REG_STARTEND  The string is considered to start at string +
                   pmatch[0].rm_so and to end before the byte located at
                   string + pmatch[0].rm_eo, regardless of the value of
                   nmatch. See below for the definition of pmatch and nmatch.
                   This is an extension, compatible with but not specified by
                   IEEE Std 1003.2 ("POSIX.2"), and should be used with
                   caution in software intended to be portable to other
                   systems.

                   Without REG_NOTBOL, the position rm_so is considered the
                   beginning of a line, such that "^" matches before it, and
                   the beginning of a word if there is a word character at
                   this position, such that "[[:<:]]" and "\<" match before
                   it.

                   With REG_NOTBOL, the character at position rm_so is treated
                   as the continuation of a line, and if rm_so is greater than
                   0, the preceding character is taken into consideration. If
                   the preceding character is a newline and the regular
                   expression was compiled with REG_NEWLINE, "^" matches
                   before the string; if the preceding character is not a word
                   character but the string starts with a word character,
                   "[[:<:]]" and "\<" match before the string.

     See regex(5) for a discussion of what is matched in situations where an
     RE or a portion thereof could match any of several substrings of string.

     If REG_NOSUB was specified in the compilation of the RE, or if nmatch is
     0, regexec() ignores the pmatch argument (but see below for the case
     where REG_STARTEND is specified). Otherwise, pmatch points to an array
     of nmatch structures of type regmatch_t. Such a structure has at least
     the members rm_so and rm_eo, both of type regoff_t (a signed arithmetic
     type at least as large as an off_t and a ssize_t), containing
     respectively the offset of the first character of a substring and the
     offset of the first character after the end of the substring. Offsets
     are measured from the beginning of the string argument given to
     regexec(). An empty substring is denoted by equal offsets, both
     indicating the character following the empty substring.

     The 0th member of the pmatch array is filled in to indicate what
     substring of string was matched by the entire RE. Remaining members
     report what substring was matched by parenthesized subexpressions within
     the RE; member i reports subexpression i, with subexpressions counted
     (starting at 1) by the order of their opening parentheses in the RE, left
     to right.
     Unused entries in the array (corresponding either to
     subexpressions that did not participate in the match at all, or to
     subexpressions that do not exist in the RE (that is, i > preg->re_nsub))
     have both rm_so and rm_eo set to -1. If a subexpression participated in
     the match several times, the reported substring is the last one it
     matched. (Note, as an example in particular, that when the RE "(b*)+"
     matches "bbb", the parenthesized subexpression matches each of the three
     "b"s and then an infinite number of empty strings following the last "b",
     so the reported substring is one of the empties.)

     If REG_STARTEND is specified, pmatch must point to at least one
     regmatch_t (even if nmatch is 0 or REG_NOSUB was specified), to hold the
     input offsets for REG_STARTEND. Use for output is still entirely
     controlled by nmatch; if nmatch is 0 or REG_NOSUB was specified, the
     value of pmatch[0] will not be changed by a successful regexec().

   regerror()
     The regerror() function maps a non-zero errcode from either regcomp() or
     regexec() to a human-readable, printable message. If preg is non-NULL,
     the error code should have arisen from use of the regex_t pointed to by
     preg, and if the error code came from regcomp(), it should have been the
     result from the most recent regcomp() using that regex_t. (The
     regerror() function may be able to supply a more detailed message using
     information from the regex_t.) The regerror() function places the
     NUL-terminated message into the buffer pointed to by errbuf, limiting the
     length (including the NUL) to at most errbuf_size bytes. If the whole
     message will not fit, as much of it as will fit before the terminating
     NUL is supplied. In any case, the returned value is the size of buffer
     needed to hold the whole message (including terminating NUL). If
     errbuf_size is 0, errbuf is ignored but the return value is still
     correct.
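
     The pmatch reporting rules described under regexec() can be sketched as
     follows. This is an illustrative example under stated assumptions, not
     a definitive implementation: the helper name capture() and the limit
     MAXGROUPS are hypothetical and chosen only for this sketch. It relies on
     offsets being relative to the string passed to regexec() and on unused
     entries having rm_so set to -1.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libc): match the ERE in pattern
 * against string and copy the text of subexpression idx (0 is the whole
 * match) into out. Returns the substring length, or -1 on compile
 * failure, no match, an unused subexpression, or too small a buffer.
 */
long
capture(const char *pattern, const char *string, size_t idx,
    char *out, size_t outsize)
{
	enum { MAXGROUPS = 10 };	/* arbitrary limit for this sketch */
	regex_t re;
	regmatch_t pm[MAXGROUPS];
	long len = -1;

	if (idx >= MAXGROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
		return (-1);

	/* Unused entries in pm come back with rm_so and rm_eo set to -1. */
	if (regexec(&re, string, MAXGROUPS, pm, 0) == 0 &&
	    pm[idx].rm_so != -1) {
		/* Offsets are measured from the start of string. */
		size_t n = (size_t)(pm[idx].rm_eo - pm[idx].rm_so);

		if (n < outsize) {
			memcpy(out, string + pm[idx].rm_so, n);
			out[n] = '\0';
			len = (long)n;
		}
	}
	regfree(&re);
	return (len);
}
```

     With the pattern "(a)|(b)" and the string "b", subexpression 1 does not
     participate in the match, so this sketch reports -1 for index 1 and a
     length of 1 for index 2.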

   regfree()
     The regfree() function frees any dynamically-allocated storage associated
     with the compiled RE pointed to by preg. The remaining regex_t is no
     longer a valid compiled RE and the effect of supplying it to regexec() or
     regerror() is undefined.

RETURN VALUES
     On successful completion, the regcomp() function returns 0. Otherwise,
     it returns an integer value indicating an error as described in
     <regex.h>, and the content of preg is undefined.

     On successful completion, the regexec() function returns 0. Otherwise it
     returns REG_NOMATCH to indicate no match.

     Upon successful completion, the regerror() function returns the number of
     bytes needed to hold the entire generated string.

     The regfree() function returns no value.

     The following constants are defined as error return values:

     REG_NOMATCH   The regexec() function failed to match.
     REG_BADPAT    Invalid regular expression.
     REG_ECOLLATE  Invalid collating element referenced.
     REG_ECTYPE    Invalid character class type referenced.
     REG_EESCAPE   Trailing "\" in pattern.
     REG_ESUBREG   Number in "\digit" invalid or in error.
     REG_EBRACK    "[]" imbalance.
     REG_ENOSYS    The function is not supported.
     REG_EPAREN    "\(\)" or "()" imbalance.
     REG_EBRACE    "\{\}" imbalance.
     REG_BADBR     Content of "\{\}" invalid: not a number, number too large,
                   more than two numbers, first larger than second.
     REG_ERANGE    Invalid endpoint in range expression.
     REG_ESPACE    Out of memory.
     REG_BADRPT    "?", "*" or "+" not preceded by valid regular expression.
     REG_EMPTY     Empty (sub)expression.
     REG_INVARG    Invalid argument, e.g. negative-length string.
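
     One way to turn the error return values listed above into text is to
     call regerror() twice, first with a size of 0 to learn the required
     buffer size and then again with a buffer of that size. The following is
     a minimal sketch of that approach. The helper name
     compile_error_message() is hypothetical and is not part of the library.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of libc): compile pattern with cflags
 * and, on failure, return a malloc()ed copy of the regerror() message,
 * which the caller must free(). Returns NULL if the pattern compiles or
 * if malloc() fails.
 */
char *
compile_error_message(const char *pattern, int cflags)
{
	regex_t re;
	int code;
	size_t need;
	char *msg;

	code = regcomp(&re, pattern, cflags);
	if (code == 0) {
		regfree(&re);
		return (NULL);
	}

	/* Size 0: errbuf is ignored, the return value is the size needed. */
	need = regerror(code, &re, NULL, 0);
	msg = malloc(need);
	if (msg != NULL)
		(void) regerror(code, &re, msg, need);
	return (msg);
}
```

     The wording of the message is implementation-defined, so a caller
     should treat it as text for display only and compare the numeric code
     against the constants above when it needs to branch on the error.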

USAGE
     An application could use:

           regerror(code, preg, (char *)NULL, (size_t)0)

     to find out how big a buffer is needed for the generated string, malloc()
     a buffer to hold the string, and then call regerror() again to get the
     string (see malloc(3C)). Alternatively, it could allocate a fixed, static
     buffer that is big enough to hold most strings, and then use malloc() to
     allocate a larger buffer if it finds that this is too small.

EXAMPLES
     Matching string against the extended regular expression in pattern.

           #include <regex.h>

           /*
            * Match string against the extended regular expression in
            * pattern, treating errors as no match.
            *
            * return 1 for match, 0 for no match
            */
           int
           match(const char *string, const char *pattern)
           {
                   int status;
                   regex_t re;

                   if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
                           return(0);      /* report error */
                   }
                   status = regexec(&re, string, (size_t) 0, NULL, 0);
                   regfree(&re);
                   if (status != 0) {
                           return(0);      /* report error */
                   }
                   return(1);
           }

     The following demonstrates how the REG_NOTBOL flag could be used with
     regexec() to find all substrings in a line that match a pattern supplied
     by a user. Because the offsets in pm are measured from the string passed
     to the most recent regexec() call, the example keeps a cursor p (a
     char *) into buffer and advances it past each match. (For simplicity of
     the example, very little error checking is done.)

           (void) regcomp(&re, pattern, 0);
           p = buffer;
           /* this call to regexec() finds the first match on the line */
           error = regexec(&re, p, 1, &pm, 0);
           while (error == 0) {    /* while matches found */
                   /* substring found between p + pm.rm_so and p + pm.rm_eo */
                   if (pm.rm_eo == 0 && *p == '\0')
                           break;  /* empty match at the end of the line */
                   /* step past the match, or one byte if it was empty */
                   p += (pm.rm_eo > 0) ? pm.rm_eo : 1;
                   /* this call to regexec() finds the next match */
                   error = regexec(&re, p, 1, &pm, REG_NOTBOL);
           }

ERRORS
     No errors are defined.

CODE SET INDEPENDENCE
     Enabled

INTERFACE STABILITY
     Standard

MT-LEVEL
     MT-Safe with exceptions

     The regcomp() function can be used safely in a multithreaded application
     as long as setlocale(3C) is not being called to change the locale.

SEE ALSO
     attributes(5), regex(5), standards(5)

     IEEE Std 1003.2 ("POSIX.2"), sections 2.8 (Regular Expression Notation)
     and B.5 (C Binding for Regular Expression Matching).

illumos                        February 3, 2018                        illumos