40 regcomp()
41 The regcomp() function compiles the regular expression contained in the
42 pattern string, subject to the flags in cflags, and places the results in
43 the regex_t structure pointed to by preg. The cflags argument is the
44 bitwise OR of zero or more of the following flags:
45
46 REG_EXTENDED Compile extended regular expressions (EREs), rather than
47 the basic regular expressions (BREs) that are the default.
48
49 REG_BASIC This is a synonym for 0, provided as a counterpart to
50 REG_EXTENDED to improve readability.
51
52 REG_NOSPEC Compile with recognition of all special characters turned
53 off. All characters are thus considered ordinary, so the
54 RE is a literal string. This is an extension, compatible
55 with but not specified by IEEE Std 1003.2 ("POSIX.2"), and
56 should be used with caution in software intended to be
57 portable to other systems. REG_EXTENDED and REG_NOSPEC may
58 not be used in the same call to regcomp().
59
60 REG_ICASE Compile for matching that ignores upper/lower case
61 distinctions. See regex(5).
62
63 REG_NOSUB Compile for matching that need only report success or
64 failure, not what was matched.
65
66 REG_NEWLINE Compile for newline-sensitive matching. By default,
67 newline is a completely ordinary character with no special
68 meaning in either REs or strings. With this flag, "[^"
69 bracket expressions and "." never match newline, a "^"
70 anchor matches the null string after any newline in the
71 string in addition to its normal function, and the "$"
72 anchor matches the null string before any newline in the
73 string in addition to its normal function.
74
75 REG_PEND The regular expression ends, not at the first NUL, but just
76 before the character pointed to by the re_endp member of
77 the structure pointed to by preg. The re_endp member is of
78 type const char *. This flag permits inclusion of NULs in
79 the RE; they are considered ordinary characters. This is
168 input offsets for REG_STARTEND. Use for output is still entirely
169 controlled by nmatch; if nmatch is 0 or REG_NOSUB was specified, the
170 value of pmatch[0] will not be changed by a successful regexec().
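
As a minimal sketch of how nmatch controls output, the helper below passes an nmatch of 1 so that a successful regexec() fills in pmatch[0] with the offsets of the whole match. The name first_match() and its parameters are hypothetical and are not part of <regex.h>.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: find the first match of the ERE `pattern` in
 * `string` and store its byte offsets in *start and *end.
 * Returns 0 on a match, non-zero on no match or on a compile error.
 */
int
first_match(const char *pattern, const char *string, size_t *start,
    size_t *end)
{
	regex_t re;
	regmatch_t pm[1];	/* pm[0] describes the whole match */
	int rc;

	assert(pattern != NULL && string != NULL);
	assert(start != NULL && end != NULL);

	if ((rc = regcomp(&re, pattern, REG_EXTENDED)) != 0)
		return (rc);

	/* nmatch is 1, so regexec() fills in pm[0] on success. */
	rc = regexec(&re, string, 1, pm, 0);
	if (rc == 0) {
		*start = (size_t)pm[0].rm_so;
		*end = (size_t)pm[0].rm_eo;
	}

	regfree(&re);
	return (rc);
}
```

Had the pattern been compiled with REG_NOSUB, or had nmatch been 0, pm[0] would have been left untouched and only the return value would be meaningful.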
171
172 regerror()
173 The regerror() function maps a non-zero errcode from either regcomp() or
174 regexec() to a human-readable, printable message. If preg is non-NULL,
175 the error code should have arisen from use of the regex_t pointed to by
176 preg, and if the error code came from regcomp(), it should have been the
177 result from the most recent regcomp() using that regex_t. (The
178 regerror() function may be able to supply a more detailed message using
179 information from the regex_t.) The regerror() function places the NUL-
180 terminated message into the buffer pointed to by errbuf, limiting the
181 length (including the NUL) to at most errbuf_size bytes. If the whole
182 message will not fit, as much of it as will fit before the terminating
183 NUL is supplied. In any case, the returned value is the size of buffer
184 needed to hold the whole message (including terminating NUL). If
185 errbuf_size is 0, errbuf is ignored but the return value is still
186 correct.
187
188 If the errcode given to regerror() is first ORed with REG_ITOA, the
189 "message" that results is the printable name of the error code, e.g.
190 "REG_NOMATCH", rather than an explanation thereof. If errcode is
191 REG_ATOI, then preg shall be non-NULL and the re_endp member of the
192 structure it points to must point to the printable name of an error code;
193 in this case, the result in errbuf is the decimal digits of the numeric
194 value of the error code (0 if the name is not recognized). REG_ITOA and
195 REG_ATOI are intended primarily as debugging facilities; they are
196 extensions, compatible with but not specified by IEEE Std 1003.2
197 ("POSIX.2"), and should be used with caution in software intended to be
198 portable to other systems.
199
200 regfree()
201 The regfree() function frees any dynamically-allocated storage associated
202 with the compiled RE pointed to by preg. The remaining regex_t is no
203 longer a valid compiled RE and the effect of supplying it to regexec() or
204 regerror() is undefined.
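
The lifetime of a compiled RE can be sketched as follows: one regcomp(), any number of regexec() calls, then exactly one regfree(). The helper name count_matching() is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count how many of the n strings match the ERE
 * `pattern`.  Returns -1 if the pattern does not compile.
 */
int
count_matching(const char *pattern, const char *const *strings, size_t n)
{
	regex_t re;
	size_t i;
	int count = 0;

	assert(pattern != NULL && strings != NULL);

	/*
	 * On failure the content of re is undefined, so it is not
	 * passed to regfree().
	 */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return (-1);

	/* The compiled RE may be reused for any number of matches. */
	for (i = 0; i < n; i++) {
		if (regexec(&re, strings[i], 0, NULL, 0) == 0)
			count++;
	}

	/* After this call re is no longer a valid compiled RE. */
	regfree(&re);
	return (count);
}
```

To use the same regex_t again after regfree(), it must first be passed to regcomp() once more.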
205
206 IMPLEMENTATION NOTES
207 There are a number of decisions that IEEE Std 1003.2 ("POSIX.2") leaves
208 up to the implementor, either by explicitly saying "undefined" or by
209 virtue of them being forbidden by the RE grammar. This implementation
210 treats them as follows.
211
212 There is no particular limit on the length of REs, except insofar as
213 memory is limited. Memory usage is approximately linear in RE size, and
214 largely insensitive to RE complexity, except for bounded repetitions.
215
216 A backslashed character other than one specifically given a magic meaning
217 by IEEE Std 1003.2 ("POSIX.2") (such magic meanings occur only in BREs)
218 is taken as an ordinary character.
219
220 Any unmatched "[" is a REG_EBRACK error.
221
222 Equivalence classes cannot begin or end bracket-expression ranges. The
223 endpoint of one range cannot begin another.
224
225 RE_DUP_MAX, the limit on repetition counts in bounded repetitions, is
226 255.
227
228 A repetition operator ("?", "*", "+", or bounds) cannot follow another
229 repetition operator. A repetition operator cannot begin an expression or
230 subexpression or follow "^" or "|".
231
232 "|" cannot appear first or last in a (sub)expression or after another
233 "|", i.e., an operand of "|" cannot be an empty subexpression. An empty
234 parenthesized subexpression, "()", is legal and matches an empty
235 (sub)string. An empty string is not a legal RE.
236
237 A "{" followed by a digit is considered the beginning of bounds for a
238 bounded repetition, which must then follow the syntax for bounds. A "{"
239 not followed by a digit is considered an ordinary character.
240
241 "^" and "$" beginning and ending subexpressions in BREs are anchors, not
242 ordinary characters.
243
244 RETURN VALUES
245 On successful completion, the regcomp() function returns 0. Otherwise,
246 it returns an integer value indicating an error as described in
247 <regex.h>, and the content of preg is undefined.
248
249 On successful completion, the regexec() function returns 0. Otherwise it
250 returns REG_NOMATCH to indicate no match, or REG_ENOSYS to indicate that
251 the function is not supported.
252
253 Upon successful completion, the regerror() function returns the number of
254 bytes needed to hold the entire generated string. Otherwise, it returns
255 0 to indicate that the function is not implemented.
256
257 The regfree() function returns no value.
258
259 The following constants are defined as error return values:
260
261 REG_NOMATCH The regexec() function failed to match.
262 REG_BADPAT Invalid regular expression.
263 REG_ECOLLATE Invalid collating element referenced.
264 REG_ECTYPE Invalid character class type referenced.
265 REG_EESCAPE Trailing "\" in pattern.
266 REG_ESUBREG Number in "\digit" invalid or in error.
267 REG_EBRACK "[]" imbalance.
268 REG_ENOSYS The function is not supported.
269 REG_EPAREN "\(\)" or "()" imbalance.
270 REG_EBRACE "\{\}" imbalance.
271 REG_BADBR Content of "\{\}" invalid: not a number, number too large,
272 more than two numbers, first larger than second.
273 REG_ERANGE Invalid endpoint in range expression.
274 REG_ESPACE Out of memory.
275 REG_BADRPT "?", "*" or "+" not preceded by valid regular expression.
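
For logging, the constants above can be mapped to their names without relying on extensions. The following is a sketch only: reg_errname() and try_compile() are hypothetical helpers, and the switch covers just the constants in the list above that POSIX specifies, leaving out REG_ENOSYS.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return the symbolic name of a regex error code. */
const char *
reg_errname(int code)
{
	switch (code) {
	case 0:			return ("0");
	case REG_NOMATCH:	return ("REG_NOMATCH");
	case REG_BADPAT:	return ("REG_BADPAT");
	case REG_ECOLLATE:	return ("REG_ECOLLATE");
	case REG_ECTYPE:	return ("REG_ECTYPE");
	case REG_EESCAPE:	return ("REG_EESCAPE");
	case REG_ESUBREG:	return ("REG_ESUBREG");
	case REG_EBRACK:	return ("REG_EBRACK");
	case REG_EPAREN:	return ("REG_EPAREN");
	case REG_EBRACE:	return ("REG_EBRACE");
	case REG_BADBR:		return ("REG_BADBR");
	case REG_ERANGE:	return ("REG_ERANGE");
	case REG_ESPACE:	return ("REG_ESPACE");
	case REG_BADRPT:	return ("REG_BADRPT");
	default:		return ("(unknown)");
	}
}

/*
 * Hypothetical helper: compile `pattern` only to learn whether it is
 * valid.  Returns the regcomp() result; 0 means the pattern compiled.
 */
int
try_compile(const char *pattern, int cflags)
{
	regex_t re;
	int rc;

	assert(pattern != NULL);

	rc = regcomp(&re, pattern, cflags);
	if (rc == 0)
		regfree(&re);	/* only a successful compile is freed */
	return (rc);
}
```

A caller might print reg_errname(try_compile(pattern, REG_EXTENDED)) next to the regerror() text. Which code is reported for a given malformed pattern can differ between implementations.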
276
277 USAGE
278 An application could use:
279
280 regerror(code, preg, (char *)NULL, (size_t)0)
281
282 to find out how big a buffer is needed for the generated string, malloc()
283 a buffer to hold the string, and then call regerror() again to get the
284 string (see malloc(3C)). Alternatively, it could allocate a fixed, static
285 buffer that is big enough to hold most strings, and then use malloc() to
286 allocate a larger buffer if it finds that this is too small.
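
The first approach might look like the sketch below. The helper name regerror_alloc() is hypothetical; the caller is assumed to free() the returned string.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the regerror()
 * message for `code`, or NULL if no message or no memory is available.
 * The caller frees the result.
 */
char *
regerror_alloc(int code, const regex_t *preg)
{
	size_t need;
	char *buf;

	assert(code != 0);

	/* First call: a zero-sized buffer asks only for the needed size. */
	need = regerror(code, preg, (char *)NULL, (size_t)0);
	if (need == 0)
		return (NULL);

	if ((buf = malloc(need)) == NULL)
		return (NULL);

	/* Second call: the buffer now holds the whole message and NUL. */
	(void) regerror(code, preg, buf, need);
	return (buf);
}
```

The cost is two calls to regerror() and one allocation per message, which is usually acceptable on an error path.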
287
288 EXAMPLES
289 Matching string against the extended regular expression in pattern.
290
291 #include <regex.h>
292
293 /*
294 * Match string against the extended regular expression in
295 * pattern, treating errors as no match.
331 No errors are defined.
332
333 CODE SET INDEPENDENCE
334 Enabled
335
336 INTERFACE STABILITY
337 Standard
338
339 MT-LEVEL
340 MT-Safe with exceptions
341
342 The regcomp() function can be used safely in a multithreaded application
343 as long as setlocale(3C) is not being called to change the locale.
344
345 SEE ALSO
346 attributes(5), regex(5), standards(5)
347
348 IEEE Std 1003.2 ("POSIX.2"), sections 2.8 (Regular Expression Notation)
349 and B.5 (C Binding for Regular Expression Matching).
350
351 illumos June 14, 2017 illumos
|
40 regcomp()
41 The regcomp() function compiles the regular expression contained in the
42 pattern string, subject to the flags in cflags, and places the results in
43 the regex_t structure pointed to by preg. The cflags argument is the
44 bitwise OR of zero or more of the following flags:
45
46 REG_EXTENDED Compile extended regular expressions (EREs), rather than
47 the basic regular expressions (BREs) that are the default.
48
49 REG_BASIC This is a synonym for 0, provided as a counterpart to
50 REG_EXTENDED to improve readability.
51
52 REG_NOSPEC Compile with recognition of all special characters turned
53 off. All characters are thus considered ordinary, so the
54 RE is a literal string. This is an extension, compatible
55 with but not specified by IEEE Std 1003.2 ("POSIX.2"), and
56 should be used with caution in software intended to be
57 portable to other systems. REG_EXTENDED and REG_NOSPEC may
58 not be used in the same call to regcomp().
59
60 REG_LITERAL An alias of REG_NOSPEC.
61
62 REG_ICASE Compile for matching that ignores upper/lower case
63 distinctions. See regex(5).
64
65 REG_NOSUB Compile for matching that need only report success or
66 failure, not what was matched.
67
68 REG_NEWLINE Compile for newline-sensitive matching. By default,
69 newline is a completely ordinary character with no special
70 meaning in either REs or strings. With this flag, "[^"
71 bracket expressions and "." never match newline, a "^"
72 anchor matches the null string after any newline in the
73 string in addition to its normal function, and the "$"
74 anchor matches the null string before any newline in the
75 string in addition to its normal function.
76
77 REG_PEND The regular expression ends, not at the first NUL, but just
78 before the character pointed to by the re_endp member of
79 the structure pointed to by preg. The re_endp member is of
80 type const char *. This flag permits inclusion of NULs in
81 the RE; they are considered ordinary characters. This is
170 input offsets for REG_STARTEND. Use for output is still entirely
171 controlled by nmatch; if nmatch is 0 or REG_NOSUB was specified, the
172 value of pmatch[0] will not be changed by a successful regexec().
173
174 regerror()
175 The regerror() function maps a non-zero errcode from either regcomp() or
176 regexec() to a human-readable, printable message. If preg is non-NULL,
177 the error code should have arisen from use of the regex_t pointed to by
178 preg, and if the error code came from regcomp(), it should have been the
179 result from the most recent regcomp() using that regex_t. (The
180 regerror() function may be able to supply a more detailed message using
181 information from the regex_t.) The regerror() function places the NUL-
182 terminated message into the buffer pointed to by errbuf, limiting the
183 length (including the NUL) to at most errbuf_size bytes. If the whole
184 message will not fit, as much of it as will fit before the terminating
185 NUL is supplied. In any case, the returned value is the size of buffer
186 needed to hold the whole message (including terminating NUL). If
187 errbuf_size is 0, errbuf is ignored but the return value is still
188 correct.
189
190 regfree()
191 The regfree() function frees any dynamically-allocated storage associated
192 with the compiled RE pointed to by preg. The remaining regex_t is no
193 longer a valid compiled RE and the effect of supplying it to regexec() or
194 regerror() is undefined.
195
196 RETURN VALUES
197 On successful completion, the regcomp() function returns 0. Otherwise,
198 it returns an integer value indicating an error as described in
199 <regex.h>, and the content of preg is undefined.
200
201 On successful completion, the regexec() function returns 0. Otherwise it
202 returns REG_NOMATCH to indicate no match.
203
204 Upon successful completion, the regerror() function returns the number of
205 bytes needed to hold the entire generated string.
206
207 The regfree() function returns no value.
208
209 The following constants are defined as error return values:
210
211 REG_NOMATCH The regexec() function failed to match.
212 REG_BADPAT Invalid regular expression.
213 REG_ECOLLATE Invalid collating element referenced.
214 REG_ECTYPE Invalid character class type referenced.
215 REG_EESCAPE Trailing "\" in pattern.
216 REG_ESUBREG Number in "\digit" invalid or in error.
217 REG_EBRACK "[]" imbalance.
218 REG_ENOSYS The function is not supported.
219 REG_EPAREN "\(\)" or "()" imbalance.
220 REG_EBRACE "\{\}" imbalance.
221 REG_BADBR Content of "\{\}" invalid: not a number, number too large,
222 more than two numbers, first larger than second.
223 REG_ERANGE Invalid endpoint in range expression.
224 REG_ESPACE Out of memory.
225 REG_BADRPT "?", "*" or "+" not preceded by valid regular expression.
226 REG_EMPTY Empty (sub)expression.
227 REG_INVARG Invalid argument, e.g. negative-length string.
228
229 USAGE
230 An application could use:
231
232 regerror(code, preg, (char *)NULL, (size_t)0)
233
234 to find out how big a buffer is needed for the generated string, malloc()
235 a buffer to hold the string, and then call regerror() again to get the
236 string (see malloc(3C)). Alternatively, it could allocate a fixed, static
237 buffer that is big enough to hold most strings, and then use malloc() to
238 allocate a larger buffer if it finds that this is too small.
239
240 EXAMPLES
241 Matching string against the extended regular expression in pattern.
242
243 #include <regex.h>
244
245 /*
246 * Match string against the extended regular expression in
247 * pattern, treating errors as no match.
283 No errors are defined.
284
285 CODE SET INDEPENDENCE
286 Enabled
287
288 INTERFACE STABILITY
289 Standard
290
291 MT-LEVEL
292 MT-Safe with exceptions
293
294 The regcomp() function can be used safely in a multithreaded application
295 as long as setlocale(3C) is not being called to change the locale.
296
297 SEE ALSO
298 attributes(5), regex(5), standards(5)
299
300 IEEE Std 1003.2 ("POSIX.2"), sections 2.8 (Regular Expression Notation)
301 and B.5 (C Binding for Regular Expression Matching).
302
303 illumos February 3, 2018 illumos