REGCOMP(3C)            Standard C Library Functions            REGCOMP(3C)

NAME
     regcomp, regexec, regerror, regfree - regular-expression library

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <regex.h>

     int
     regcomp(regex_t *restrict preg, const char *restrict pattern,
         int cflags);

     int
     regexec(const regex_t *restrict preg, const char *restrict string,
         size_t nmatch, regmatch_t pmatch[restrict], int eflags);

     size_t
     regerror(int errcode, const regex_t *restrict preg,
         char *restrict errbuf, size_t errbuf_size);

     void
     regfree(regex_t *preg);
DESCRIPTION
     These routines implement IEEE Std 1003.2 ("POSIX.2") regular expressions;
     see regex(5).  The regcomp() function compiles an RE written as a string
     into an internal form, regexec() matches that internal form against a
     string and reports results, regerror() transforms error codes from either
     into human-readable messages, and regfree() frees any
     dynamically-allocated storage used by the internal form of an RE.

     The header <regex.h> declares two structure types, regex_t and
     regmatch_t, the former for compiled internal forms and the latter for
     match reporting.  It also declares the four functions, a type regoff_t,
     and a number of constants with names starting with "REG_".
39
   regcomp()
     The regcomp() function compiles the regular expression contained in the
     pattern string, subject to the flags in cflags, and places the results in
     the regex_t structure pointed to by preg.  The cflags argument is the
     bitwise OR of zero or more of the following flags:

     REG_EXTENDED  Compile extended regular expressions (EREs), rather than
                   the basic regular expressions (BREs) that are the default.

     REG_BASIC     This is a synonym for 0, provided as a counterpart to
                   REG_EXTENDED to improve readability.

     REG_NOSPEC    Compile with recognition of all special characters turned
                   off.  All characters are thus considered ordinary, so the
                   RE is a literal string.  This is an extension, compatible
                   with but not specified by IEEE Std 1003.2 ("POSIX.2"), and
                   should be used with caution in software intended to be
                   portable to other systems.  REG_EXTENDED and REG_NOSPEC may
                   not be used in the same call to regcomp().

     REG_LITERAL   An alias of REG_NOSPEC.

     REG_ICASE     Compile for matching that ignores upper/lower case
                   distinctions.  See regex(5).

     REG_NOSUB     Compile for matching that need only report success or
                   failure, not what was matched.

     REG_NEWLINE   Compile for newline-sensitive matching.  By default,
                   newline is a completely ordinary character with no special
                   meaning in either REs or strings.  With this flag, "[^"
                   bracket expressions and "." never match newline, a "^"
                   anchor matches the null string after any newline in the
                   string in addition to its normal function, and the "$"
                   anchor matches the null string before any newline in the
                   string in addition to its normal function.

     REG_PEND      The regular expression ends, not at the first NUL, but just
                   before the character pointed to by the re_endp member of
                   the structure pointed to by preg.  The re_endp member is of
                   type const char *.  This flag permits inclusion of NULs in
                   the RE; they are considered ordinary characters.  This is
                   an extension, compatible with but not specified by IEEE Std
                   1003.2 ("POSIX.2"), and should be used with caution in
                   software intended to be portable to other systems.
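
     The effect of the portable compile flags can be observed with a small
     wrapper.  The following is a minimal sketch, not part of the library:
     the helper name matches() is hypothetical, and it always adds REG_NOSUB
     because only success or failure is of interest here.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): compile pattern with the
 * given cflags and match it against text.
 * Returns 1 for a match, 0 for no match, -1 if regcomp() fails.
 */
static int
matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int status;

    /* REG_NOSUB: only success or failure is needed */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return (-1);
    status = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return (status == 0 ? 1 : 0);
}
```

     Under these assumptions, matches("^two", "one\ntwo", 0) reports no match,
     while the same call with REG_NEWLINE reports a match because "^" then
     also matches after the embedded newline.  Conversely, "one.two" matches
     "one\ntwo" by default but not under REG_NEWLINE.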
85
     When successful, regcomp() returns 0 and fills in the structure pointed
     to by preg.  One member of that structure (other than re_endp) is
     publicized: re_nsub, of type size_t, contains the number of parenthesized
     subexpressions within the RE (except that the value of this member is
     undefined if the REG_NOSUB flag was used).
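
     As an illustration of re_nsub, the sketch below compiles an ERE without
     REG_NOSUB and reads the member back.  The helper name
     count_subexpressions() is hypothetical and exists only for this example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): return the number of
 * parenthesized subexpressions in an ERE, or -1 if the pattern
 * does not compile.  REG_NOSUB is deliberately not used, since
 * re_nsub would then be undefined.
 */
static long
count_subexpressions(const char *pattern)
{
    regex_t re;
    long nsub;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return (-1);
    nsub = (long)re.re_nsub;    /* read before regfree() */
    regfree(&re);
    return (nsub);
}
```

     A caller that wants every subexpression reported would typically size
     its regmatch_t array as re_nsub + 1, the extra element being the one
     that describes the whole match.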
91
   regexec()
     The regexec() function matches the compiled RE pointed to by preg against
     the string, subject to the flags in eflags, and reports results using
     nmatch, pmatch, and the returned value.  The RE must have been compiled
     by a previous invocation of regcomp().  The compiled form is not altered
     during execution of regexec(), so a single compiled RE can be used
     simultaneously by multiple threads.

     By default, the NUL-terminated string pointed to by string is considered
     to be the text of an entire line, minus any terminating newline.  The
     eflags argument is the bitwise OR of zero or more of the following flags:

     REG_NOTBOL    The first character of the string is treated as the
                   continuation of a line.  This means that the anchors "^",
                   "[[:<:]]", and "\<" do not match before it; but see
                   REG_STARTEND below.  This does not affect the behavior of
                   newlines under REG_NEWLINE.

     REG_NOTEOL    The NUL terminating the string does not end a line, so the
                   "$" anchor does not match before it.  This does not affect
                   the behavior of newlines under REG_NEWLINE.

     REG_STARTEND  The string is considered to start at string +
                   pmatch[0].rm_so and to end before the byte located at
                   string + pmatch[0].rm_eo, regardless of the value of
                   nmatch.  See below for the definition of pmatch and nmatch.
                   This is an extension, compatible with but not specified by
                   IEEE Std 1003.2 ("POSIX.2"), and should be used with
                   caution in software intended to be portable to other
                   systems.

                   Without REG_NOTBOL, the position rm_so is considered the
                   beginning of a line, such that "^" matches before it, and
                   the beginning of a word if there is a word character at
                   this position, such that "[[:<:]]" and "\<" match before
                   it.

                   With REG_NOTBOL, the character at position rm_so is treated
                   as the continuation of a line, and if rm_so is greater than
                   0, the preceding character is taken into consideration.  If
                   the preceding character is a newline and the regular
                   expression was compiled with REG_NEWLINE, "^" matches
                   before the string; if the preceding character is not a word
                   character but the string starts with a word character,
                   "[[:<:]]" and "\<" match before the string.
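
     The two portable execution flags can be tried with a sketch like the
     following.  The helper name match_with_eflags() is hypothetical; it
     compiles a BRE with REG_NOSUB and passes eflags through to regexec()
     unchanged.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): compile a BRE and match
 * it against text with the given execution flags.
 * Returns 1 for a match, 0 for no match, -1 if regcomp() fails.
 */
static int
match_with_eflags(const char *pattern, const char *text, int eflags)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return (-1);
    status = regexec(&re, text, 0, NULL, eflags);
    regfree(&re);
    return (status == 0 ? 1 : 0);
}
```

     With this helper, "^abc" matches "abc" when eflags is 0 but not when
     REG_NOTBOL is given, and "abc$" stops matching "abc" once REG_NOTEOL is
     given.  An unanchored pattern such as "abc" is unaffected by either flag.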
137
     See regex(5) for a discussion of what is matched in situations where an
     RE or a portion thereof could match any of several substrings of string.

     If REG_NOSUB was specified in the compilation of the RE, or if nmatch is
     0, regexec() ignores the pmatch argument (but see below for the case
     where REG_STARTEND is specified).  Otherwise, pmatch points to an array
     of nmatch structures of type regmatch_t.  Such a structure has at least
     the members rm_so and rm_eo, both of type regoff_t (a signed arithmetic
     type at least as large as an off_t and a ssize_t), containing
     respectively the offset of the first character of a substring and the
     offset of the first character after the end of the substring.  Offsets
     are measured from the beginning of the string argument given to
     regexec().  An empty substring is denoted by equal offsets, both
     indicating the character following the empty substring.

     The 0th member of the pmatch array is filled in to indicate what
     substring of string was matched by the entire RE.  Remaining members
     report what substring was matched by parenthesized subexpressions within
     the RE; member i reports subexpression i, with subexpressions counted
     (starting at 1) by the order of their opening parentheses in the RE, left
     to right.  Unused entries in the array (corresponding either to
     subexpressions that did not participate in the match at all, or to
     subexpressions that do not exist in the RE (that is, i > preg->re_nsub))
     have both rm_so and rm_eo set to -1.  If a subexpression participated in
     the match several times, the reported substring is the last one it
     matched.  (Note, as an example in particular, that when the RE "(b*)+"
     matches "bbb", the parenthesized subexpression matches each of the three
     "b"s and then an infinite number of empty strings following the last "b",
     so the reported substring is one of the empties.)
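
     The reporting rules above can be sketched as follows.  The helper name
     capture_offsets() and the constant NSLOTS are hypothetical; the array is
     deliberately one element larger than the example pattern needs so that
     an entry with i > re_nsub is visible as well.

```c
#include <regex.h>
#include <stddef.h>

#define NSLOTS 4    /* pmatch[0] plus three subexpression slots */

/*
 * Hypothetical helper (not part of libc): match text against an ERE
 * and copy the reported offsets into so[] and eo[], each NSLOTS long.
 * Returns 0 on a match, or the non-zero code from regcomp()/regexec().
 */
static int
capture_offsets(const char *pattern, const char *text,
    long so[NSLOTS], long eo[NSLOTS])
{
    regex_t re;
    regmatch_t pm[NSLOTS];
    int status;
    size_t i;

    if ((status = regcomp(&re, pattern, REG_EXTENDED)) != 0)
        return (status);
    status = regexec(&re, text, NSLOTS, pm, 0);
    if (status == 0) {
        for (i = 0; i < NSLOTS; i++) {
            /* unused entries arrive as -1 / -1 */
            so[i] = (long)pm[i].rm_so;
            eo[i] = (long)pm[i].rm_eo;
        }
    }
    regfree(&re);
    return (status);
}
```

     Matching "(a)|(b)" against "xb" with this helper reports offsets 1 and 2
     for entry 0 and for entry 2, while entry 1 (a subexpression that did not
     participate) and entry 3 (no such subexpression) both hold -1.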
167
     If REG_STARTEND is specified, pmatch must point to at least one
     regmatch_t (even if nmatch is 0 or REG_NOSUB was specified), to hold the
     input offsets for REG_STARTEND.  Use for output is still entirely
     controlled by nmatch; if nmatch is 0 or REG_NOSUB was specified, the
     value of pmatch[0] will not be changed by a successful regexec().
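
     A possible use of REG_STARTEND is sketched below.  Because the flag is
     an extension, the sketch assumes it is exposed as a preprocessor macro
     and returns a distinct value where it is not; the helper name
     search_range() is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): search only the bytes
 * [start, end) of text for a BRE.
 * Returns the offset of the match, measured from the beginning of
 * text, -1 for no match or a compile error, or -2 where REG_STARTEND
 * is not available.
 */
static long
search_range(const char *pattern, const char *text, size_t start, size_t end)
{
#ifdef REG_STARTEND
    regex_t re;
    regmatch_t pm;
    int status;

    if (regcomp(&re, pattern, 0) != 0)
        return (-1);
    pm.rm_so = (regoff_t)start;    /* input: first byte to consider */
    pm.rm_eo = (regoff_t)end;      /* input: byte after the last one */
    status = regexec(&re, text, 1, &pm, REG_STARTEND);
    regfree(&re);
    /* on success pm was overwritten with the match offsets */
    return (status == 0 ? (long)pm.rm_so : -1);
#else
    (void)pattern; (void)text; (void)start; (void)end;
    return (-2);
#endif
}
```

     Since nmatch is 1 here, pmatch[0] serves first as input and then as
     output.  The offsets written back are measured from the beginning of
     the string argument rather than from string + start, so they can be used
     directly as indexes into the original buffer.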
173
   regerror()
     The regerror() function maps a non-zero errcode from either regcomp() or
     regexec() to a human-readable, printable message.  If preg is non-NULL,
     the error code should have arisen from use of the regex_t pointed to by
     preg, and if the error code came from regcomp(), it should have been the
     result from the most recent regcomp() using that regex_t.  (The
     regerror() function may be able to supply a more detailed message using
     information from the regex_t.)  The regerror() function places the
     NUL-terminated message into the buffer pointed to by errbuf, limiting the
     length (including the NUL) to at most errbuf_size bytes.  If the whole
     message will not fit, as much of it as will fit before the terminating
     NUL is supplied.  In any case, the returned value is the size of buffer
     needed to hold the whole message (including terminating NUL).  If
     errbuf_size is 0, errbuf is ignored but the return value is still
     correct.
189
   regfree()
     The regfree() function frees any dynamically-allocated storage associated
     with the compiled RE pointed to by preg.  The remaining regex_t is no
     longer a valid compiled RE and the effect of supplying it to regexec() or
     regerror() is undefined.
195
RETURN VALUES
     On successful completion, the regcomp() function returns 0.  Otherwise,
     it returns an integer value indicating an error as described in
     <regex.h>, and the content of preg is undefined.

     On successful completion, the regexec() function returns 0.  Otherwise it
     returns REG_NOMATCH to indicate no match.

     Upon successful completion, the regerror() function returns the number of
     bytes needed to hold the entire generated string.

     The regfree() function returns no value.

     The following constants are defined as error return values:

     REG_NOMATCH   The regexec() function failed to match.
     REG_BADPAT    Invalid regular expression.
     REG_ECOLLATE  Invalid collating element referenced.
     REG_ECTYPE    Invalid character class type referenced.
     REG_EESCAPE   Trailing "\" in pattern.
     REG_ESUBREG   Number in "\digit" invalid or in error.
     REG_EBRACK    "[]" imbalance.
     REG_ENOSYS    The function is not supported.
     REG_EPAREN    "\(\)" or "()" imbalance.
     REG_EBRACE    "\{\}" imbalance.
     REG_BADBR     Content of "\{\}" invalid: not a number, number too large,
                   more than two numbers, first larger than second.
     REG_ERANGE    Invalid endpoint in range expression.
     REG_ESPACE    Out of memory.
     REG_BADRPT    "?", "*" or "+" not preceded by valid regular expression.
     REG_EMPTY     Empty (sub)expression.
     REG_INVARG    Invalid argument, e.g. negative-length string.
228
USAGE
     An application could use:

           regerror(code, preg, (char *)NULL, (size_t)0)

     to find out how big a buffer is needed for the generated string, malloc()
     a buffer to hold the string, and then call regerror() again to get the
     string (see malloc(3C)).  Alternatively, it could allocate a fixed,
     static buffer that is big enough to hold most strings, and then use
     malloc() to allocate a larger buffer if it finds that this is too small.
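
     The first approach described above can be sketched as follows.  The
     helper name regerror_alloc() is hypothetical, and the caller is assumed
     to free() the returned string.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of libc): return a malloc()ed copy
 * of the message for errcode, or NULL if memory cannot be allocated.
 * The caller is responsible for calling free() on the result.
 */
static char *
regerror_alloc(int errcode, const regex_t *preg)
{
    size_t needed;
    char *msg;

    /* First call: an errbuf_size of 0 only reports the size needed. */
    needed = regerror(errcode, preg, NULL, 0);
    if ((msg = malloc(needed)) == NULL)
        return (NULL);
    /* Second call: fill the correctly sized buffer. */
    (void) regerror(errcode, preg, msg, needed);
    return (msg);
}
```

     The size returned by the first call already includes the terminating
     NUL, so no extra byte needs to be added before calling malloc().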
239
EXAMPLES
     Matching string against the extended regular expression in pattern.

           #include <stddef.h>
           #include <regex.h>

           /*
            * Match string against the extended regular expression in
            * pattern, treating errors as no match.
            *
            * return 1 for match, 0 for no match
            */
           int
           match(const char *string, const char *pattern)
           {
                   int status;
                   regex_t re;

                   if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
                           return (0);     /* compile error: no match */
                   }
                   status = regexec(&re, string, (size_t)0, NULL, 0);
                   regfree(&re);
                   if (status != 0) {
                           return (0);     /* no match, or an error */
                   }
                   return (1);
           }
267
     The following demonstrates how the REG_NOTBOL flag could be used with
     regexec() to find all substrings in a line that match a pattern supplied
     by a user.  The offsets in pm are measured from the pointer passed to
     regexec(), so the search position p (a const char *) is advanced past
     each match rather than recomputed from buffer.  (For simplicity of the
     example, very little error checking is done.)

           (void) regcomp(&re, pattern, 0);
           p = buffer;     /* where the next search starts */
           /* this call to regexec() finds the first match on the line */
           error = regexec(&re, p, 1, &pm, 0);
           while (error == 0) {    /* while matches found */
                   /* substring found between p + pm.rm_so and p + pm.rm_eo */
                   /* an empty match at p would never advance, so stop */
                   if (pm.rm_eo == 0)
                           break;
                   p += pm.rm_eo;
                   /* this call to regexec() finds the next match */
                   error = regexec(&re, p, 1, &pm, REG_NOTBOL);
           }
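
     For reference, a self-contained variant of this loop might look like the
     sketch below.  The helper name count_matches() is hypothetical, and
     stepping one byte past an empty match is an assumption of this sketch
     rather than something the interface prescribes.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): count the non-overlapping
 * matches of a BRE in line.
 * Returns the count, or -1 if the pattern does not compile.
 */
static int
count_matches(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t pm;
    const char *p = line;
    int eflags = 0;
    int count = 0;

    if (regcomp(&re, pattern, 0) != 0)
        return (-1);
    while (regexec(&re, p, 1, &pm, eflags) == 0) {
        count++;
        /* Offsets are relative to p, so advance p itself. */
        p += pm.rm_eo;
        if (pm.rm_so == pm.rm_eo) {
            /* Empty match: step one byte so the loop ends. */
            if (*p == '\0')
                break;
            p++;
        }
        /* Later searches no longer start at the line's beginning. */
        eflags = REG_NOTBOL;
    }
    regfree(&re);
    return (count);
}
```

     Because every search after the first passes REG_NOTBOL, an anchored
     pattern such as "^ab" is counted at most once per line.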
281
ERRORS
     No errors are defined.

CODE SET INDEPENDENCE
     Enabled

INTERFACE STABILITY
     Standard

MT-LEVEL
     MT-Safe with exceptions

     The regcomp() function can be used safely in a multithreaded application
     as long as setlocale(3C) is not being called to change the locale.

SEE ALSO
     attributes(5), regex(5), standards(5)

     IEEE Std 1003.2 ("POSIX.2"), sections 2.8 (Regular Expression Notation)
     and B.5 (C Binding for Regular Expression Matching).

illumos                        February 3, 2018                        illumos