4023 - Typo in file(1) manpage and various others
--- old/usr/src/man/man3cpc/cpc_bind_event.3cpc
+++ new/usr/src/man/man3cpc/cpc_bind_event.3cpc
1 1 '\" te
2 2 .\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
3 3 .\" The contents of this file are subject to the terms of the Common Development and Distribution License (the "License"). You may not use this file except in compliance with the License.
4 4 .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. See the License for the specific language governing permissions and limitations under the License.
5 5 .\" When distributing Covered Code, include this CDDL HEADER in each file and include the License file at usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your own identifying information: Portions Copyright [yyyy] [name of copyright owner]
6 -.TH CPC_BIND_EVENT 3CPC "Mar 02, 2007"
6 +.TH CPC_BIND_EVENT 3CPC "Sep 10, 2013"
7 7 .SH NAME
8 8 cpc_bind_event, cpc_take_sample, cpc_rele \- use CPU performance counters on
9 9 lwps
10 10 .SH SYNOPSIS
11 11 .LP
12 12 .nf
13 13 cc [ \fIflag\fR... ] \fIfile\fR... \(milcpc [ \fIlibrary\fR... ]
14 14 #include <libcpc.h>
15 15
16 16 \fBint\fR \fBcpc_bind_event\fR(\fBcpc_event_t *\fR\fIevent\fR, \fBint\fR \fIflags\fR);
17 17 .fi
18 18
19 19 .LP
20 20 .nf
21 21 \fBint\fR \fBcpc_take_sample\fR(\fBcpc_event_t *\fR\fIevent\fR);
22 22 .fi
23 23
24 24 .LP
25 25 .nf
26 26 \fBint\fR \fBcpc_rele\fR(\fBvoid\fR);
27 27 .fi
28 28
29 29 .SH DESCRIPTION
30 30 .sp
31 31 .LP
32 32 Once the events to be sampled have been selected using, for example,
33 33 \fBcpc_strtoevent\fR(3CPC), the event selections can be bound to the calling
34 34 \fBLWP\fR using \fBcpc_bind_event()\fR. If \fBcpc_bind_event()\fR returns
35 35 successfully, the system has associated performance counter context with the
36 36 calling \fBLWP\fR. The context allows the system to virtualize the hardware
37 37 counters to that specific \fBLWP\fR, and the counters are enabled.
38 38 .sp
39 39 .LP
40 40 Two flags are defined that can be passed into the routine to allow the behavior
41 41 of the interface to be modified, as described below.
42 42 .sp
43 43 .LP
44 44 Counter values can be sampled at any time by calling \fBcpc_take_sample()\fR,
45 45 and dereferencing the fields of the \fBce_pic\fR\fB[]\fR array returned. The
46 46 \fBce_hrt\fR field contains the timestamp at which the kernel last sampled the
47 47 counters.
48 48 .sp
49 49 .LP
50 50 To immediately remove the performance counter context on an \fBLWP\fR, the
51 51 \fBcpc_rele()\fR interface should be used. Otherwise, the context will be
52 52 destroyed after the \fBLWP\fR or process exits.
53 53 .sp
54 54 .LP
55 55 The caller should take steps to ensure that the counters are sampled often
56 56 enough to avoid the 32-bit counters wrapping. The events most prone to wrap are
57 57 those that count processor clock cycles. If such an event is of interest,
58 58 sampling should occur frequently so that less than 4 billion clock cycles can
59 59 occur between samples. Practically speaking, this is only likely to be a
60 60 problem for otherwise idle systems, or when processes are bound to processors,
61 61 since normal context switching behavior will otherwise hide this problem.
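A minimal sketch of the wrap concern described above, assuming the raw hardware count occupies the low 32 bits of each sampled value. The helper name `counter_delta32` is hypothetical and is not part of libcpc. Unsigned subtraction modulo 2^32 recovers the right delta as long as the counter wrapped at most once between the two samples, which is why fewer than 4 billion events may elapse between them.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a libcpc interface): difference between two
 * samples of a 32-bit counter. The subtraction is performed modulo 2^32,
 * so a single wrap between the samples still yields the correct count.
 * Two or more wraps cannot be detected.
 */
static uint32_t
counter_delta32(uint64_t before, uint64_t after)
{
	return ((uint32_t)((uint32_t)after - (uint32_t)before));
}
```

For example, a counter that read 0xFFFFFFF0 before and 0x10 after has advanced by 0x20 events.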
62 62 .SH RETURN VALUES
63 63 .sp
64 64 .LP
65 65 Upon successful completion, \fBcpc_bind_event()\fR and \fBcpc_take_sample()\fR
66 66 return \fB0\fR. Otherwise, these functions return \fB\(mi1\fR, and set
67 67 \fBerrno\fR to indicate the error.
68 68 .SH ERRORS
69 69 .sp
70 70 .LP
71 71 The \fBcpc_bind_event()\fR and \fBcpc_take_sample()\fR functions will fail if:
72 72 .sp
73 73 .ne 2
74 74 .na
75 75 \fB\fBEACCES\fR\fR
76 76 .ad
77 77 .RS 11n
78 78 For \fBcpc_bind_event()\fR, access to the requested hypervisor event was
79 79 denied.
80 80 .RE
81 81
82 82 .sp
83 83 .ne 2
84 84 .na
85 85 \fB\fBEAGAIN\fR\fR
86 86 .ad
87 87 .RS 11n
88 88 Another process may be sampling system-wide CPU statistics. For
89 89 \fBcpc_bind_event()\fR, this implies that no new contexts can be created. For
90 90 \fBcpc_take_sample()\fR, this implies that the performance counter context has
91 91 been invalidated and must be released with \fBcpc_rele()\fR. Robust programs
92 92 should be coded to expect this behavior and recover from it by releasing the
93 93 now invalid context by calling \fBcpc_rele()\fR, sleeping for a while, then
94 94 attempting to bind and sample the event once more.
95 95 .RE
96 96
97 97 .sp
98 98 .ne 2
99 99 .na
100 100 \fB\fBEINVAL\fR\fR
101 101 .ad
102 102 .RS 11n
103 103 The \fBcpc_take_sample()\fR function has been invoked before the context is
104 104 bound.
105 105 .RE
106 106
107 107 .sp
108 108 .ne 2
109 109 .na
110 110 \fB\fBENOTSUP\fR\fR
111 111 .ad
112 112 .RS 11n
113 113 The caller has attempted an operation that is illegal or not supported on the
114 114 current platform, such as attempting to specify signal delivery on counter
115 115 overflow on a CPU that doesn't generate an interrupt on counter overflow.
116 116 .RE
117 117
118 118 .SH USAGE
119 119 .sp
120 120 .LP
121 121 Prior to calling \fBcpc_bind_event()\fR, applications should call
122 122 \fBcpc_access\fR(3CPC) to determine if the counters are accessible on the
123 123 system.
124 124 .SH EXAMPLES
125 125 .LP
126 126 \fBExample 1 \fRUse hardware performance counters to measure events in a
127 127 process.
128 128 .sp
129 129 .LP
130 130 The example below shows how a standalone program can be instrumented with the
131 131 \fBlibcpc\fR routines to use hardware performance counters to measure events in
132 132 a process. The program performs 20 iterations of a computation, measuring the
133 133 counter values for each iteration. By default, the example makes the counters
134 134 measure external cache references and external cache hits; these options are
135 135 only appropriate for UltraSPARC processors. By setting the \fBPERFEVENTS\fR
136 136 environment variable to other strings (a list of which can be gleaned from the
137 137 \fB-h\fR flag of the \fBcpustat\fR or \fBcputrack\fR utilities), other events
138 138 can be counted. The \fBerror()\fR routine below is assumed to be a
139 139 user-provided routine analogous to the familiar \fBprintf\fR(3C) routine from
140 140 the C library but which also performs an \fBexit\fR(2) after printing the
141 141 message.
142 142
143 143 .sp
144 144 .in +2
145 145 .nf
146 146 \fB#include <inttypes.h>
147 147 #include <stdlib.h>
148 148 #include <stdio.h>
149 149 #include <unistd.h>
150 150 #include <libcpc.h>
151 151 int
152 152 main(int argc, char *argv[])
153 153 {
154 154 int cpuver, iter;
155 155 char *setting = NULL;
156 156 cpc_event_t event;
157 157
158 158 if (cpc_version(CPC_VER_CURRENT) != CPC_VER_CURRENT)
159 159 error("application:library cpc version mismatch!");
160 160
161 161 if ((cpuver = cpc_getcpuver()) == -1)
162 162 error("no performance counter hardware!");
163 163
164 164 if ((setting = getenv("PERFEVENTS")) == NULL)
165 165 setting = "pic0=EC_ref,pic1=EC_hit";
166 166
167 167 if (cpc_strtoevent(cpuver, setting, &event) != 0)
168 168 error("can't measure '%s' on this processor", setting);
169 169 setting = cpc_eventtostr(&event);
170 170
171 171 if (cpc_access() == -1)
172 172 error("can't access perf counters: %s", strerror(errno));
173 173
174 174 if (cpc_bind_event(&event, 0) == -1)
175 175 error("can't bind lwp%d: %s", _lwp_self(), strerror(errno));
176 176
177 177 for (iter = 1; iter <= 20; iter++) {
178 178 cpc_event_t before, after;
179 179
180 180 if (cpc_take_sample(&before) == -1)
181 181 break;
182 182
183 183 /* ==> Computation to be measured goes here <== */
184 184
185 185 if (cpc_take_sample(&after) == -1)
186 186 break;
187 - (void) printf("%3d: %" PRId64 " %" PRId64 "\n", iter,
187 + (void) printf("%3d: %" PRId64 " %" PRId64 "\en", iter,
188 188 after.ce_pic[0] - before.ce_pic[0],
189 189 after.ce_pic[1] - before.ce_pic[1]);
190 190 }
191 191
192 192 if (iter <= 20)
193 193 error("can't sample '%s': %s", setting, strerror(errno));
194 194
195 195 free(setting);
196 196 return (0);
197 197 }\fR
198 198 .fi
199 199 .in -2
200 200
201 201 .LP
202 202 \fBExample 2 \fRWrite a signal handler to catch overflow signals.
203 203 .sp
204 204 .LP
205 205 This example builds on Example 1, but demonstrates how to write the signal
206 206 handler to catch overflow signals. The counters are preset so that counter zero
207 207 is 1000 counts short of overflowing, while counter one is set to zero. After
208 208 1000 counts on counter zero, the signal handler will be invoked.
209 209
210 210 .sp
211 211 .LP
212 212 First the signal handler:
213 213
214 214 .sp
215 215 .in +2
216 216 .nf
217 217 #define PRESET0 (UINT64_MAX - UINT64_C(999))
218 218 #define PRESET1 0
219 219
220 220 void
221 221 emt_handler(int sig, siginfo_t *sip, void *arg)
222 222 {
223 223 ucontext_t *uap = arg;
224 224 cpc_event_t sample;
225 225
226 226 if (sig != SIGEMT || sip->si_code != EMT_CPCOVF) {
227 227 psignal(sig, "example");
228 228 psiginfo(sip, "example");
229 229 return;
230 230 }
231 231
232 -(void) printf("lwp%d - si_addr %p ucontext: %%pc %p %%sp %p\n",
232 +(void) printf("lwp%d - si_addr %p ucontext: %%pc %p %%sp %p\en",
233 233 _lwp_self(), (void *)sip->si_addr,
234 234 (void *)uap->uc_mcontext.gregs[PC],
235 235 (void *)uap->uc_mcontext.gregs[USP]);
236 236
237 237 if (cpc_take_sample(&sample) == -1)
238 238 error("can't sample: %s", strerror(errno));
239 239
240 -(void) printf("0x%" PRIx64 " 0x%" PRIx64 "\n",
240 +(void) printf("0x%" PRIx64 " 0x%" PRIx64 "\en",
241 241 sample.ce_pic[0], sample.ce_pic[1]);
242 242 (void) fflush(stdout);
243 243
244 244 sample.ce_pic[0] = PRESET0;
245 245 sample.ce_pic[1] = PRESET1;
246 246 if (cpc_bind_event(&sample, CPC_BIND_EMT_OVF) == -1)
247 247 error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));
248 248 }
249 249 .fi
250 250 .in -2
251 251
252 252 .sp
253 253 .LP
254 254 and second the setup code (this can be placed after the code that selects the
255 255 event to be measured):
256 256
257 257 .sp
258 258 .in +2
259 259 .nf
260 260 \fBstruct sigaction act;
261 261 cpc_event_t event;
262 262 \&...
263 263 act.sa_sigaction = emt_handler;
264 264 bzero(&act.sa_mask, sizeof (act.sa_mask));
265 265 act.sa_flags = SA_RESTART|SA_SIGINFO;
266 266 if (sigaction(SIGEMT, &act, NULL) == -1)
267 267 error("sigaction: %s", strerror(errno));
268 268 event.ce_pic[0] = PRESET0;
269 269 event.ce_pic[1] = PRESET1;
270 270 if (cpc_bind_event(&event, CPC_BIND_EMT_OVF) == -1)
271 271 error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));
272 272
273 273 for (iter = 1; iter <= 20; iter++) {
274 274 /* ==> Computation to be measured goes here <== */
275 275 }
276 276
277 277 cpc_bind_event(NULL, 0); /* done */\fR
278 278 .fi
279 279 .in -2
280 280
281 281 .sp
282 282 .LP
283 283 Note that a more general version of the signal handler would use \fBwrite\fR(2)
284 284 directly instead of depending on the signal-unsafe semantics of \fBstderr\fR
285 285 and \fBstdout\fR. Most real signal handlers will probably do more with the
286 286 samples than just print them out.
287 287
288 288 .SH ATTRIBUTES
289 289 .sp
290 290 .LP
291 291 See \fBattributes\fR(5) for descriptions of the following attributes:
292 292 .sp
293 293
294 294 .sp
295 295 .TS
296 296 box;
297 297 c | c
298 298 l | l .
299 299 ATTRIBUTE TYPE ATTRIBUTE VALUE
300 300 _
301 301 MT-Level MT-Safe
302 302 _
303 303 Interface Stability Obsolete
304 304 .TE
305 305
306 306 .SH SEE ALSO
307 307 .sp
308 308 .LP
309 309 \fBcpustat\fR(1M), \fBcputrack\fR(1), \fBwrite\fR(2), \fBcpc\fR(3CPC),
310 310 \fBcpc_access\fR(3CPC), \fBcpc_bind_curlwp\fR(3CPC),
311 311 \fBcpc_set_sample\fR(3CPC), \fBcpc_strtoevent\fR(3CPC), \fBcpc_unbind\fR(3CPC),
312 312 \fBlibcpc\fR(3LIB), \fBattributes\fR(5)
313 313 .SH NOTES
314 314 .sp
315 315 .LP
316 316 The \fBcpc_bind_event()\fR, \fBcpc_take_sample()\fR, and \fBcpc_rele()\fR
317 317 functions exist for binary compatibility only. Source containing these
318 318 functions will not compile. These functions are obsolete and might be removed
319 319 in a future release. Applications should use \fBcpc_bind_curlwp\fR(3CPC),
320 320 \fBcpc_set_sample\fR(3CPC), and \fBcpc_unbind\fR(3CPC) instead.
321 321 .sp
322 322 .LP
323 323 Sometimes, even the overhead of performing a system call will be too disruptive
324 324 to the events being measured. Once a call to \fBcpc_bind_event()\fR has been
325 325 issued, it is possible to directly access the performance hardware registers
326 326 from within the application. If the performance counter context is active, then
327 327 the counters will count on behalf of the current \fBLWP\fR.
328 328 .SS "SPARC"
329 329 .sp
330 330 .in +2
331 331 .nf
332 332 rd %pic, %r\fIN\fR ! All UltraSPARC
333 333 wr %r\fIN\fR, %pic ! (ditto, but see text)
334 334 .fi
335 335 .in -2
336 336
337 337 .SS "x86"
338 338 .sp
339 339 .in +2
340 340 .nf
341 341 rdpmc ! Pentium II only
342 342 .fi
343 343 .in -2
344 344
345 345 .sp
346 346 .LP
347 347 If the counter context is not active or has been invalidated, the \fB%pic\fR
348 348 register (SPARC), and the \fBrdpmc\fR instruction (Pentium) will become
349 349 unavailable.
350 350 .sp
351 351 .LP
352 352 Note that the two 32-bit UltraSPARC performance counters are kept in the single
353 353 64-bit \fB%pic\fR register so a couple of additional instructions are required
354 354 to separate the values. Also note that when the \fB%pcr\fR register bit has
355 355 been set that configures the \fB%pic\fR register as readable by an application,
356 356 it is also writable. Any values written will be preserved by the context
357 357 switching mechanism.
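The "couple of additional instructions" amount to a mask and a shift. A minimal sketch in C, assuming counter zero occupies bits 31:0 of the 64-bit value and counter one occupies bits 63:32; that layout is an assumption and should be checked against the processor manual. The helper name `split_pic` is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: separate the two 32-bit counters packed into one
 * 64-bit %pic value. Assumed layout: counter 0 in bits 31:0, counter 1
 * in bits 63:32.
 */
static void
split_pic(uint64_t pic, uint32_t *pic0, uint32_t *pic1)
{
	*pic0 = (uint32_t)(pic & 0xffffffffu);
	*pic1 = (uint32_t)(pic >> 32);
}
```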
358 358 .sp
359 359 .LP
360 360 Pentium II processors support the non-privileged \fBrdpmc\fR instruction which
361 361 requires [5] that the counter of interest be specified in \fB%ecx\fR, and
362 362 returns a 40-bit value in the \fB%edx:%eax\fR register pair. There is no
363 363 non-privileged access mechanism for Pentium I processors.
364 364 .SS "Handling counter overflow"
365 365 .sp
366 366 .LP
367 367 As described above, when counting events, some processors allow their counter
368 368 registers to silently overflow. More recent CPUs such as UltraSPARC III and
369 369 Pentium II, however, are capable of generating an interrupt when the hardware
370 370 counter overflows. Some processors offer more control over when interrupts will
371 371 actually be generated. For example, they might allow the interrupt to be
372 372 programmed to occur when only one of the counters overflows. See
373 373 \fBcpc_strtoevent\fR(3CPC) for the syntax.
374 374 .sp
375 375 .LP
376 376 The most obvious use for this facility is to ensure that the full 64-bit
377 377 counter values are maintained without repeated sampling. However, current
378 378 hardware does not record which counter overflowed. A more subtle use for this
379 379 facility is to preset the counter to a value a little less than the maximum
380 380 value, then use the resulting interrupt to catch the counter overflow
381 381 associated with that event. The overflow can then be used as an indication of
382 382 the frequency of the occurrence of that event.
383 383 .sp
384 384 .LP
385 385 Note that the interrupt generated by the processor may not be particularly
386 386 precise. That is, the particular instruction that caused the counter overflow
387 387 may be earlier in the instruction stream than is indicated by the program
388 388 counter value in the ucontext.
389 389 .sp
390 390 .LP
391 391 When \fBcpc_bind_event()\fR is called with the \fBCPC_BIND_EMT_OVF\fR flag
392 392 set, then as before, the control registers and counters are preset from the
393 393 64-bit values contained in \fBevent\fR. However, when the flag is set, the
394 394 kernel arranges to send the calling process a \fBSIGEMT\fR signal when the
395 395 overflow occurs, with the \fBsi_code\fR field of the corresponding
396 396 \fBsiginfo\fR structure set to \fBEMT_CPCOVF\fR, and the \fBsi_addr\fR field set to
397 397 the program counter value at the time the overflow interrupt was delivered.
398 398 Counting is disabled until the next call to \fBcpc_bind_event()\fR. Even in a
399 399 multithreaded process, during execution of the signal handler, the thread
400 400 behaves as if it is temporarily bound to the running \fBLWP\fR.
401 401 .sp
402 402 .LP
403 403 Different processors have different counter ranges available, though all
404 404 processors supported by Solaris allow at least 31 bits to be specified as a
405 405 counter preset value; thus portable preset values lie in the range
406 406 \fBUINT64_MAX\fR to \fBUINT64_MAX\fR\(mi\fBINT32_MAX\fR.
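The portable range can be turned into a small calculation. The sketch below is not part of libcpc and the helper name `overflow_preset` is hypothetical. It derives a preset that overflows after a given number of further events and rejects requests outside the range stated above. It follows the convention of Example 2, where a preset of UINT64_MAX minus 999 overflows after 1000 counts.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute a preset that overflows after `counts`
 * further events. Returns 1 and stores the preset on success, or 0 if
 * `counts` lies outside the portable range
 * UINT64_MAX .. UINT64_MAX - INT32_MAX.
 */
static int
overflow_preset(uint64_t counts, uint64_t *preset)
{
	if (counts == 0 || counts > (uint64_t)INT32_MAX + 1)
		return (0);
	*preset = UINT64_MAX - (counts - 1);
	return (1);
}
```

Calling `overflow_preset(1000, &p)` yields the same value as `PRESET0` in Example 2.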
407 407 .sp
408 408 .LP
409 409 The appropriate preset value will often need to be determined experimentally.
410 410 Typically, it will depend on the event being measured, as well as the desire to
411 411 minimize the impact of the act of measurement on the event being measured; less
412 412 frequent interrupts and samples lead to less perturbation of the system.
413 413 .sp
414 414 .LP
415 415 If the processor cannot detect counter overflow, this call will fail
416 416 (\fBENOTSUP\fR). Specifying a null event unbinds the context from the
417 417 underlying \fBLWP\fR and disables signal delivery. Currently, only user events
418 418 can be measured using this technique. See Example 2, above.
419 419 .SS "Inheriting events onto multiple \fBLWP\fRs"
420 420 .sp
421 421 .LP
422 422 By default, the library binds the performance counter context to the current
423 423 \fBLWP\fR only. If the \fBCPC_BIND_LWP_INHERIT\fR flag is set, then any
424 424 subsequent \fBLWP\fRs created by that \fBLWP\fR will automatically inherit the
425 425 same performance counter context. The counters will be initialized to 0 as if
426 426 a \fBcpc_bind_event()\fR had just been issued. This automatic inheritance
427 427 behavior can be useful when dealing with multithreaded programs to determine
428 428 aggregate statistics for the program as a whole.
429 429 .sp
430 430 .LP
431 431 If the \fBCPC_BIND_EMT_OVF\fR flag is also set, the process will immediately
432 432 dispatch a \fBSIGEMT\fR signal to the freshly created \fBLWP\fR so that it can
433 433 preset its counters appropriately on the new \fBLWP\fR. This initialization
434 434 condition can be detected using \fBcpc_take_sample()\fR to check that both
435 435 \fBce_pic\fR[] values are set to \fBUINT64_MAX\fR.