'\" te
.\" Copyright (c) 2005, Sun Microsystems, Inc.
.TH CPUSTAT 1M "Jun 16, 2009"
.SH NAME
cpustat \- monitor system behavior using CPU performance counters
.SH SYNOPSIS
.LP
.nf
\fBcpustat\fR \fB-c\fR \fIeventspec\fR [\fB-c\fR \fIeventspec\fR]... [\fB-p\fR \fIperiod\fR] [\fB-T\fR u | d ]
     [\fB-sntD\fR] [\fIinterval\fR [\fIcount\fR]]
.fi

.LP
.nf
\fBcpustat\fR \fB-h\fR
.fi

.SH DESCRIPTION
.sp
.LP
The \fBcpustat\fR utility allows \fBCPU\fR performance counters to be used to
monitor the overall behavior of the \fBCPU\fRs in the system.
.sp
.LP
If \fIinterval\fR is specified, \fBcpustat\fR samples activity every
\fIinterval\fR seconds, repeating forever. If \fIcount\fR is also specified,
the statistics are repeated \fIcount\fR times. If neither is specified, an
interval of five seconds is used, and there is no limit to the number of
samples that are taken.
.SH OPTIONS
.sp
.LP
The following options are supported:
.sp
.ne 2
.na
\fB\fB-c\fR \fIeventspec\fR\fR
.ad
.sp .6
.RS 4n
Specifies a set of events for the \fBCPU\fR performance counters to monitor.
The syntax of these event specifications is:
.sp
.in +2
.nf
[pic\fIn\fR=]\fIeventn\fR[,attr[\fIn\fR][=\fIval\fR]][,[pic\fIn\fR=]\fIeventn\fR
     [,attr[\fIn\fR][=\fIval\fR]],...,]
.fi
.in -2
.sp

The \fB-h\fR option prints the usage message, which lists the available events
and attributes. You can omit an explicit counter assignment, in which case
\fBcpustat\fR attempts to choose a capable counter automatically.
.sp
Attribute values can be expressed in hexadecimal, octal, or decimal notation,
in a format suitable for \fBstrtoll\fR(3C). An attribute present in the event
specification without an explicit value receives a default value of \fB1\fR. An
attribute without a corresponding counter number is applied to all counters in
the specification.
.sp
The semantics of these event specifications are defined in the \fBCPU\fR
manufacturer's documentation for the events.
.sp
Multiple \fB-c\fR options can be specified, in which case the command cycles
through the event specifications, using the next one for each sample.
.RE

.sp
.ne 2
.na
\fB\fB-D\fR\fR
.ad
.sp .6
.RS 4n
Enables debug mode.
.RE

.sp
.ne 2
.na
\fB\fB-h\fR\fR
.ad
.sp .6
.RS 4n
Prints an extensive help message on how to use the utility and how to program
the processor-dependent counters.
.RE

.sp
.ne 2
.na
\fB\fB-n\fR\fR
.ad
.sp .6
.RS 4n
Omits all header output (useful if \fBcpustat\fR is the beginning of a
pipeline).
.RE

.sp
.ne 2
.na
\fB\fB-p\fR \fIperiod\fR\fR
.ad
.sp .6
.RS 4n
Causes \fBcpustat\fR to cycle through the list of \fIeventspec\fRs every
\fIperiod\fR seconds. The tool sleeps after each cycle until \fIperiod\fR
seconds have elapsed since the first \fIeventspec\fR was measured.
.sp
When this option is present, the optional \fIcount\fR parameter specifies the
total number of cycles to make (instead of the total number of samples to
take). If \fIperiod\fR is less than the number of \fIeventspec\fRs times
\fIinterval\fR, the tool acts as if \fIperiod\fR is \fB0\fR.
.RE

.sp
.ne 2
.na
\fB\fB-s\fR\fR
.ad
.sp .6
.RS 4n
Creates an idle soaker thread to spin while system-only \fIeventspec\fRs are
bound. One idle soaker thread is bound to each CPU in the current processor
set. System-only \fIeventspec\fRs contain both the \fBnouser\fR and the
\fBsys\fR tokens and measure events that occur while the CPU is operating in
privileged mode. This option prevents the kernel's idle loop from running and
triggering system-mode events.
.RE

.sp
.ne 2
.na
\fB\fB-T\fR \fBu\fR | \fBd\fR\fR
.ad
.sp .6
.RS 4n
Displays a time stamp.
.sp
Specify \fBu\fR to print the internal representation of time, as returned by
\fBtime\fR(2). Specify \fBd\fR to print the time in standard date format. See
\fBdate\fR(1).
.RE

.sp
.ne 2
.na
\fB\fB-t\fR\fR
.ad
.sp .6
.RS 4n
Prints an additional column of processor cycle counts, if available on the
current architecture.
.RE
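.sp
.LP
The attribute rules given for the \fB-c\fR option can be illustrated with a
short C sketch. The \fBparse_attr()\fR helper and \fBstruct attr\fR below are
hypothetical; they are not part of \fBcpustat\fR or \fBlibcpc\fR(3LIB). The
sketch splits a single \fIattr\fR[\fIn\fR][=\fIval\fR] token into a name, an
optional counter number, and a value. It passes the value to \fBstrtoll\fR(3C)
with base 0, so hexadecimal, octal, and decimal notations are all accepted, and
it supplies the default value of \fB1\fR when no value is given. For
illustration only, it assumes that any trailing digits in an attribute name
denote a counter number.
.sp
.in +2
.nf
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one "attr[n][=val]" token. */
struct attr {
    char name[32];      /* attribute name without counter suffix */
    int counter;        /* counter number, or -1 for all counters */
    long long value;    /* parsed value; 1 when "=val" is omitted */
};

/*
 * Hypothetical helper: returns 0 on success, -1 on a malformed token.
 * Assumes that trailing digits in the name are a counter number.
 */
static int
parse_attr(const char *tok, struct attr *a)
{
    const char *eq;
    size_t len, end;
    char *stop;

    assert(tok != NULL && a != NULL);
    eq = strchr(tok, '=');
    len = (eq != NULL) ? (size_t)(eq - tok) : strlen(tok);

    /* Split off trailing digits: "emask14" -> "emask", counter 14. */
    end = len;
    while (end > 0 && isdigit((unsigned char)tok[end - 1]))
        end--;
    if (end == 0 || end >= sizeof (a->name))
        return (-1);
    memcpy(a->name, tok, end);
    a->name[end] = 0;               /* terminate the copied name */
    a->counter = (end < len) ? atoi(tok + end) : -1;

    if (eq == NULL) {
        a->value = 1;               /* no explicit value: default is 1 */
        return (0);
    }
    /* Base 0: "0x" prefix means hex, a leading "0" octal, else decimal. */
    a->value = strtoll(eq + 1, &stop, 0);
    if (stop == eq + 1 || *stop != 0)
        return (-1);                /* empty value or trailing junk */
    return (0);
}
```
.fi
.in -2
.sp
.LP
For example, the token \fBemask14=0xf\fR from the Pentium 4 example yields the
name \fBemask\fR, counter 14, and value 15, while a bare \fBsys\fR token yields
the value 1 with no counter number, so it applies to all counters.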

.SH USAGE
.sp
.LP
A closely related utility, \fBcputrack\fR(1), can be used to monitor the
behavior of individual applications with little or no interference from other
activities on the system.
.sp
.LP
The \fBcpustat\fR utility must be run by the super-user, as there is an
intrinsic conflict between the use of the \fBCPU\fR performance counters
system-wide by \fBcpustat\fR and the use of the \fBCPU\fR performance counters
to monitor an individual process (for example, by \fBcputrack\fR).
.sp
.LP
Once any instance of this utility has started, no further per-process or
per-\fBLWP\fR use of the counters is allowed until the last instance of the
utility terminates.
.sp
.LP
The times printed by the command correspond to the wallclock time when the
hardware counters were actually sampled, instead of when the program told the
kernel to sample them. The time is derived from the same timebase as
\fBgethrtime\fR(3C).
.sp
.LP
The processor cycle counts enabled by the \fB-t\fR option always apply to both
user and system modes, regardless of the settings applied to the performance
counter registers.
.sp
.LP
On some hardware platforms, when counting in system mode with the \fBsys\fR
token, the counters are implemented using 32-bit registers. The kernel attempts
to catch all overflows in order to synthesize 64-bit counters, but because of
hardware implementation restrictions, overflows can be lost unless the sampling
interval is kept short enough. The events most prone to wrap are those that
count processor clock cycles. If such an event is of interest, sample often
enough that fewer than 4 billion clock cycles elapse between samples.
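.sp
.LP
To make the overflow limit concrete: a 32-bit register wraps after
4,294,967,296 (2^32) events, so an event that counts clock cycles on a
processor running at \fIf\fR cycles per second wraps every 2^32/\fIf\fR
seconds, which is about 4.3 seconds at 1 GHz. The C sketch below is a
hypothetical illustration, not the kernel's implementation. Assuming the
register wraps at most once between two readings, it accumulates a 64-bit
total from successive 32-bit readings by using unsigned modular subtraction,
and it computes the longest interval that stays within that assumption. The
names \fBstruct counter64\fR, \fBcounter64_update()\fR, and
\fBmax_safe_interval()\fR are invented for this example.
.sp
.in +2
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit accumulator fed by a 32-bit hardware register.
 * The result is correct only if the register wraps at most once
 * between two successive readings.
 */
struct counter64 {
    uint32_t last;      /* previous raw 32-bit reading */
    uint64_t total;     /* synthesized 64-bit count */
};

static void
counter64_update(struct counter64 *c, uint32_t raw)
{
    assert(c != NULL);
    /* Unsigned subtraction is modulo 2^32, so a single wrap is absorbed. */
    c->total += (uint32_t)(raw - c->last);
    c->last = raw;
}

/* Seconds until a counter advancing hz times per second wraps. */
static double
max_safe_interval(double hz)
{
    assert(hz > 0.0);
    return (4294967296.0 / hz);     /* 2^32 events */
}
```
.fi
.in -2
.sp
.LP
For example, \fBmax_safe_interval(1e9)\fR returns about 4.29, so on a 1 GHz
processor a cycle-counting event needs an \fIinterval\fR shorter than about 4.3
seconds for this scheme to see every wrap.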
.sp
.LP
The output of \fBcpustat\fR is designed to be readily parseable by \fBnawk\fR(1)
and \fBperl\fR(1), thereby allowing performance tools to be composed by
embedding \fBcpustat\fR in scripts. Alternatively, tools can be built directly
on the same \fBlibcpc\fR(3LIB) interfaces that \fBcpustat\fR itself uses. See
\fBcpc\fR(3CPC).
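.sp
.LP
The same kind of parsing can also be done in C. The following sketch is
hypothetical and is not part of \fBcpustat\fR: the \fBsum_ticks()\fR function
assumes the column layout shown under EXAMPLES (time, CPU, event, and then one
column per counter) with exactly two counter columns and no \fB-t\fR cycle
column. It adds up the per-CPU \fBtick\fR lines, skipping header lines and the
final \fBtotal\fR line.
.sp
.in +2
.nf
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical accumulator for cpustat output with two counter columns,
 * such as "   1.008   0  tick     69284      1647".  Header lines and
 * the final "total" line are skipped.  Returns the number of tick
 * lines that were summed.
 */
static int
sum_ticks(const char *lines[], int n,
    unsigned long long *pic0, unsigned long long *pic1)
{
    int i, used = 0;

    assert(pic0 != NULL && pic1 != NULL);
    *pic0 = *pic1 = 0;
    for (i = 0; i < n; i++) {
        double t;
        int cpu;
        char event[16];
        unsigned long long a, b;

        /* Fields: time, cpu, event name, first counter, second counter. */
        if (sscanf(lines[i], "%lf %d %15s %llu %llu",
            &t, &cpu, event, &a, &b) != 5)
            continue;       /* header, blank, or malformed line */
        if (strcmp(event, "tick") != 0)
            continue;       /* e.g. the summary "total" line */
        *pic0 += a;
        *pic1 += b;
        used++;
    }
    return (used);
}
```
.fi
.in -2
.sp
.LP
Applied to the output of Example 1, \fBsum_ticks()\fR finds six tick lines and
the sums 651077 and 18204, matching the \fBtotal\fR line. Assuming that
\fBpic0\fR counted \fBEC_ref\fR and \fBpic1\fR counted \fBEC_misses\fR, as the
order of the event specification suggests, the external cache miss ratio for
that run is 18204/651077, or about 2.8 percent.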
.sp
.LP
The \fBcpustat\fR utility monitors only the \fBCPU\fRs in the current processor
set. Thus, several instances of the utility can run at the same time, each on
the \fBCPU\fRs of a different processor set. See \fBpsrset\fR(1M) for more
information about processor sets.
.sp
.LP
Because \fBcpustat\fR uses \fBLWP\fRs bound to \fBCPU\fRs, the utility might
have to be terminated before the configuration of the relevant processor can be
changed.
.SH EXAMPLES
.SS "SPARC"
.LP
\fBExample 1 \fRMeasuring External Cache References and Misses
.sp
.LP
The following example measures references to and misses in the external cache
while the processor is operating in user mode on an UltraSPARC machine.

.sp
.in +2
.nf
example% cpustat -c EC_ref,EC_misses 1 3

    time cpu event      pic0      pic1
   1.008   0  tick     69284      1647
   1.008   1  tick     43284      1175
   2.008   0  tick    179576      1834
   2.008   1  tick    202022     12046
   3.008   0  tick     93262       384
   3.008   1  tick     63649      1118
   3.008   2 total    651077     18204
.fi
.in -2
.sp

.SS "x86"
.LP
\fBExample 2 \fRMeasuring Branch Prediction Success on Pentium 4
.sp
.LP
The following example measures branch mispredictions and total branch
instructions in user and system mode on a Pentium 4 machine.

.sp
.in +2
.nf
example% cpustat -c \e
pic12=branch_retired,emask12=0x4,\e
pic14=branch_retired,emask14=0xf,sys 1 3

    time cpu event      pic12     pic14
   1.010   1  tick       458       684
   1.010   0  tick       305       511
   2.010   0  tick       181       269
   2.010   1  tick       469       684
   3.010   0  tick       182       269
   3.010   1  tick       468       684
   3.010   2 total      2063      3101
.fi
.in -2
.sp

.LP
\fBExample 3 \fRCounting Memory Accesses on Opteron
.sp
.LP
The following example determines the number of memory accesses made through
each memory controller on an Opteron, broken down by internal memory latency:

.sp
.in +2
.nf
example% cpustat -c \e
pic0=NB_mem_ctrlr_page_access,umask0=0x01,\e
pic1=NB_mem_ctrlr_page_access,umask1=0x02,\e
pic2=NB_mem_ctrlr_page_access,umask2=0x04,sys 1

    time cpu event      pic0      pic1      pic2
   1.003   0  tick     41976     53519      7720
   1.003   1  tick      5589     19402       731
   2.003   1  tick      6011     17005       658
   2.003   0  tick     43944     45473      7338
   3.003   1  tick      7105     20177       762
   3.003   0  tick     47045     48025      7119
   4.003   0  tick     43224     46296      6694
   4.003   1  tick      5366     19114       652
.fi
.in -2
.sp

.SH WARNINGS
.sp
.LP
By running the \fBcpustat\fR command, the super-user forcibly invalidates all
existing performance counter context. This can in turn cause all invocations of
the \fBcputrack\fR command, and other users of performance counter context, to
exit prematurely with unspecified errors.
.sp
.LP
If \fBcpustat\fR is invoked on a system that has \fBCPU\fR performance counters
that are not supported by Solaris, the following message appears:
.sp
.in +2
.nf
cpustat: cannot access performance counters - Operation not applicable
.fi
.in -2
.sp
.LP
This error message indicates that \fBcpc_open()\fR has failed. The failure is
documented in \fBcpc_open\fR(3CPC); review that page for more information about
the problem and possible solutions.
.sp
.LP
If a short interval is requested, \fBcpustat\fR might not be able to keep up
with the desired sample rate. In this case, some samples might be dropped.
.SH ATTRIBUTES
.sp
.LP
See \fBattributes\fR(5) for descriptions of the following attributes:
.sp
.TS
box;
c | c
l | l .
ATTRIBUTE TYPE	ATTRIBUTE VALUE
_
Interface Stability	Evolving
.TE

.SH SEE ALSO
.sp
.LP
\fBcputrack\fR(1), \fBnawk\fR(1), \fBperl\fR(1), \fBiostat\fR(1M),
\fBprstat\fR(1M), \fBpsrset\fR(1M), \fBvmstat\fR(1M), \fBcpc\fR(3CPC),
\fBcpc_open\fR(3CPC), \fBcpc_bind_cpu\fR(3CPC), \fBgethrtime\fR(3C),
\fBstrtoll\fR(3C), \fBlibcpc\fR(3LIB), \fBattributes\fR(5)
.SH NOTES
.sp
.LP
When \fBcpustat\fR is run on a Pentium 4 with HyperThreading enabled, a CPC set
is bound to only one logical CPU of each physical CPU. See
\fBcpc_bind_cpu\fR(3CPC).