TH_DEFINE(1M)                Maintenance Commands                TH_DEFINE(1M)



NAME
       th_define - create fault injection test harness error specifications

SYNOPSIS
       th_define [-n name -i instance| -P path] [-a acc_types]
            [-r reg_number] [-l offset [length]]
            [-c count [failcount]] [-o operator [operand]]
            [-f acc_chk] [-w max_wait_period [report_interval]]


       or


       th_define [-n name -i instance| -P path]
            [-a log [acc_types] [-r reg_number] [-l offset [length]]]
            [-c count [failcount]] [-s collect_time] [-p policy]
            [-x flags] [-C comment_string]
            [-e fixup_script [args]]


       or


       th_define [-h]


DESCRIPTION
       The th_define utility provides an interface to the bus_ops fault
       injection bofi device driver for defining error injection
       specifications (referred to as errdefs). An errdef corresponds to a
       specification of how to corrupt a device driver's accesses to its
       hardware. The command line arguments determine the precise nature of
       the fault to be injected. If the supplied arguments define a consistent
       errdef, the th_define process will store the errdef with the bofi
       driver and suspend itself until the criteria given by the errdef become
       satisfied (in practice, this will occur when the access counts go to
       zero).


       You use the th_manage(1M) command with the start option to activate the
       resulting errdef. The effect of th_manage with the start option is that
       the bofi driver acts upon the errdef by matching the number of hardware
       accesses (specified in count) that are of the type specified in
       acc_types, that are made by instance number instance of the driver
       whose name is name (or by the driver instance specified by path) to the
       register set (or DMA handle) specified by reg_number, and that lie
       within the range offset to offset + length from the beginning of the
       register set or DMA handle. It then applies operator and operand to the
       next failcount matching accesses.
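

       The count and failcount window described above can be pictured with a
       small shell sketch. The is_faulted function is a hypothetical helper
       written for illustration only; it is not part of th_define or the bofi
       driver, and it models nothing beyond the counting rule: the first count
       matching accesses pass through, and the next failcount matching
       accesses are corrupted.

```shell
#!/bin/sh
# Hypothetical helper (illustration only, not part of th_define):
# report whether the n-th matching access (1-based) falls inside the
# fault window. The first "count" matching accesses are untouched and
# the following "failcount" matching accesses are corrupted.
is_faulted() {
    n=$1 count=$2 failcount=$3
    if [ "$n" -gt "$count" ] && [ "$n" -le $((count + failcount)) ]; then
        echo fault
    else
        echo pass
    fi
}

# Model "-c 2 3": accesses 1 and 2 pass, accesses 3 to 5 are faulted,
# and access 6 passes because the errdef has completed by then.
for n in 1 2 3 4 5 6; do
    echo "access $n: $(is_faulted "$n" 2 3)"
done
```

       With -c 0 1, as used in several of the examples in this page, the
       window starts at the very next matching access.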


       If acc_types includes log, th_define runs in automatic test script
       generation mode, and a set of test scripts (written in the Korn shell)
       is created and placed in a sub-directory of the current directory with
       the name <driver>.test.<id> (for example, glm.test.978177106). A
       separate, executable script is generated for each access handle that
       matches the logging criteria. The log of accesses is placed at the top
       of each script as a record of the session. If the current directory is
       not writable, file output is written to standard output. The base name
       of each test file is the driver name, and the extension is a number
       that discriminates between different access handles. A control script
       (with the same name as the created test directory) is generated that
       will run all the test scripts sequentially.


       Executing the scripts will install, and then activate, the resulting
       error definitions. Error definitions are activated sequentially and the
       driver instance under test is taken offline and brought back online
       before each test (refer to the -e option for more information). By
       default, logging applies to all PIO accesses, all interrupts, and all
       DMA accesses to and from areas mapped for both reading and writing. You
       can constrain logging by specifying additional acc_types, reg_number,
       offset and length. Logging will continue for count matching accesses,
       with an optional time limit of collect_time seconds.


       Either the -n or -P option must be provided. The other options are
       optional. If an option (other than -a) is specified multiple times,
       only the final value for the option is used. If an option is not
       specified, its associated value is set to an appropriate default, which
       will provide maximal error coverage as described below.

OPTIONS
       The following options are available:

       -n name

           Specify the name of the driver to test.
           (String)


       -i instance

           Test only the specified driver instance (-1 matches all instances
           of the driver). (Numeric)


       -P path

           Specify the full device path of the driver to test. (String)


       -r reg_number

           Test only the given register set or DMA handle (-1 matches all
           register sets and DMA handles). (Numeric)


       -a acc_types

           Only the specified access types will be matched. Valid values for
           the acc_types argument are log, pio, pio_r, pio_w, dma, dma_r,
           dma_w and intr. Multiple access types, separated by spaces, can be
           specified. The default is to match all hardware accesses.

           If acc_types is set to log, logging will match all PIO accesses,
           interrupts and DMA accesses to and from areas mapped for both
           reading and writing. log can be combined with other acc_types, in
           which case the matching condition for logging will be restricted to
           the specified additional acc_types. Note that dma_r will match only
           DMA handles mapped for reading only; dma_w will match only DMA
           handles mapped for writing only; dma will match only DMA handles
           mapped for both reading and writing.


       -l offset [length]

           Constrain the range of qualifying accesses. The offset and length
           arguments indicate that any access of the type specified with the
           -a option, to the register set or DMA handle specified with the -r
           option, must lie at least offset bytes into the register set or
           DMA handle and at most offset + length bytes into it. The default
           for offset is 0. The default for length is the maximum value that
           can be placed in an offset_t C data type (see types.h). Negative
           values are converted into unsigned quantities. Thus, th_define -l 0
           -1 is maximal.


       -c count [failcount]

           Wait for count number of matching accesses, then apply an operator
           and operand (see the -o option) to the next failcount number of
           matching accesses.
           If the access type (see the -a option) includes
           logging, the number of logged accesses is given by count +
           failcount - 1. The -1 is required because the last access coincides
           with the first faulting access.

           Note that access logging may be combined with error injection if
           failcount and operator are nonzero and if the access type includes
           logging and any of the other access types (pio, dma and intr). See
           the description of access types in the definition of the -a option,
           above.

           When the count and failcount fields reach zero, the status of the
           errdef is reported to standard output. When all active errdefs
           created by the th_define process complete, the process exits. If
           acc_types includes log, count determines how many accesses to log.
           If count is not specified, a default value is used. If failcount is
           set in this mode, it will simply increase the number of accesses
           logged by a further failcount - 1.


       -o operator [operand]

           For qualifying PIO read and write accesses, the value read from or
           written to the hardware is corrupted according to the value of
           operator:

           EQ
                  operand is returned to the driver.


           OR
                  operand is bitwise ORed with the real value.


           AND
                  operand is bitwise ANDed with the real value.


           XOR
                  operand is bitwise XORed with the real value.

           For PIO write accesses, the following operator is allowed:

           NO
                 Simply ignore the driver's attempt to write to the hardware.

           Note that a driver performs PIO via the ddi_getX(), ddi_putX(),
           ddi_rep_getX() and ddi_rep_putX() routines (where X is 8, 16, 32 or
           64). An access made using ddi_getX() or ddi_putX() is treated as
           a single access, whereas an access made using the ddi_rep_*(9F)
           routines is broken down into its respective number of accesses,
           as given by the repcount parameter to these DDI calls.
           If the
           access is performed via a DMA handle, operator and value are
           applied to every access that comprises the DMA request. If
           interference with interrupts has been requested, the operator
           may take any of the following values:

           DELAY
                    After count accesses (see the -c option), delay delivery
                    of the next failcount number of interrupts for operand
                    number of microseconds.


           LOSE
                    After count number of interrupts, fail to deliver the next
                    failcount number of real interrupts to the driver.


           EXTRA
                    After count number of interrupts, start delivering operand
                    number of extra interrupts for the next failcount number
                    of real interrupts.

           The default value for operand and operator is to corrupt the data
           access by flipping each bit (XOR with -1).


       -f acc_chk

           If the acc_chk parameter is set to 1 or pio, then the driver's
           calls to ddi_check_acc_handle(9F) return DDI_FAILURE when the
           access count goes to 1. If the acc_chk parameter is set to 2 or
           dma, then the driver's calls to ddi_check_dma_handle(9F) return
           DDI_FAILURE when the access count goes to 1.


       -w max_wait_period [report_interval]

           Constrain the period for which an error definition will remain
           active. The option applies only to non-logging errdefs. If an error
           definition remains active for max_wait_period seconds, the test
           will be aborted. If report_interval is set to a nonzero value, the
           current status of the error definition is reported to standard
           output every report_interval seconds. The default value is zero.
           The status of the errdef is reported in a parsable format of eight
           colon-separated (:) fields. The first seven fields are integers and
           the last is a string enclosed in double quotes:

           ft:mt:ac:fc:chk:ec:s:"message"

           The fields are defined as follows:

           ft
                        The UTC time when the fault was injected.


           mt
                        The UTC time when the driver reported the fault.


           ac
                        The number of remaining non-faulting accesses.


           fc
                        The number of remaining faulting accesses.


           chk
                        The value of the acc_chk field of the errdef.


           ec
                        The number of fault reports issued by the driver
                        against this errdef (mt holds the time of the initial
                        report).


           s
                        The severity level reported by the driver.


           "message"
                        Textual reason why the driver has reported a fault.



       -h

           Display the command usage string.


       -s collect_time

           If acc_types is given with the -a option and includes log, the
           errdef will log accesses for collect_time seconds (the default is
           to log until the log becomes full). Note that, if the errdef
           specification matches multiple driver handles, multiple logging
           errdefs are registered with the bofi driver, and logging terminates
           when all logs become full, when collect_time expires, or when the
           associated errdefs are cleared. The current state of the log can be
           checked with the th_manage(1M) command, using the broadcast
           parameter. A log can be terminated by running th_manage(1M) with
           the clear_errdefs option or by sending a SIGALRM signal to the
           th_define process. See alarm(2) for the semantics of SIGALRM.


       -p policy

           Applicable when the acc_types option includes log. The parameter
           modifies the policy used for converting from logged accesses to
           errdefs. All policies are inclusive:

               o      Use rare to bias error definitions toward rare accesses
                      (default).

               o      Use operator to produce a separate error definition for
                      each operator type (default).

               o      Use common to bias error definitions toward common
                      accesses.

               o      Use median to bias error definitions toward median
                      accesses.

               o      Use maximal to produce multiple error definitions for
                      duplicate accesses.

               o      Use unbiased to create unbiased error definitions.

               o      Use onebyte, twobyte, fourbyte, or eightbyte to select
                      errdefs corresponding to 1, 2, 4 or 8 byte accesses (if
                      chosen, the -xr option is enforced in order to ensure
                      that ddi_rep_*() calls are decomposed into multiple
                      single accesses).

               o      Use multibyte to create error definitions for multibyte
                      accesses performed using ddi_rep_get*() and
                      ddi_rep_put*().

           Policies can be combined by specifying several of these options
           together (for example, -p onebyte median). See the NOTES section
           for further information.


       -x flags

           Applicable when the acc_types option includes log. The flags
           parameter modifies the way in which the bofi driver logs accesses.
           It is specified as a string containing any combination of the
           following letters:

           w
                Continuous logging (that is, the log will wrap when full).


           t
                Timestamp each log entry (access times are in seconds).


           r
                Log repeated I/O as individual accesses. For example, a
                ddi_rep_get16(9F) call which has a repcount of N is logged N
                times, with each transaction logged as size 2 bytes. Without
                this option, the default logging behavior is to log this
                access once only, with a transaction size of twice the
                repcount.



       -C comment_string

           Applicable when the acc_types option includes log. It provides a
           comment string to be placed in any generated test scripts. The
           string must be enclosed in double quotes.


       -e fixup_script [args]

           Applicable when the acc_types option includes log. The output of a
           logging errdef is a test script for each driver access handle. Use
           this option to embed a command in the resulting script before the
           errors are injected.
           The generated test scripts will
           take an instance offline and bring it back online before injecting
           errors in order to bring the instance into a known fault-free
           state. The executable fixup_script will be called twice with the
           set of optional args: once just before the instance is taken
           offline and again after the instance has been brought online. The
           following variables are passed into the environment of the called
           executable:

           DRIVER_PATH
                                 Identifies the device path of the instance.


           DRIVER_INSTANCE
                                 Identifies the instance number of the device.


           DRIVER_UNCONFIGURE
                                 Has the value 1 when the instance is about to
                                 be taken offline.


           DRIVER_CONFIGURE
                                 Has the value 1 when the instance has just
                                 been brought online.

           Typically, the executable ensures that the device under test is in
           a suitable state to be taken offline (unconfigured) or in a
           suitable state for error injection (for example, configured,
           error-free and servicing a workload). A minimal script for a
           network driver could be:

             #!/bin/ksh

             # Interface name is the driver name plus the instance number.
             driver=xyznetdriver
             ifnum=$driver$DRIVER_INSTANCE

             if [[ $DRIVER_CONFIGURE = 1 ]]; then
                # Instance just came online: configure it, start a workload.
                ifconfig "$ifnum" plumb
                ifconfig "$ifnum" ...
                ifworkload start "$ifnum"
             elif [[ $DRIVER_UNCONFIGURE = 1 ]]; then
                # Instance is about to go offline: stop workload, tear down.
                ifworkload stop "$ifnum"
                ifconfig "$ifnum" down
                ifconfig "$ifnum" unplumb
             fi
             exit $?


           The -e option must be the last option on the command line.



       If the -a log option is selected but the -e option is not given, a
       default script is used. This script repeatedly attempts to detach and
       then re-attach the device instance under test.

EXAMPLES
   Examples of Error Definitions
       th_define -n foo -i 1 -a log


       Logs all accesses to all handles used by instance 1 of the foo driver
       while running the default workload (attaching and detaching the
       instance).
       Then generates a set of test scripts to inject appropriate
       errdefs while running that default workload.


       th_define -n foo -i 1 -a log pio


       Logs PIO accesses to each PIO handle used by instance 1 of the foo
       driver while running the default workload (attaching and detaching the
       instance). Then generates a set of test scripts to inject appropriate
       errdefs while running that default workload.


       th_define -n foo -i 1 -p onebyte median -e fixup arg -now


       Logs all accesses to all handles used by instance 1 of the foo driver
       while running the workload defined in the fixup script fixup with
       arguments arg and -now. Then generates a set of test scripts to inject
       appropriate errdefs while running that workload. The resulting error
       definitions are requested to focus upon single byte accesses to
       locations that are accessed a median number of times with respect to
       frequency of access to I/O addresses.


       th_define -n se -l 0x20 1 -a pio_r -o OR 0x4 -c 10 1000


       Simulates a stuck serial chip command by forcing 1000 consecutive read
       accesses made by any instance of the se driver to its command status
       register, thereby returning status busy.


       th_define -n foo -i 3 -r 1 -a pio_r -c 0 1 -f 1 -o OR 0x100


       Causes 0x100 to be ORed into the next PIO read access from any
       register in register set 1 of instance 3 of the foo driver. Subsequent
       calls in the driver to ddi_check_acc_handle() return DDI_FAILURE.


       th_define -n foo -i 3 -r 1 -a pio_r -c 0 1 -o OR 0x0


       Causes 0x0 to be ORed into the next PIO read access from any
       register in register set 1 of instance 3 of the foo driver. This is of
       course a no-op.
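

       The arithmetic behind the EQ, OR, AND and XOR operators used in these
       examples can be checked with ordinary shell arithmetic. The apply_op
       function below is a hypothetical helper written for illustration; it
       is not part of th_define, it does not model the NO operator or the
       interrupt operators, and the register values passed to it are made up.

```shell
#!/bin/sh
# Hypothetical helper (illustration only, not part of th_define):
# apply a corruption operator to a "real" value in the way the -o
# option describes for PIO accesses, and print the result in hex.
apply_op() {
    op=$1 operand=$2 real=$3
    case $op in
        EQ)  result=$(( $operand )) ;;          # operand replaces the value
        OR)  result=$(( $real | $operand )) ;;  # set the operand's bits
        AND) result=$(( $real & $operand )) ;;  # keep only the operand's bits
        XOR) result=$(( $real ^ $operand )) ;;  # toggle the operand's bits
        *)   echo "unknown operator: $op" >&2; return 1 ;;
    esac
    printf '0x%x\n' "$result"
}

# 0x42 and 0x1234 are made-up register contents.
apply_op OR  0x100 0x42      # sets bit 0x100
apply_op OR  0x0   0x42      # leaves the value unchanged (a no-op)
apply_op XOR 7     0x42      # toggles the bottom three bits
apply_op AND 0xefff 0x1234   # clears the 0x1000 bit
```

       The second call shows why OR with 0x0 has no effect: every bit of the
       real value is preserved.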


       th_define -n foo -i 3 -r 1 -l 0x8100 1 -a pio_r -c 0 10 -o EQ 0x70003


       Causes the next ten PIO reads from the register at offset 0x8100 in
       register set 1 of instance 3 of the foo driver to return 0x70003.


       th_define -n foo -i 3 -r 1 -l 0x8100 1 -a pio_w -c 100 3 -o AND
            0xffffffffffffefff


       The next 100 PIO writes to the register at offset 0x8100 in
       register set 1 of instance 3 of the foo driver take place as normal.
       However, on each of the three subsequent accesses, the 0x1000 bit will
       be cleared.


       th_define -n foo -i 3 -r 1 -l 0x8100 0x10 -a pio_r -c 0 1 -f 1 -o XOR 7


       Causes the bottom three bits to have their values toggled for the next
       PIO read access to registers with offsets in the range 0x8100
       to 0x8110 in register set 1 of instance 3 of the foo driver. Subsequent
       calls in the driver to ddi_check_acc_handle() return DDI_FAILURE.


       th_define -n foo -i 3 -a pio_w -c 0 1 -o NO 0


       Prevents the next PIO write access to any register in any
       register set of instance 3 of the foo driver from going out on the bus.


       th_define -n foo -i 3 -l 0 8192 -a dma_r -c 0 1 -o OR 7


       Causes 0x7 to be ORed into each long long in the first 8192 bytes of
       the next DMA read, using any DMA handle for instance 3 of the foo
       driver.


       th_define -n foo -i 3 -r 2 -l 0 8 -a dma_r -c 0 1 -o OR
            0x7070707070707070


       Causes 0x70 to be ORed into each byte of the first long long of the
       next DMA read, using the DMA handle with sequential allocation number 2
       for instance 3 of the foo driver.


       th_define -n foo -i 3 -l 256 256 -a dma_w -c 0 1 -f 2 -o OR 7


       Causes 0x7 to be ORed into each long long in the range from offset 256
       to offset 512 of the next DMA write, using any DMA handle for instance
       3 of the foo driver.
       Subsequent calls in the driver to
       ddi_check_dma_handle() return DDI_FAILURE.


       th_define -n foo -i 3 -r 0 -l 0 8 -a dma_w -c 100 3 -o AND
            0xffffffffffffefff


       The next 100 DMA writes using the DMA handle with sequential allocation
       number 0 for instance 3 of the foo driver take place as normal.
       However, on each of the three subsequent accesses, the 0x1000 bit will
       be cleared in the first long long of the transfer.


       th_define -n foo -i 3 -a intr -c 0 6 -o LOSE 0


       Causes the next six interrupts for instance 3 of the foo driver to be
       lost.


       th_define -n foo -i 3 -a intr -c 30 1 -o EXTRA 10


       When the thirty-first subsequent interrupt for instance 3 of the foo
       driver occurs, a further ten interrupts are also generated.


       th_define -n foo -i 3 -a intr -c 0 1 -o DELAY 1024


       Causes the next interrupt for instance 3 of the foo driver to be
       delayed by 1024 microseconds.

NOTES
       The policy option in the th_define -p syntax determines how a set of
       logged accesses will be converted into the set of error definitions.
       Each logged access will be matched against the chosen policies to
       determine whether an error definition should be created based on the
       access.


       Any number of policy options can be combined to modify the generated
       error definitions.

   Bytewise Policies
       These select particular I/O transfer sizes. Specifying a byte policy
       will exclude other byte policies that have not been chosen. If none of
       the byte type policies is selected, all transfer sizes are treated
       equally. Otherwise, only those specified transfer sizes will be
       selected.

       onebyte
                    Create errdefs for one byte accesses (ddi_get8())


       twobyte
                    Create errdefs for two byte accesses (ddi_get16())


       fourbyte
                    Create errdefs for four byte accesses (ddi_get32())


       eightbyte
                    Create errdefs for eight byte accesses (ddi_get64())


       multibyte
                    Create errdefs for repeated byte accesses (ddi_rep_get*())


   Frequency of Access Policies
       The frequency of access to a location is determined according to the
       access type, location and transfer size (for example, a two-byte read
       access to address A is considered distinct from a four-byte read access
       to address A). The algorithm counts the number of accesses (of a given
       type and size) to each location and finds the locations that were most
       and least accessed. Let maxa and mina be the number of times these
       locations were accessed, and let mean be the total number of accesses
       divided by the total number of locations that were accessed. A rare
       access is then a location that was accessed fewer than


         (mean - mina) / 3 + mina


       times. Similarly, a common access is a location that was accessed more
       than


         maxa - (maxa - mean) / 3


       times. A location whose access count lies between these cutoffs is
       regarded as a location that is accessed with median frequency.

       rare
                 Create errdefs for locations that are rarely accessed.


       common
                 Create errdefs for locations that are commonly accessed.


       median
                 Create errdefs for locations that are accessed with median
                 frequency.


   Policies for Minimizing errdefs
       If a transaction is duplicated, either a single or multiple errdefs
       will be written to the test scripts, depending upon the maximal and
       unbiased policies:

       maximal
                    Create multiple errdefs for locations that are repeatedly
                    accessed.


       unbiased
                    Create a single errdef for locations that are repeatedly
                    accessed.


       operators
                    For each location, a default operator and operand is
                    typically applied. For maximal test coverage, this default
                    may be modified using the operators policy so that a
                    separate errdef is created for each of the possible
                    corruption operators.


SEE ALSO
       kill(1), th_manage(1M), alarm(2), ddi_check_acc_handle(9F),
       ddi_check_dma_handle(9F)



                                 April 9, 2016                   TH_DEFINE(1M)