TH_DEFINE(1M)                Maintenance Commands                TH_DEFINE(1M)



NAME
       th_define - create fault injection test harness error specifications

SYNOPSIS
       th_define [-n name -i instance | -P path] [-a acc_types]
            [-r reg_number] [-l offset [length]]
            [-c count [failcount]] [-o operator [operand]]
            [-f acc_chk] [-w max_wait_period [report_interval]]


       or


       th_define [-n name -i instance | -P path]
            [-a log [acc_types] [-r reg_number] [-l offset [length]]]
            [-c count [failcount]] [-s collect_time] [-p policy]
            [-x flags] [-C comment_string]
            [-e fixup_script [args]]


       or


       th_define [-h]


DESCRIPTION
       The th_define utility provides an interface to the bus_ops fault
       injection (bofi) device driver for defining error injection
       specifications, referred to as errdefs. An errdef specifies how to
       corrupt a device driver's accesses to its hardware; the command line
       arguments determine the precise nature of the fault to be injected.
       If the supplied arguments define a consistent errdef, th_define stores
       the errdef with the bofi driver and suspends itself until the criteria
       given by the errdef are satisfied. In practice, this occurs when the
       access counts reach zero.


       Use the th_manage(1M) command with the start option to activate the
       resulting errdef. Once the errdef is active, the bofi driver counts
       the hardware accesses that meet all of the following criteria:

           o      The access is of a type listed in acc_types.

           o      It is made by instance number instance of the driver
                  named name, or by the driver instance at path.

           o      It is made to the register set (or DMA handle) given by
                  reg_number.

           o      It lies within the range offset to offset + length from
                  the beginning of that register set or DMA handle.

       After count such accesses, the bofi driver applies operator and
       operand to the next failcount matching accesses.
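       The counting rule in the previous paragraph can be modelled with a
       small shell function. This is a hypothetical sketch of the documented
       behaviour only: the name errdef_phase is made up, and the real
       decision is made inside the bofi driver. It reports what happens to
       the nth matching access for a given count and failcount.

```shell
# Hypothetical model of one errdef as described above: classify the
# nth matching access (counting from 1) for a given count and failcount.
#   pass  - among the first count accesses; left untouched
#   fault - among the next failcount accesses; operator and operand applied
#   done  - the errdef has completed; the access is left untouched
errdef_phase() {
    n=$1 count=$2 failcount=$3
    if [ "$n" -le "$count" ]; then
        echo pass
    elif [ "$n" -le $((count + failcount)) ]; then
        echo fault
    else
        echo done
    fi
}

# -c 10 1000, as in the se example under EXAMPLES.
for i in 10 11 1010 1011; do
    echo "access $i: $(errdef_phase "$i" 10 1000)"
done
```

       With -c 10 1000, accesses 1 to 10 pass, accesses 11 to 1010 are
       faulted, and the errdef is complete after that. With -c 0 1, the very
       next matching access is faulted.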


       If acc_types includes log, th_define runs in automatic test script
       generation mode. It creates a set of test scripts, written in the Korn
       shell, and places them in a subdirectory of the current directory
       named <driver>.test.<id> (for example, glm.test.978177106). A
       separate executable script is generated for each access handle that
       matches the logging criteria, and the log of accesses is placed at the
       top of each script as a record of the session. If the current
       directory is not writable, the output is written to standard output
       instead. The base name of each test file is the driver name, and its
       extension is a number that distinguishes between access handles. A
       control script, with the same name as the test directory, is also
       generated to run all of the test scripts in sequence.


       Executing the scripts installs and then activates the resulting error
       definitions. The error definitions are activated one at a time, and
       the driver instance under test is taken offline and brought back
       online before each test (see the -e option for more information). By
       default, logging applies to all PIO accesses, all interrupts, and all
       DMA accesses to and from areas mapped for both reading and writing.
       You can constrain logging by specifying additional acc_types,
       reg_number, offset and length values. Logging continues for count
       matching accesses, subject to an optional time limit of collect_time
       seconds.


       Either the -n or the -P option must be provided; all other options are
       optional. If an option other than -a is specified more than once, only
       its final value is used. If an option is not specified, its value is
       set to a default chosen to give maximal error coverage, as described
       below.

OPTIONS
       The following options are available:

       -n name

           Specify the name of the driver to test. (String)


       -i instance

           Test only the specified driver instance (-1 matches all instances
           of the driver). (Numeric)


       -P path

           Specify the full device path of the driver instance to test.
           (String)


       -r reg_number

           Test only the given register set or DMA handle (-1 matches all
           register sets and DMA handles). (Numeric)


       -a acc_types

           Match only the specified access types. Valid values for the
           acc_types argument are log, pio, pio_r, pio_w, dma, dma_r, dma_w
           and intr. Multiple access types, separated by spaces, can be
           specified. The default is to match all hardware accesses.

           If acc_types is set to log, logging matches all PIO accesses,
           interrupts, and DMA accesses to and from areas mapped for both
           reading and writing. log can be combined with other access types,
           in which case logging is restricted to the specified additional
           types. Note that dma_r matches only DMA handles mapped for reading
           only, dma_w matches only DMA handles mapped for writing only, and
           dma matches only DMA handles mapped for both reading and writing.


       -l offset [length]

           Constrain the range of qualifying accesses. An access of the type
           specified with the -a option, to the register set or DMA handle
           specified with the -r option, qualifies only if it lies at least
           offset bytes and at most offset + length bytes into the register
           set or DMA handle. The default for offset is 0. The default for
           length is the maximum value that can be placed in an offset_t C
           data type (see types.h). Negative values are converted into
           unsigned quantities, so th_define -l 0 -1 selects the maximal
           range.
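           A minimal sketch of the range test in shell arithmetic. It assumes
           the window is half-open, [offset, offset + length), which is how
           the EXAMPLES below read -l 0 8192 as the first 8192 bytes; the
           exact treatment of the upper bound is an assumption, as is the
           helper name in_window. Negative lengths are not modelled.

```shell
# Hypothetical range test: succeed if an access at byte offset acc_off
# lies in [offset, offset + length). Hex arguments (0x...) are accepted,
# as on the th_define command line.
in_window() {
    acc_off=$(($1)) offset=$(($2)) length=$(($3))
    [ "$acc_off" -ge "$offset" ] && [ "$acc_off" -lt $((offset + length)) ]
}

# -l 0x8100 0x10: under this assumption, 0x8100 up to but not including 0x8110.
for off in 0x80fc 0x8100 0x810f 0x8110; do
    if in_window "$off" 0x8100 0x10; then
        echo "$off: qualifies"
    else
        echo "$off: does not qualify"
    fi
done
```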


       -c count [failcount]

           Wait for count matching accesses, then apply an operator and
           operand (see the -o option) to the next failcount matching
           accesses. If the access type (see the -a option) includes logging,
           the number of logged accesses is count + failcount - 1. The -1 is
           required because the last logged access coincides with the first
           faulting access.

           Access logging may be combined with error injection if failcount
           and operator are nonzero and the access type includes log together
           with any of the other access types (pio, dma and intr). See the
           description of access types under the -a option, above.

           When the count and failcount fields reach zero, the status of the
           errdef is reported to standard output. When all active errdefs
           created by the th_define process have completed, the process
           exits. If acc_types includes log, count determines how many
           accesses to log; if count is not specified, a default value is
           used. If failcount is also set in this mode, it increases the
           number of accesses logged by a further failcount - 1.
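           The logging arithmetic above can be restated as a one-line helper.
           logged_accesses is a hypothetical name, and treating an omitted
           failcount as 1 (so that count alone determines the log size) is
           an assumption drawn from the previous paragraph.

```shell
# Hypothetical helper: number of accesses logged for -c count [failcount]
# in logging mode. An omitted failcount is treated as 1, so the result
# reduces to count; a larger failcount adds failcount - 1 further entries.
logged_accesses() {
    count=$1 failcount=${2:-1}
    echo $((count + failcount - 1))
}

logged_accesses 100      # count only
logged_accesses 100 1    # same as count only
logged_accesses 100 5    # count + 5 - 1
```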


       -o operator [operand]

           For qualifying PIO read and write accesses, the value read from or
           written to the hardware is corrupted according to the value of
           operator:

           EQ
                  operand replaces the real value (for a read, operand is
                  returned to the driver).


           OR
                  operand is bitwise ORed with the real value.


           AND
                  operand is bitwise ANDed with the real value.


           XOR
                  operand is bitwise XORed with the real value.

           For PIO write accesses, the following operator is also allowed:

           NO
                 Ignore the driver's attempt to write to the hardware.

           A driver performs PIO through the ddi_getX(), ddi_putX(),
           ddi_rep_getX() and ddi_rep_putX() routines (where X is 8, 16, 32
           or 64). An access made with ddi_getX() or ddi_putX() is treated as
           a single access, whereas an access made with the ddi_rep_*(9F)
           routines is broken down into the number of individual accesses
           given by the repcount parameter of the call. If the access is
           performed through a DMA handle, operator and operand are applied
           to every access that makes up the DMA request. If interference
           with interrupts has been requested, operator may take any of the
           following values:

           DELAY
                    After count accesses (see the -c option), delay delivery
                    of the next failcount interrupts by operand microseconds.


           LOSE
                    After count interrupts, fail to deliver the next failcount
                    real interrupts to the driver.


           EXTRA
                    After count interrupts, start delivering operand extra
                    interrupts for the next failcount real interrupts.

           The default operator and operand corrupt the data access by
           flipping every bit (XOR with -1).
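           The corruption operators reduce to ordinary bitwise arithmetic,
           and the interrupt operators change how many interrupts the driver
           sees. The functions pio_op and intr_op below are a hypothetical
           shell model of the descriptions above, not part of th_define; the
           real corruption is performed by the bofi driver, and DELAY is
           modelled only as a single late delivery.

```shell
# Hypothetical model of the PIO operators: print the value that the
# driver (for a read) or the hardware (for a write) would see.
# Usage: pio_op operator operand value
pio_op() {
    op=$1 operand=$(($2)) value=$(($3))
    case $op in
    EQ)  result=$operand ;;
    OR)  result=$((value | operand)) ;;
    AND) result=$((value & operand)) ;;
    XOR) result=$((value ^ operand)) ;;
    NO)  echo "write not performed"; return 0 ;;
    *)   echo "unknown operator: $op" >&2; return 1 ;;
    esac
    printf '0x%x\n' "$result"
}

# Hypothetical model of the interrupt operators: print how many
# interrupts the driver receives for one affected real interrupt.
# Usage: intr_op operator operand
intr_op() {
    case $1 in
    LOSE)  echo 0 ;;              # the real interrupt is not delivered
    EXTRA) echo $((1 + $2)) ;;    # the real interrupt plus operand extras
    DELAY) echo 1 ;;              # delivered, but operand microseconds late
    *)     echo "unknown operator: $1" >&2; return 1 ;;
    esac
}

pio_op OR 0x4 0x10         # prints 0x14
pio_op AND 0xefff 0x1234   # prints 0x234 (the 0x1000 bit is cleared)
pio_op XOR -1 0x5a         # default operator: every bit flipped
intr_op EXTRA 10           # prints 11
```

           Here 0xefff stands in for the 64-bit mask 0xffffffffffffefff used
           in the EXAMPLES section; for values below 0x10000 the two masks
           give the same result.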


       -f acc_chk

           If acc_chk is set to 1 or pio, the driver's calls to
           ddi_check_acc_handle(9F) return DDI_FAILURE once the access count
           reaches 1. If acc_chk is set to 2 or dma, the driver's calls to
           ddi_check_dma_handle(9F) return DDI_FAILURE once the access count
           reaches 1.


       -w max_wait_period [report_interval]

           Constrain the period for which an error definition remains
           active. This option applies only to non-logging errdefs. If an
           error definition remains active for max_wait_period seconds, the
           test is aborted. If report_interval is set to a nonzero value, the
           current status of the error definition is reported to standard
           output every report_interval seconds. The default value is zero.
           The status of the errdef is reported in a parsable format of eight
           fields separated by colon (:) characters. The last field is a
           string enclosed in double quotes; the remaining seven fields are
           integers:

           ft:mt:ac:fc:chk:ec:s:"message"

           The fields are defined as follows:

           ft
                        The UTC time when the fault was injected.


           mt
                        The UTC time when the driver reported the fault.


           ac
                        The number of remaining non-faulting accesses.


           fc
                        The number of remaining faulting accesses.


           chk
                        The value of the acc_chk field of the errdef.


           ec
                        The number of fault reports issued by the driver
                        against this errdef (mt holds the time of the initial
                        report).


           s
                        The severity level reported by the driver.


           "message"
                        The textual reason the driver gave for reporting the
                        fault.
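           Because the message is the last field and the only quoted one, a
           status line can be split by peeling off the first seven
           colon-separated fields and treating whatever remains as the
           message, even if the message itself contains a colon. A minimal
           sketch using POSIX parameter expansion; parse_status is a
           hypothetical name and the sample line is invented.

```shell
# Hypothetical parser for an errdef status line of the form
#   ft:mt:ac:fc:chk:ec:s:"message"
# Prints one "name=value" pair per field.
parse_status() {
    rest=$1
    for name in ft mt ac fc chk ec s; do
        printf '%s=%s\n' "$name" "${rest%%:*}"   # text up to the first colon
        rest=${rest#*:}                          # drop that field and its colon
    done
    rest=${rest#\"}                              # strip the leading quote
    printf 'message=%s\n' "${rest%\"}"           # and the trailing quote
}

# Invented sample line; the message deliberately contains a colon.
parse_status '978177106:978177107:0:0:1:1:3:"timeout: no response"'
```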


       -h

           Display the command usage string.


       -s collect_time

           If acc_types is given with the -a option and includes log, the
           errdef logs accesses for collect_time seconds (the default is to
           log until the log becomes full). If the errdef specification
           matches multiple driver handles, multiple logging errdefs are
           registered with the bofi driver, and logging terminates when all
           logs become full, when collect_time expires, or when the
           associated errdefs are cleared. The current state of the log can be
           checked with the th_manage(1M) command, using the broadcast
           parameter. A log can be terminated by running th_manage(1M) with
           the clear_errdefs option or by sending a SIGALRM signal to the
           th_define process. See alarm(2) for the semantics of SIGALRM.
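           Ending a log early with SIGALRM only needs the process ID of the
           th_define process. The helper stop_log_after below is a
           hypothetical sketch that assumes th_define was started in the
           background from the same shell, so that its process ID is
           available as $!.

```shell
# Hypothetical helper: wait the given number of seconds, then send
# SIGALRM to the given process, ending its logging session.
# Usage: stop_log_after seconds pid
stop_log_after() {
    seconds=$1 pid=$2
    sleep "$seconds"
    kill -ALRM "$pid"
}

# Example use (not run here):
#   th_define -n foo -i 1 -a log &
#   stop_log_after 60 $!
```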


       -p policy

           Applicable when acc_types includes log. This parameter modifies
           the policy used for converting logged accesses into errdefs. All
           policies are inclusive:

               o      Use rare to bias error definitions toward rare accesses
                      (default).

               o      Use operator to produce a separate error definition for
                      each operator type (default).

               o      Use common to bias error definitions toward common
                      accesses.

               o      Use median to bias error definitions toward median
                      accesses.

               o      Use maximal to produce multiple error definitions for
                      duplicate accesses.

               o      Use unbiased to produce a single error definition for
                      duplicate accesses.

               o      Use onebyte, twobyte, fourbyte, or eightbyte to select
                      errdefs corresponding to 1-, 2-, 4- or 8-byte accesses
                      (if chosen, the -xr option is enforced to ensure that
                      ddi_rep_*() calls are decomposed into multiple single
                      accesses).

               o      Use multibyte to create error definitions for multibyte
                      accesses performed using ddi_rep_get*() and
                      ddi_rep_put*().

           Policies can be combined by specifying several of them (for
           example, -p onebyte median). See the NOTES section for further
           information.


       -x flags

           Applicable when acc_types includes log. The flags parameter
           modifies the way in which the bofi driver logs accesses. It is
           specified as a string containing any combination of the following
           letters:

           w
                Continuous logging (that is, the log wraps when full).


           t
                Timestamp each log entry (access times are in seconds).


           r
                Log repeated I/O as individual accesses. For example, a
                ddi_rep_get16(9F) call with a repcount of N is logged N times,
                each as a 2-byte transaction. Without this flag, the default
                is to log the call once only, with a transaction size of twice
                the repcount.
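           The effect of the r flag on a single ddi_rep_*() call can be
           tabulated with a small helper. log_shape is a hypothetical name;
           it only restates the rule above as a number of log entries and
           the size of each entry in bytes.

```shell
# Hypothetical restatement of the r flag rule for one ddi_rep_*() call
# that transfers repcount items of size bytes each.
# Usage: log_shape size repcount flags
log_shape() {
    size=$1 repcount=$2 flags=$3
    case "$flags" in
    *r*) echo "entries=$repcount size=$size" ;;          # one entry per item
    *)   echo "entries=1 size=$((size * repcount))" ;;   # one combined entry
    esac
}

log_shape 2 10 r     # ddi_rep_get16(9F) with repcount 10, logged with -x r
log_shape 2 10 wt    # the same call, logged without r
```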


       -C comment_string

           Applicable when acc_types includes log. Provides a comment string
           to be placed in any generated test scripts. The string must be
           enclosed in double quotes.


       -e fixup_script [args]

           Applicable when acc_types includes log. A logging errdef generates
           a test script for each driver access handle; use this option to
           embed a command in each resulting script to be run before the
           errors are injected. The generated test scripts take an instance
           offline and bring it back online before injecting errors, so that
           the instance starts in a known fault-free state. The executable
           fixup_script is called twice with the optional args: once just
           before the instance is taken offline, and again after the instance
           has been brought back online. The following variables are passed
           in the environment of the called executable:

           DRIVER_PATH
                                 Identifies the device path of the instance.


           DRIVER_INSTANCE
                                 Identifies the instance number of the device.


           DRIVER_UNCONFIGURE
                                 Has the value 1 when the instance is about to
                                 be taken offline.


           DRIVER_CONFIGURE
                                 Has the value 1 when the instance has just
                                 been brought online.

           Typically, the executable ensures that the device under test is in
           a suitable state to be taken offline (unconfigured), or in a
           suitable state for error injection (for example, configured,
           error-free and servicing a workload). A minimal script for a
           network driver could be:

             #!/bin/ksh
             # ifworkload is a placeholder for whatever tool generates
             # network traffic on the interface.

             driver=xyznetdriver
             ifnum=$driver$DRIVER_INSTANCE

             if [[ $DRIVER_CONFIGURE = 1 ]]; then
                  # Just brought online: plumb the interface, start a workload.
                  ifconfig $ifnum plumb
                  ifconfig $ifnum ...
                  ifworkload start $ifnum
             elif [[ $DRIVER_UNCONFIGURE = 1 ]]; then
                  # About to go offline: stop the workload, then unplumb.
                  ifworkload stop $ifnum
                  ifconfig $ifnum down
                  ifconfig $ifnum unplumb
             fi
             exit $?


           The -e option must be the last option on the command line.
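           A fixup script can be exercised by hand before it is passed to
           th_define by calling it the way the text above describes: once
           with DRIVER_UNCONFIGURE=1, then once with DRIVER_CONFIGURE=1. The
           helper try_fixup is a hypothetical sketch of that sequence;
           unsetting the variable that does not apply to each call is an
           assumption, not a description of the generated scripts, and no
           device is actually taken offline.

```shell
# Hypothetical harness: run a fixup script the way the -e description
# says it is run: once before the instance goes offline, once after it
# is back online. Usage: try_fixup script device_path instance [args...]
try_fixup() {
    script=$1 path=$2 inst=$3
    shift 3
    (
        # Call 1: the instance is about to be taken offline.
        unset DRIVER_CONFIGURE
        DRIVER_PATH=$path DRIVER_INSTANCE=$inst DRIVER_UNCONFIGURE=1
        export DRIVER_PATH DRIVER_INSTANCE DRIVER_UNCONFIGURE
        "$script" "$@"
    ) || return
    # A generated test script would unconfigure and reconfigure the
    # instance at this point; this harness does not.
    (
        # Call 2: the instance has just been brought back online.
        unset DRIVER_UNCONFIGURE
        DRIVER_PATH=$path DRIVER_INSTANCE=$inst DRIVER_CONFIGURE=1
        export DRIVER_PATH DRIVER_INSTANCE DRIVER_CONFIGURE
        "$script" "$@"
    )
}
```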



       If the -a log option is selected but the -e option is not given, a
       default script is used. This script repeatedly attempts to detach and
       then re-attach the device instance under test.

EXAMPLES
   Examples of Error Definitions
       th_define -n foo -i 1 -a log


       Logs all accesses to all handles used by instance 1 of the foo driver
       while running the default workload (attaching and detaching the
       instance). Then generates a set of test scripts to inject appropriate
       errdefs while running that default workload.


       th_define -n foo -i 1 -a log pio


       Logs PIO accesses to each PIO handle used by instance 1 of the foo
       driver while running the default workload (attaching and detaching the
       instance). Then generates a set of test scripts to inject appropriate
       errdefs while running that default workload.


       th_define -n foo -i 1 -a log -p onebyte median -e fixup arg -now


       Logs all accesses to all handles used by instance 1 of the foo driver
       while running the workload defined in the fixup script fixup, called
       with the arguments arg and -now. Then generates a set of test scripts
       to inject appropriate errdefs while running that workload. The
       resulting error definitions are requested to focus on single-byte
       accesses to locations that are accessed a median number of times
       relative to the frequency of access to other I/O addresses.


       th_define -n se -l 0x20 1 -a pio_r -o OR 0x4 -c 10 1000


       Simulates a stuck serial chip command: after ten unmodified reads,
       0x4 is ORed into each of the next 1000 consecutive read accesses made
       by any instance of the se driver to its command status register (at
       offset 0x20), so that the register keeps returning a busy status.


       th_define -n foo -i 3 -r 1 -a pio_r -c 0 1 -f 1 -o OR 0x100


       Causes 0x100 to be ORed into the next PIO read access from any
       register in register set 1 of instance 3 of the foo driver. Subsequent
       calls in the driver to ddi_check_acc_handle() return DDI_FAILURE.


       th_define -n foo -i 3 -r 1 -a pio_r -c 0 1 -o OR 0x0


       Causes 0x0 to be ORed into the next PIO read access from any register
       in register set 1 of instance 3 of the foo driver. This is, of course,
       a no-op.


       th_define -n foo -i 3 -r 1 -l 0x8100 1 -a pio_r -c 0 10 -o EQ 0x70003


       Causes the next ten PIO reads from the register at offset 0x8100 in
       register set 1 of instance 3 of the foo driver to return 0x70003.


       th_define -n foo -i 3 -r 1 -l 0x8100 1 -a pio_w -c 100 3 -o AND \
           0xffffffffffffefff


       The next 100 PIO writes to the register at offset 0x8100 in register
       set 1 of instance 3 of the foo driver take place as normal. On each of
       the three subsequent writes, however, the 0x1000 bit is cleared.


       th_define -n foo -i 3 -r 1 -l 0x8100 0x10 -a pio_r -c 0 1 -f 1 -o XOR 7


       Toggles the bottom three bits of the value returned by the next PIO
       read access to registers with offsets in the range 0x8100 to 0x8110
       in register set 1 of instance 3 of the foo driver. Subsequent calls in
       the driver to ddi_check_acc_handle() return DDI_FAILURE.


       th_define -n foo -i 3 -a pio_w -c 0 1 -o NO 0


       Prevents the next PIO write access to any register in any register
       set of instance 3 of the foo driver from going out on the bus.


       th_define -n foo -i 3 -l 0 8192 -a dma_r -c 0 1 -o OR 7


       Causes 0x7 to be ORed into each long long in the first 8192 bytes of
       the next DMA read, using any DMA handle for instance 3 of the foo
       driver.


       th_define -n foo -i 3 -r 2 -l 0 8 -a dma_r -c 0 1 -o OR \
           0x7070707070707070


       Causes 0x70 to be ORed into each byte of the first long long of the
       next DMA read, using the DMA handle with sequential allocation number 2
       for instance 3 of the foo driver.


       th_define -n foo -i 3 -l 256 256 -a dma_w -c 0 1 -f 2 -o OR 7


       Causes 0x7 to be ORed into each long long in the range from offset 256
       to offset 512 of the next DMA write, using any DMA handle for instance
       3 of the foo driver. Subsequent calls in the driver to
       ddi_check_dma_handle() return DDI_FAILURE.


       th_define -n foo -i 3 -r 0 -l 0 8 -a dma_w -c 100 3 -o AND \
           0xffffffffffffefff


       The next 100 DMA writes using the DMA handle with sequential allocation
       number 0 for instance 3 of the foo driver take place as normal. On each
       of the three subsequent writes, however, the 0x1000 bit is cleared in
       the first long long of the transfer.


       th_define -n foo -i 3 -a intr -c 0 6 -o LOSE 0


       Causes the next six interrupts for instance 3 of the foo driver to be
       lost.


       th_define -n foo -i 3 -a intr -c 30 1 -o EXTRA 10


       When the thirty-first subsequent interrupt for instance 3 of the foo
       driver occurs, a further ten interrupts are also generated.


       th_define -n foo -i 3 -a intr -c 0 1 -o DELAY 1024


       Causes the next interrupt for instance 3 of the foo driver to be
       delayed by 1024 microseconds.

NOTES
       The policy option in the th_define -p syntax determines how a set of
       logged accesses is converted into a set of error definitions. Each
       logged access is matched against the chosen policies to determine
       whether an error definition should be created from that access.


       Any number of policy options can be combined to modify the generated
       error definitions.

   Bytewise Policies
       These policies select particular I/O transfer sizes. If none of the
       bytewise policies is selected, all transfer sizes are treated equally.
       Otherwise, only the specified transfer sizes are selected, and any
       bytewise policy that has not been chosen is excluded.

       onebyte
                    Create errdefs for one-byte accesses (ddi_get8()).


       twobyte
                    Create errdefs for two-byte accesses (ddi_get16()).


       fourbyte
                    Create errdefs for four-byte accesses (ddi_get32()).


       eightbyte
                    Create errdefs for eight-byte accesses (ddi_get64()).


       multibyte
                    Create errdefs for repeated accesses (ddi_rep_get*()).
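       The selection rule for bytewise policies can be sketched as a filter
       on one logged access. byte_selected is a hypothetical helper: it takes
       the chosen policies (as they would follow -p) and the transfer size of
       the access, with "rep" standing for an access made through
       ddi_rep_get*() or ddi_rep_put*(). How th_define applies the rule
       internally is not described here.

```shell
# Hypothetical helper: succeed if an access of the given size survives
# the bytewise policies in $1 (space-separated names, as given to -p).
# size is 1, 2, 4 or 8, or "rep" for an access made via ddi_rep_*().
byte_selected() {
    policies=" $1 " size=$2
    case $size in
    1)   want=onebyte ;;
    2)   want=twobyte ;;
    4)   want=fourbyte ;;
    8)   want=eightbyte ;;
    rep) want=multibyte ;;
    *)   return 1 ;;
    esac
    case $policies in
    *" onebyte "*|*" twobyte "*|*" fourbyte "*|*" eightbyte "*|*" multibyte "*)
        # At least one bytewise policy chosen: keep only the chosen sizes.
        case $policies in
        *" $want "*) return 0 ;;
        *)           return 1 ;;
        esac ;;
    *)
        # No bytewise policy chosen: every transfer size is treated equally.
        return 0 ;;
    esac
}

byte_selected "onebyte median" 1 && echo "1-byte access kept"
byte_selected "onebyte median" 4 || echo "4-byte access dropped"
byte_selected "median" 4 && echo "4-byte access kept"
```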


   Frequency of Access Policies
       The frequency of access to a location is determined according to the
       access type, location and transfer size (for example, a two-byte read
       access to address A is considered distinct from a four-byte read
       access to address A). The algorithm counts the number of accesses of
       a given type and size to each location, and finds the locations that
       were accessed most and least often. Let maxa and mina be the number
       of times these locations were accessed, and let mean be the total
       number of accesses divided by the total number of locations accessed.
       A rare access is then a location that was accessed fewer than


       (mean - mina) / 3 + mina


       times. Similarly, a common access is a location that was accessed
       more than


       maxa - (maxa - mean) / 3


       times. A location whose access count lies between these two cutoffs
       is regarded as being accessed with median frequency.
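       The cutoffs can be evaluated without fractions. With T the total
       number of accesses and L the number of locations (so mean = T / L),
       multiplying both inequalities through by 3L gives the integer tests
       3nL < T + 2*mina*L for a rare location and 3nL > T + 2*maxa*L for a
       common one. The classifier below is a hypothetical sketch of that
       arithmetic; treating "common" as a strict "more than", mirroring the
       "less than" used for rare accesses, is an assumption.

```shell
# Hypothetical classifier: given the access count of each distinct
# location (same access type and size), print rare, median or common
# for each, using the cutoffs above in integer-only form.
classify_counts() {
    total=0 locs=0 mina=$1 maxa=$1
    for n in "$@"; do
        total=$((total + n)) locs=$((locs + 1))
        [ "$n" -lt "$mina" ] && mina=$n
        [ "$n" -gt "$maxa" ] && maxa=$n
    done
    for n in "$@"; do
        if [ $((3 * n * locs)) -lt $((total + 2 * mina * locs)) ]; then
            echo "$n rare"
        elif [ $((3 * n * locs)) -gt $((total + 2 * maxa * locs)) ]; then
            echo "$n common"
        else
            echo "$n median"
        fi
    done
}

classify_counts 1 2 3 10
```

       For the counts 1, 2, 3 and 10, mean is 4, so the rare cutoff is
       (4 - 1) / 3 + 1 = 2 and the common cutoff is 10 - (10 - 4) / 3 = 8;
       the function reports 1 as rare, 2 and 3 as median, and 10 as common.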

       rare
                 Create errdefs for locations that are rarely accessed.


       common
                 Create errdefs for locations that are commonly accessed.


       median
                 Create errdefs for locations that are accessed with median
                 frequency.


   Policies for Minimizing errdefs
       If a transaction is duplicated, either a single errdef or multiple
       errdefs are written to the test scripts, depending on the following
       two policies:

       maximal
                    Create multiple errdefs for locations that are repeatedly
                    accessed.


       unbiased
                    Create a single errdef for locations that are repeatedly
                    accessed.


       operators
                    For each location, a default operator and operand are
                    typically applied. For maximal test coverage, this default
                    can be modified using the operators policy, so that a
                    separate errdef is created for each of the possible
                    corruption operators.
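       To see how maximal and unbiased differ in the number of errdefs they
       yield, suppose each logged transaction is reduced to one line (for
       example, access type, offset and size), so that duplicated
       transactions appear as repeated lines. The function count_errdefs is
       a hypothetical illustration that only counts candidate errdefs under
       that simplification; it does not reproduce th_define's output.

```shell
# Hypothetical illustration: read one logged transaction per line on
# standard input and print how many errdefs each policy would yield.
#   maximal  - one errdef per logged transaction, duplicates included
#   unbiased - one errdef per distinct transaction
count_errdefs() {
    case $1 in
    maximal)  wc -l | tr -d ' ' ;;
    unbiased) sort -u | wc -l | tr -d ' ' ;;
    *)        echo "unknown policy: $1" >&2; return 1 ;;
    esac
}

log='pio_r 0x20 1
pio_r 0x20 1
pio_w 0x24 1'
printf '%s\n' "$log" | count_errdefs maximal     # prints 3
printf '%s\n' "$log" | count_errdefs unbiased    # prints 2
```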


SEE ALSO
       kill(1), th_manage(1M), alarm(2), ddi_check_acc_handle(9F),
       ddi_check_dma_handle(9F)



                                 April 9, 2016                   TH_DEFINE(1M)