1 MHD(7I)                         Ioctl Requests                         MHD(7I)
   2 
   3 NAME
   4      mhd - multihost disk control operations
   5 
   6 SYNOPSIS
   7      #include <sys/mhd.h>
   8 
   9 DESCRIPTION
  10      The mhd ioctl(2) requests control the access rights of a multihost
  11      disk, using disk reservations on the disk device.
  12 
  13      The stability level of this interface (see attributes(5)) is evolving.
  14      As a result, the interface is subject to change and you should limit your
  15      use of it.
  16 
  17      The mhd ioctls fall into two major categories: (1) ioctls for non-shared
  18      multihost disks and (2) ioctls for shared multihost disks.
  19 
  20      One ioctl, MHIOCENFAILFAST, is applicable to both non-shared and shared
  21      multihost disks.  It is described after the first two categories.
  22 
  23      All the ioctls require root privilege.
  24 
  25      For all of the ioctls, the caller should obtain the file descriptor for
  26      the device by calling open(2) with the O_NDELAY flag; without the
  27      O_NDELAY flag, the open may fail due to another host already having a
  28      conflicting reservation on the device.  Some of the ioctls below permit
  29      the caller to forcibly clear a conflicting reservation held by another
  30      host; however, to call such an ioctl, the caller must first obtain
  31      an open file descriptor.
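
          The open step described above can be sketched as follows.  This is
          a minimal illustration, not part of the interface: the helper name
          open_multihost_disk is hypothetical, the device path is supplied by
          the caller, and the choice of O_RDWR is an assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Some strict compilation modes hide O_NDELAY; fall back to O_NONBLOCK. */
#ifndef O_NDELAY
#define	O_NDELAY	O_NONBLOCK
#endif

/*
 * Hypothetical helper: open a multihost disk device for ioctl use.
 * O_NDELAY lets the open proceed even when another host already holds
 * a conflicting reservation.  Returns a file descriptor, or -1 with
 * errno set by open(2).
 */
int
open_multihost_disk(const char *path)
{
	assert(path != NULL);
	return (open(path, O_RDWR | O_NDELAY));
}
```

          The caller would pass the returned descriptor to the ioctls below
          and close(2) it when finished.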
  32 
  33    Non-shared multihost disks
  34      The non-shared multihost disk ioctls are MHIOCTKOWN, MHIOCRELEASE,
  35      MHIOCSTATUS, and MHIOCQRESERVE.  These ioctl requests control the access
  36      rights of non-shared multihost disks.  A non-shared multihost disk is one
  37      that supports serialized, mutually exclusive I/O mastery by the connected
  38      hosts.  This is in contrast to the shared-disk model, in which concurrent
  39      access is allowed from more than one host (see below).
  40 
  41      A non-shared multihost disk can be in one of two states:
  42 
  43      o       Exclusive access state, where only one connected host has I/O
  44              access
  45 
  46      o       Non-exclusive access state, where all connected hosts have I/O
  47              access.  An external hardware reset can cause the disk to enter
  48              the non-exclusive access state.
  49 
  50      Each multihost disk driver views the machine on which it's running as the
  51      "local host"; each views all other machines as "remote hosts".  For each
  52      I/O or ioctl request, the requesting host is the local host.
  53 
  54      Note that the non-shared ioctls are designed to work with SCSI-2 disks.
  55      The SCSI-2 RESERVE/RELEASE command set is the underlying hardware
  56      facility in the device that supports the non-shared ioctls.
  57 
  58      The function prototypes for the non-shared ioctls are:
  59 
  60        ioctl(fd, MHIOCTKOWN);
  61        ioctl(fd, MHIOCRELEASE);
  62        ioctl(fd, MHIOCSTATUS);
  63        ioctl(fd, MHIOCQRESERVE);
  64 
  65      MHIOCTKOWN     Forcefully acquires exclusive access rights to the
  66                     multihost disk for the local host.  Revokes all access
  67                     rights to the multihost disk from remote hosts.  Causes
  68                     the disk to enter the exclusive access state.
  69 
  70                     Implementation Note: Reservations (exclusive access
  71                     rights) broken via random resets should be reinstated by
  72                     the driver upon their detection, for example, in the
  73                     automatic probe function described below.
  74 
  75      MHIOCRELEASE   Relinquishes exclusive access rights to the multihost disk
  76                     for the local host.  On success, causes the disk to enter
  77                     the non-exclusive access state.
  78 
  79      MHIOCSTATUS    Probes a multihost disk to determine whether the local
  80                     host has access rights to the disk.  Returns 0 if the
  81                     local host has access to the disk, 1 if it doesn't, and -1
  82                     with errno set to EIO if the probe failed for some other
  83                     reason.
  84 
  85      MHIOCQRESERVE  Issues, simply and only, a SCSI-2 Reserve command.  If the
  86                     attempt to reserve fails due to the SCSI error Reservation
  87                     Conflict (which implies that some other host has the
  88                     device reserved), then the ioctl will return -1 with errno
  89                     set to EACCES.  The MHIOCQRESERVE ioctl does NOT issue a
  90                     bus device reset or bus reset prior to attempting the
  91                     SCSI-2 Reserve command.  It also does not take care of
  92                     reinstating reservations that disappear due to bus resets
  93                     or bus device resets; if that behavior is desired, then the
  94                     caller can call MHIOCTKOWN after the MHIOCQRESERVE has
  95                     returned success.  If the device does not support the
  96                     SCSI-2 Reserve command, then the ioctl returns -1 with
  97                     errno set to ENOTSUP.  The MHIOCQRESERVE ioctl is intended
  98                     to be used by high-availability or clustering software for
  99                     a "quorum" disk, hence, the "Q" in the name of the ioctl.
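
          The return conventions of MHIOCSTATUS and MHIOCQRESERVE described
          above could be interpreted along these lines.  The enum and function
          names are hypothetical; the functions take the ioctl return value
          (and errno) as plain integers, so the sketch does not depend on
          <sys/mhd.h>.

```c
#include <errno.h>

/* Hypothetical classification of an MHIOCSTATUS result. */
enum example_access {
	EXAMPLE_ACCESS_HELD,	/* ioctl returned 0: local host has access */
	EXAMPLE_ACCESS_LOST,	/* ioctl returned 1: local host lacks access */
	EXAMPLE_ACCESS_UNKNOWN	/* ioctl returned -1: probe failed */
};

enum example_access
example_classify_status(int rc)
{
	switch (rc) {
	case 0:
		return (EXAMPLE_ACCESS_HELD);
	case 1:
		return (EXAMPLE_ACCESS_LOST);
	default:
		return (EXAMPLE_ACCESS_UNKNOWN);
	}
}

/* Hypothetical classification of an MHIOCQRESERVE result. */
enum example_qreserve {
	EXAMPLE_QR_RESERVED,	/* reserve succeeded */
	EXAMPLE_QR_CONFLICT,	/* EACCES: another host holds the reservation */
	EXAMPLE_QR_UNSUPPORTED,	/* ENOTSUP: no SCSI-2 Reserve support */
	EXAMPLE_QR_ERROR	/* any other failure */
};

enum example_qreserve
example_classify_qreserve(int rc, int err)
{
	if (rc != -1)
		return (EXAMPLE_QR_RESERVED);
	if (err == EACCES)
		return (EXAMPLE_QR_CONFLICT);
	if (err == ENOTSUP)
		return (EXAMPLE_QR_UNSUPPORTED);
	return (EXAMPLE_QR_ERROR);
}
```

          A caller would typically invoke the ioctl, save errno immediately,
          and then pass both values to the matching classifier.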
 100 
 101    Shared Multihost Disks
 102      Shared multihost disk ioctls control access to shared multihost disks.
 103      The ioctls are merely a veneer on the SCSI-3 Persistent Reservation
 104      facility.  Therefore, the underlying semantic model is not described in
 105      detail here; see the SCSI-3 standard instead.  The SCSI-3 Persistent
 106      Reservations support the concept of a group of hosts all sharing access
 107      to a disk.
 108 
 109      The function prototypes and descriptions for the shared multihost ioctls
 110      are as follows:
 111 
 112      ioctl(fd, MHIOCGRP_INKEYS, (mhioc_inkeys_t *)k)
 113 
 114         Issues the SCSI-3 command Persistent Reserve In Read Keys to the
 115         device.  On input, the field k->li should be initialized by the caller
 116         with k->li.listsize reflecting how large an array the caller has
 117         allocated for the k->li.list field and with k->li.listlen == 0.  On
 118         return, the field k->li.listlen is updated to indicate the number of
 119         reservation keys the device currently has: if this value is larger
 120         than k->li.listsize, the caller should have passed a bigger
 121         k->li.list array with a bigger k->li.listsize.  The number of array
 122         elements actually written by the callee into k->li.list is the
 123         minimum of k->li.listlen and k->li.listsize.  The field k->generation
 124         is updated with the generation information returned by the SCSI-3
 125         Read Keys query.  If the device does not support SCSI-3 Persistent
 126         Reservations, then this ioctl returns -1 with errno set to
 127         ENOTSUP.
 128 
 129      ioctl(fd, MHIOCGRP_INRESV, (mhioc_inresvs_t *)r)
 130 
 131         Issues the SCSI-3 command Persistent Reserve In Read Reservations to
 132         the device.  Remarks similar to MHIOCGRP_INKEYS apply to the array
 133         manipulation.  If the device does not support SCSI-3 Persistent
 134         Reservations, then this ioctl returns -1 with errno set to ENOTSUP.
 135 
 136      ioctl(fd, MHIOCGRP_REGISTER, (mhioc_register_t *)r)
 137 
 138         Issues the SCSI-3 command Persistent Reserve Out Register.  The fields
 139         of structure r are all inputs; none of the fields are modified by the
 140         ioctl.  The field r->aptpl should be set to true to specify that
 141         registrations and reservations should persist across device power
 142         failures, or to false to specify that registrations and reservations
 143         should be cleared upon device power failure; true is the recommended
 144         setting.  The field r->oldkey is the key that the caller believes the
 145         device may already have for this host initiator; if the caller
 146         believes that this host initiator is not already registered with
 147         this device, it should pass the special key of all zeros.  To achieve
 148         the effect of unregistering with the device, the caller should pass
 149         its current key for the r->oldkey field and an r->newkey field
 150         containing the special key of all zeros.  If the device returns the
 151         SCSI error code Reservation Conflict, this ioctl returns -1 with errno
 152         set to EACCES.
 153 
 154      ioctl(fd, MHIOCGRP_RESERVE, (mhioc_resv_desc_t *)r)
 155 
 156         Issues the SCSI-3 command Persistent Reserve Out Reserve.  The fields
 157         of structure r are all inputs; none of the fields are modified by the
 158         ioctl.  If the device returns the SCSI error code Reservation
 159         Conflict, this ioctl returns -1 with errno set to EACCES.
 160 
 161      ioctl(fd, MHIOCGRP_PREEMPTANDABORT, (mhioc_preemptandabort_t *)r)
 162 
 163         Issues the SCSI-3 command Persistent Reserve Out Preempt-And-Abort.
 164         The fields of structure r are all inputs; none of the fields are
 165         modified by the ioctl.  The key of the victim host is specified by the
 166         field r->victim_key.  The field r->resvdesc supplies the preempter's
 167         key and the reservation that it is requesting as part of the SCSI-3
 168         Preempt-And-Abort command.  If the device returns the SCSI error code
 169         Reservation Conflict, this ioctl returns -1 with errno set to EACCES.
 170 
 171      ioctl(fd, MHIOCGRP_PREEMPT, (mhioc_preemptandabort_t *)r)
 172 
 173         Similar to MHIOCGRP_PREEMPTANDABORT, but instead issues the SCSI-3
 174         command Persistent Reserve Out Preempt.  (Note: This command is not
 175         implemented.)
 176 
 177      ioctl(fd, MHIOCGRP_CLEAR, (mhioc_resv_key_t *)r)
 178         Issues the SCSI-3 command Persistent Reserve Out Clear.  The input
 179         parameter r is the reservation key of the caller, which should have
 180         been already registered with the device, by an earlier call to
 181         MHIOCGRP_REGISTER.
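
          The key conventions described for MHIOCGRP_REGISTER (an all-zeros
          old key for a first registration, an all-zeros new key to
          unregister) can be illustrated with the sketch below.  The
          structures are hypothetical stand-ins that only mirror the field
          names oldkey, newkey, and aptpl used above; they are not the
          definitions from <sys/mhd.h>.  The 8-byte key length follows the
          SCSI-3 reservation key size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	EXAMPLE_KEY_SIZE	8	/* SCSI-3 reservation keys are 8 bytes */

/* Hypothetical stand-ins; the real types live in <sys/mhd.h>. */
struct example_key {
	unsigned char key[EXAMPLE_KEY_SIZE];
};

struct example_register {
	struct example_key oldkey;
	struct example_key newkey;
	bool aptpl;
};

/* True if the key is the special key of all zeros. */
bool
example_key_is_zero(const struct example_key *k)
{
	static const unsigned char zero[EXAMPLE_KEY_SIZE];

	assert(k != NULL);
	return (memcmp(k->key, zero, sizeof (zero)) == 0);
}

/* First-time registration: oldkey is all zeros, newkey is our key. */
struct example_register
example_make_register(const struct example_key *mine)
{
	struct example_register r;

	assert(mine != NULL);
	memset(&r, 0, sizeof (r));
	r.newkey = *mine;
	r.aptpl = true;		/* the recommended setting */
	return (r);
}

/* Unregistration: oldkey is our current key, newkey is all zeros. */
struct example_register
example_make_unregister(const struct example_key *mine)
{
	struct example_register r;

	assert(mine != NULL);
	memset(&r, 0, sizeof (r));
	r.oldkey = *mine;
	r.aptpl = true;
	return (r);
}
```

          In real use the same values would be stored into an
          mhioc_register_t before issuing MHIOCGRP_REGISTER.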
 182 
 183      For each device, the non-shared ioctls should not be mixed with the
 184      Persistent Reserve Out shared ioctls, and vice versa; otherwise, the
 185      underlying device is likely to return errors, because SCSI does not
 186      permit SCSI-2 reservations to be mixed with SCSI-3 reservations on a
 187      single device.  It is, however, legitimate to call the Persistent Reserve
 188      In ioctls, because these are query only.  Issuing the MHIOCGRP_INKEYS
 189      ioctl is the recommended way for a caller to determine if the device
 190      supports SCSI-3 Persistent Reservations (the ioctl will return -1 with
 191      errno set to ENOTSUP if the device does not).
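
          The array bookkeeping described for MHIOCGRP_INKEYS, and its use as
          a probe for SCSI-3 Persistent Reservation support, can be sketched
          as below.  The function names are hypothetical and operate on plain
          integers (the listsize and listlen values, the ioctl return value,
          and errno) rather than on the real mhioc_inkeys_t structure.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Number of entries actually written into the caller's array:
 * the minimum of listlen and listsize.
 */
uint32_t
example_keys_written(uint32_t listsize, uint32_t listlen)
{
	return (listlen < listsize ? listlen : listsize);
}

/*
 * Nonzero if the device holds more keys than the array could take,
 * meaning the caller should retry with a larger array.
 */
int
example_keys_truncated(uint32_t listsize, uint32_t listlen)
{
	return (listlen > listsize);
}

/*
 * Interpret an MHIOCGRP_INKEYS result as a capability probe:
 *  1  the device answered, so SCSI-3 Persistent Reservations are supported
 *  0  the ioctl failed with ENOTSUP, so they are not supported
 * -1  the ioctl failed for another reason; support is undetermined
 */
int
example_supports_scsi3_pr(int rc, int err)
{
	if (rc != -1)
		return (1);
	return (err == ENOTSUP ? 0 : -1);
}
```

          When example_keys_truncated() reports truncation, the value left in
          listlen is the array size to allocate for the retry.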
 192 
 193    MHIOCENFAILFAST Ioctl
 194      The MHIOCENFAILFAST ioctl is applicable for both non-shared and shared
 195      disks, and may be used with either the non-shared or shared ioctls.
 196 
 197      ioctl(fd, MHIOCENFAILFAST, (unsigned int *)millisecs)
 198 
 199         Enables or disables the failfast option in the multihost disk driver
 200         and enables or disables automatic probing of a multihost disk,
 201         described below.  The argument is an unsigned integer specifying the
 202         number of milliseconds to wait between executions of the automatic
 203         probe function.  An argument of zero disables the failfast option and
 204         disables automatic probing.  If the MHIOCENFAILFAST ioctl is never
 205         called, the effect is defined to be that both the failfast option and
 206         automatic probing are disabled.
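
          A call to MHIOCENFAILFAST might be wrapped as shown below.  This is
          a sketch under stated assumptions: the wrapper names are
          hypothetical, and the __sun platform guard (with an ENOTSUP stub
          elsewhere) is only there so that the example compiles on systems
          that lack <sys/mhd.h>.

```c
#include <errno.h>

#if defined(__sun)
#include <sys/types.h>
#include <sys/mhd.h>
#include <stropts.h>
#include <unistd.h>
#endif

/*
 * Hypothetical wrapper: enable failfast and automatic probing with the
 * given probe interval in milliseconds.  An interval of zero disables
 * both.  Returns the ioctl result (-1 on failure, with errno set).
 */
int
example_set_failfast(int fd, unsigned int millisecs)
{
#if defined(__sun)
	/* The argument is a pointer to the interval, as described above. */
	return (ioctl(fd, MHIOCENFAILFAST, &millisecs));
#else
	/* Assumption: platforms without <sys/mhd.h> report ENOTSUP. */
	(void) fd;
	(void) millisecs;
	errno = ENOTSUP;
	return (-1);
#endif
}

/* Hypothetical convenience wrapper: disable failfast and probing. */
int
example_disable_failfast(int fd)
{
	return (example_set_failfast(fd, 0));
}
```

          For example, example_set_failfast(fd, 1000) would request a probe
          roughly once per second.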
 207 
 208    Automatic Probing
 209      The MHIOCENFAILFAST ioctl sets up a timeout in the driver to periodically
 210      schedule automatic probes of the disk.  The automatic probe function
 211      works in this manner: The driver is scheduled to probe the multihost disk
 212      every n milliseconds, rounded up to the next integral multiple of the
 213      system clock's resolution.  If
 214 
 215            1.   the local host no longer has access rights to the multihost
 216                 disk, and
 217 
 218            2.   access rights were expected to be held by the local host,
 219 
 220      the driver immediately panics the machine to comply with the failfast
 221      model.
 222 
 223      If the driver makes this discovery outside the timeout function,
 224      especially during a read or write operation, it is imperative that it
 225      panic the system then as well.
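
          The interval rounding and the panic condition described above can
          be expressed as two small functions.  The names are hypothetical,
          the clock resolution is passed in as a parameter rather than read
          from the system, and the sketch assumes that an interval already on
          a clock-tick boundary is left unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Round a probe interval up to a multiple of the clock resolution.
 * tick_ms must be nonzero; overflow of very large intervals is not
 * handled in this sketch.
 */
unsigned int
example_round_up_to_tick(unsigned int interval_ms, unsigned int tick_ms)
{
	unsigned int rem;

	assert(tick_ms != 0);
	rem = interval_ms % tick_ms;
	return (rem == 0 ? interval_ms : interval_ms + (tick_ms - rem));
}

/*
 * Failfast decision: panic only when access rights have been lost
 * and the local host was expected to hold them.
 */
bool
example_probe_should_panic(bool has_access, bool access_expected)
{
	return (!has_access && access_expected);
}
```

          With a 10-millisecond clock tick, a requested interval of 1005
          milliseconds would be scheduled every 1010 milliseconds.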
 226 
 227 RETURN VALUES
 228      Each request returns -1 on failure and sets errno to indicate the error.
 229 
 230      EPERM              Caller is not root.
 231 
 232      EACCES             Access rights were denied.
 233 
 234      EIO                The multihost disk or controller was unable to
 235                         successfully complete the requested operation.
 236 
 237      ENOTSUP            The multihost disk does not support the operation.
 238                         For example, it does not support the SCSI-2
 239                         Reserve/Release command set, or the SCSI-3 Persistent
 240                         Reservation command set.
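
          A caller might turn these error codes into diagnostics as sketched
          below.  The function name and message strings are hypothetical, and
          the unsupported-operation case uses ENOTSUP, the name given in the
          ioctl descriptions.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: describe an errno value from an mhd ioctl.
 * Codes not listed in RETURN VALUES fall back to strerror(3C).
 */
const char *
example_mhd_errstr(int err)
{
	switch (err) {
	case EPERM:
		return ("caller is not root");
	case EACCES:
		return ("access rights were denied");
	case EIO:
		return ("disk or controller could not complete the operation");
	case ENOTSUP:
		return ("disk does not support the operation");
	default:
		return (strerror(err));
	}
}
```

          The value passed in should be errno saved immediately after the
          failing ioctl(2) call.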
 241 
 242 STABILITY
 243      Uncommitted
 244 
 245 SEE ALSO
 246      ioctl(2), open(2), attributes(5)
 247 
 248 illumos                        February 17, 2020                       illumos