.\" Copyright (c) 2005 Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2017, Joyent, Inc.
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.Dd February 17, 2020
.Dt MHD 7I
.Os
.Sh NAME
.Nm mhd
.Nd multihost disk control operations
.Sh SYNOPSIS
.In sys/mhd.h
.Sh DESCRIPTION
The
.Nm
.Xr ioctl 2
requests control access rights of a multihost disk, using
disk reservations on the disk device.
.Pp
The stability level of this interface (see
.Xr attributes 5 )
is evolving.
As a result, the interface is subject to change and you should limit your use of
it.
.Pp
The mhd ioctls fall into two major categories: (1) ioctls for non-shared
multihost disks and (2) ioctls for shared multihost disks.
.Pp
One ioctl,
.Dv MHIOCENFAILFAST ,
is applicable to both non-shared and shared multihost disks.
It is described after the first two categories.
.Pp
All the ioctls require root privilege.
.Pp
For all of the ioctls, the caller should obtain the file descriptor for the
device by calling
.Xr open 2
with the
.Dv O_NDELAY
flag; without the
.Dv O_NDELAY
flag, the open may fail due to another host already having a
conflicting reservation on the device.
Some of the ioctls below permit the caller to forcibly clear a conflicting
reservation held by another host; however, in order to call the ioctl, the
caller must first obtain the open file descriptor.
.Ss "Non-shared multihost disks"
Non-shared multihost disk ioctls consist of
.Dv MHIOCTKOWN ,
.Dv MHIOCRELEASE ,
.Dv MHIOCSTATUS ,
and
.Dv MHIOCQRESERVE .
These ioctl requests control the access rights of non-shared multihost disks.
A non-shared multihost disk is one that supports serialized, mutually exclusive
I/O mastery by the connected hosts.
This is in contrast to the shared-disk model, in which
concurrent access is allowed from more than one host (see below).
.Pp
A non-shared multihost disk can be in one of two states:
.Bl -bullet -width indent
.It
Exclusive access state, where only one connected host has I/O access.
.It
Non-exclusive access state, where all connected hosts have I/O access.
An external hardware reset can cause the disk to enter the non-exclusive access
state.
.El
.Pp
Each multihost disk driver views the machine on which it's running as the
.Dq local host ;
each views all other machines as
.Dq remote hosts .
For each I/O or ioctl request, the requesting host is the local host.
.Pp
Note that the non-shared ioctls are designed to work with SCSI-2 disks.
The
SCSI-2 RESERVE/RELEASE command set is the underlying hardware facility in the
device that supports the non-shared ioctls.
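.Pp
Before any of these ioctls can be issued, the descriptor has to be opened with
.Dv O_NDELAY
as described earlier.
The following is a minimal sketch of that step, not a definitive
implementation.
The helper name
.Fn open_mhd_device
is hypothetical, and the choice of
.Dv O_RDWR
is an assumption; this page does not say which access mode is required.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Hypothetical helper: open a multihost disk device for use with the
 * mhd ioctls.  O_NDELAY is passed so that the open is not refused
 * because another host holds a conflicting reservation.  O_RDWR is an
 * assumption of this sketch.  Returns the file descriptor, or -1 with
 * errno set by open(2).
 */
int
open_mhd_device(const char *path)
{
	assert(path != NULL);
	return (open(path, O_RDWR | O_NDELAY));
}
```

The caller would pass the returned descriptor as the
.Fa fd
argument of the ioctls described below, and should check for a return of
.Sy -1
before doing so.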
.Pp
The function prototypes for the non-shared ioctls are:
.Bd -literal -offset 2n
.Fn ioctl fd MHIOCTKOWN ;
.Fn ioctl fd MHIOCRELEASE ;
.Fn ioctl fd MHIOCSTATUS ;
.Fn ioctl fd MHIOCQRESERVE ;
.Ed
.Bl -tag -width MHIOCQRESERVE
.It Dv MHIOCTKOWN
Forcefully acquires exclusive access rights to the multihost disk for the local
host.
Revokes all access rights to the multihost disk from remote hosts.
Causes the disk to enter the exclusive access state.
.Pp
Implementation Note: Reservations (exclusive access rights) broken via random
resets should be reinstated by the driver upon their detection, for example, in
the automatic probe function described below.
.It Dv MHIOCRELEASE
Relinquishes exclusive access rights to the multihost disk for the local host.
On success, causes the disk to enter the non-exclusive access state.
.It Dv MHIOCSTATUS
Probes a multihost disk to determine whether the local host has access rights
to the disk.
Returns
.Sy 0
if the local host has access to the disk,
.Sy 1
if it doesn't, and
.Sy -1
with
.Va errno
set to
.Er EIO
if the probe failed for some other reason.
.It Dv MHIOCQRESERVE
Issues, simply and only, a SCSI-2 Reserve command.
If the attempt to reserve
fails due to the SCSI error Reservation Conflict (which implies that some other
host has the device reserved), then the ioctl will return
.Sy -1
with
.Va errno
set to
.Er EACCES .
The
.Dv MHIOCQRESERVE
ioctl does NOT issue a bus device
reset or bus reset prior to attempting the SCSI-2 reserve command.
It also
does not take care of reinstating reservations that disappear due to bus
resets or bus device resets; if that behavior is desired, then the caller can
call
.Dv MHIOCTKOWN
after the
.Dv MHIOCQRESERVE
has returned success.
If
the device does not support the SCSI-2 Reserve command, then the ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
The
.Dv MHIOCQRESERVE
ioctl is intended to be used by high-availability or clustering software for a
.Dq quorum
disk; hence the
.Dq Q
in the name of the ioctl.
.El
.Ss "Shared multihost disks"
Shared multihost disk ioctls control access to shared multihost disks.
The ioctls are merely a veneer on the SCSI-3 Persistent Reservation facility.
Therefore, the underlying semantic model is not described in detail here; see
the SCSI-3 standard instead.
The SCSI-3 Persistent Reservations support the
concept of a group of hosts all sharing access to a disk.
.Pp
The function prototypes and descriptions for the shared multihost ioctls are as
follows:
.Bl -tag -width 1n
.It Fn ioctl fd MHIOCGRP_INKEYS "(mhioc_inkeys_t *)k"
.Pp
Issues the SCSI-3 command Persistent Reserve In Read Keys to the device.
On input, the field
.Fa k->li
should be initialized by the caller with
.Fa k->li.listsize
reflecting how large an array the caller has allocated for the
.Fa k->li.list
field and with
.Ql k->li.listlen\& ==\& 0 .
On return, the field
.Fa k->li.listlen
is updated to indicate the number of
reservation keys the device currently has: if this value is larger than
.Fa k->li.listsize
then that indicates that the caller should have passed a bigger
.Fa k->li.list
array with a bigger
.Fa k->li.listsize .
The number of array elements actually written by the callee into
.Fa k->li.list
is the minimum of
.Fa k->li.listlen
and
.Fa k->li.listsize .
The field
.Fa k->generation
is updated with the generation information returned by the SCSI-3
Read Keys query.
If the device does not support SCSI-3 Persistent Reservations,
then this ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
.It Fn ioctl fd MHIOCGRP_INRESV "(mhioc_inresvs_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve In Read Reservations to the
device.
Remarks similar to
.Dv MHIOCGRP_INKEYS
apply to the array manipulation.
If the device does not support SCSI-3 Persistent Reservations,
then this ioctl returns
.Sy -1
with
.Va errno
set to
.Er ENOTSUP .
.It Fn ioctl fd MHIOCGRP_REGISTER "(mhioc_register_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Register.
The fields of structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
The field
.Fa r->aptpl
should be set to true to specify that registrations
and reservations should persist across device power failures, or to false to
specify that registrations and reservations should be cleared upon device power
failure; true is the recommended setting.
The field
.Fa r->oldkey
is the key that the caller believes the device may already have for this host
initiator; if the caller believes that this host initiator is not already
registered with this device, it should pass the special key of all zeros.
To achieve the effect of unregistering with the device, the caller should pass
its current key for the
.Fa r->oldkey
field and an
.Fa r->newkey
field containing the special key of all zeros.
If the device returns the SCSI error code
Reservation Conflict, this ioctl returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_RESERVE "(mhioc_resv_desc_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Reserve.
The fields of
structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
If the device returns the SCSI error code Reservation Conflict, this ioctl
returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_PREEMPTANDABORT "(mhioc_preemptandabort_t *)r"
.Pp
Issues the SCSI-3 command Persistent Reserve Out Preempt-And-Abort.
The fields
of structure
.Va r
are all inputs; none of the fields are modified by the ioctl.
The key of the victim host is specified by the field
.Fa r->victim_key .
The field
.Fa r->resvdesc
supplies the preempter's key and the reservation that it is requesting as part
of the SCSI-3 Preempt-And-Abort command.
If the device returns the SCSI error code
Reservation Conflict, this ioctl returns
.Sy -1
with
.Va errno
set to
.Er EACCES .
.It Fn ioctl fd MHIOCGRP_PREEMPT "(mhioc_preemptandabort_t *)r"
.Pp
Similar to
.Dv MHIOCGRP_PREEMPTANDABORT ,
but instead issues the SCSI-3 command Persistent Reserve Out Preempt.
(Note: This command is not implemented.)
.It Fn ioctl fd MHIOCGRP_CLEAR "(mhioc_resv_key_t *)r"
Issues the SCSI-3 command Persistent Reserve Out Clear.
The input parameter
.Va r
is the reservation key of the caller, which should have been already
registered with the device, by an earlier call to
.Dv MHIOCGRP_REGISTER .
.El
.Pp
For each device, the non-shared ioctls should not be mixed with the Persistent
Reserve Out shared ioctls, and vice versa; otherwise, the underlying device is
likely to return errors, because SCSI does not permit SCSI-2 reservations to be
mixed with SCSI-3 reservations on a single device.
It is, however, legitimate
to call the Persistent Reserve In ioctls, because these are query only.
Issuing the
.Dv MHIOCGRP_INKEYS
ioctl is the recommended way for a caller to
determine if the device supports SCSI-3 Persistent Reservations (the ioctl
will return
.Sy -1
with
.Va errno
set to
.Er ENOTSUP
if the device does not).
.Ss "MHIOCENFAILFAST Ioctl"
The
.Dv MHIOCENFAILFAST
ioctl is applicable to both non-shared and shared
disks, and may be used with either the non-shared or shared ioctls.
.Bl -tag -width 1n
.It Fn ioctl fd MHIOCENFAILFAST "(unsigned int *)millisecs"
.Pp
Enables or disables the failfast option in the multihost disk driver and
enables or disables automatic probing of a multihost disk, described below.
The argument is an unsigned integer specifying the number of milliseconds to
wait between executions of the automatic probe function.
An argument of zero disables the failfast option and disables automatic probing.
If the
.Dv MHIOCENFAILFAST
ioctl is never called, the effect is defined to be that
both the failfast option and automatic probing are disabled.
.El
.Ss "Automatic Probing"
The
.Dv MHIOCENFAILFAST
ioctl sets up a timeout in the driver to periodically
schedule automatic probes of the disk.
The automatic probe function works in this manner: The driver is scheduled to
probe the multihost disk every n milliseconds, rounded up to the next integral
multiple of the system clock's resolution.
If
.Bl -enum -offset indent
.It
the local host no longer has access rights to the multihost disk, and
.It
access rights were expected to be held by the local host,
.El
.Pp
the driver immediately panics the machine to comply with the failfast model.
.Pp
If the driver makes this discovery outside the timeout function, especially
during a read or write operation, it is imperative that it panic the system
then as well.
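.Pp
The rounding of the probe interval described above can be illustrated with a
short sketch.
It is not the driver's code: the function name
.Fn round_probe_interval
is hypothetical, and it assumes that the clock resolution is known to the
caller as a whole number of milliseconds.

```c
#include <assert.h>

/*
 * Hypothetical helper: round a requested probe interval up to the next
 * integral multiple of the clock resolution, both in milliseconds.
 * An interval that is already a multiple is returned unchanged, and so
 * is zero, which disables probing.  The arithmetic is done in
 * unsigned long long so that large requests do not overflow.
 */
unsigned long long
round_probe_interval(unsigned int millisecs, unsigned int tick_ms)
{
	unsigned long long ticks;

	assert(tick_ms > 0);
	ticks = ((unsigned long long)millisecs + tick_ms - 1) / tick_ms;
	return (ticks * tick_ms);
}
```

With an assumed clock resolution of 10 milliseconds, a requested interval of 25
would yield an effective interval of 30, while a request of 30 would be left
as is.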
.Sh RETURN VALUES
Each request returns
.Sy -1
on failure and sets
.Va errno
to indicate the error.
.Bl -tag -width Er
.It Er EPERM
Caller is not root.
.It Er EACCES
Access rights were denied.
.It Er EIO
The multihost disk or controller was unable to successfully complete the
requested operation.
.It Er ENOTSUP
The multihost disk does not support the operation.
For example, it does not support the SCSI-2 Reserve/Release command set, or the
SCSI-3 Persistent Reservation command set.
.El
.Sh STABILITY
Uncommitted
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr open 2 ,
.Xr attributes 5