9059 Simplify SMAP relocations with krtld
Portions contributed by: John Levon <john.levon@joyent.com>
--- old/usr/src/uts/intel/ia32/ml/copy.s
+++ new/usr/src/uts/intel/ia32/ml/copy.s
1 1 /*
2 2 * CDDL HEADER START
3 3 *
4 4 * The contents of this file are subject to the terms of the
5 5 * Common Development and Distribution License (the "License").
6 6 * You may not use this file except in compliance with the License.
7 7 *
8 8 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
9 9 * or http://www.opensolaris.org/os/licensing.
10 10 * See the License for the specific language governing permissions
11 11 * and limitations under the License.
12 12 *
13 13 * When distributing Covered Code, include this CDDL HEADER in each
14 14 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
15 15 * If applicable, add the following below this CDDL HEADER, with the
16 16 * fields enclosed by brackets "[]" replaced with your own identifying
17 17 * information: Portions Copyright [yyyy] [name of copyright owner]
18 18 *
19 19 * CDDL HEADER END
20 20 */
21 21 /*
22 22 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
23 23 * Use is subject to license terms.
24 24 */
25 25
26 26 /*
27 27 * Copyright (c) 2009, Intel Corporation
28 28 * All rights reserved.
29 29 */
30 30
31 31 /* Copyright (c) 1990, 1991 UNIX System Laboratories, Inc. */
32 32 /* Copyright (c) 1984, 1986, 1987, 1988, 1989, 1990 AT&T */
33 33 /* All Rights Reserved */
34 34
35 35 /* Copyright (c) 1987, 1988 Microsoft Corporation */
36 36 /* All Rights Reserved */
37 37
38 38 /*
39 - * Copyright 2019 Joyent, Inc.
39 + * Copyright 2020 Joyent, Inc.
40 40 */
41 41
42 42 #include <sys/errno.h>
43 43 #include <sys/asm_linkage.h>
44 44
45 45 #include "assym.h"
46 46
47 47 #define KCOPY_MIN_SIZE 128 /* Must be >= 16 bytes */
48 48 #define XCOPY_MIN_SIZE 128 /* Must be >= 16 bytes */
49 49 /*
  50   50  * Non-temporal access (NTA) alignment requirement
51 51 */
52 52 #define NTA_ALIGN_SIZE 4 /* Must be at least 4-byte aligned */
53 53 #define NTA_ALIGN_MASK _CONST(NTA_ALIGN_SIZE-1)
54 54 #define COUNT_ALIGN_SIZE 16 /* Must be at least 16-byte aligned */
55 55 #define COUNT_ALIGN_MASK _CONST(COUNT_ALIGN_SIZE-1)
56 56
57 57 /*
58 58 * With the introduction of Broadwell, Intel has introduced supervisor mode
59 59 * access protection -- SMAP. SMAP forces the kernel to set certain bits to
  60   60  * enable access to user pages (AC in rflags, defined as PS_ACHK in
  61   61  * <sys/psw.h>). One of the challenges is that the implementation of many of the
  62   62  * userland copy routines directly uses the kernel ones. For example, copyin and
63 63 * copyout simply go and jump to the do_copy_fault label and traditionally let
64 64 * those deal with the return for them. In fact, changing that is a can of frame
65 65 * pointers.
66 66 *
67 67 * Rules and Constraints:
68 68 *
69 - * 1. For anything that's not in copy.s, we have it do explicit calls to the
70 - * smap related code. It usually is in a position where it is able to. This is
71 - * restricted to the following three places: DTrace, resume() in swtch.s and
72 - * on_fault/no_fault. If you want to add it somewhere else, we should be
73 - * thinking twice.
69 + * 1. For anything that's not in copy.s, we have it do explicit smap_disable()
70 + * or smap_enable() calls. This is restricted to the following three places:
71 + * DTrace, resume() in swtch.s and on_fault/no_fault. If you want to add it
72 + * somewhere else, we should be thinking twice.
74 73 *
75 74 * 2. We try to toggle this at the smallest window possible. This means that if
  76   75  * we take a fault, or need to fall back to a copyop in copyin(), copyout(), or any
77 76 * other function, we will always leave with SMAP enabled (the kernel cannot
78 77 * access user pages).
79 78 *
80 79 * 3. None of the *_noerr() or ucopy/uzero routines should toggle SMAP. They are
81 80 * explicitly only allowed to be called while in an on_fault()/no_fault() handler,
82 81 * which already takes care of ensuring that SMAP is enabled and disabled. Note
83 82 * this means that when under an on_fault()/no_fault() handler, one must not
84 - * call the non-*_noeer() routines.
83 + * call the non-*_noerr() routines.
85 84 *
86 85 * 4. The first thing we should do after coming out of an lofault handler is to
87 - * make sure that we call smap_enable again to ensure that we are safely
86 + * make sure that we call smap_enable() again to ensure that we are safely
88 87 * protected, as more often than not, we will have disabled smap to get there.
89 88 *
90 - * 5. The SMAP functions, smap_enable and smap_disable may not touch any
91 - * registers beyond those done by the call and ret. These routines may be called
92 - * from arbitrary contexts in copy.s where we have slightly more special ABIs in
93 - * place.
89 + * 5. smap_enable() and smap_disable() don't exist: calls to these functions
       90  + * generate runtime relocations that are then processed into the necessary
91 + * clac/stac, via the krtld hotinlines mechanism and hotinline_smap().
94 92 *
95 93 * 6. For any inline user of SMAP, the appropriate SMAP_ENABLE_INSTR and
96 - * SMAP_DISABLE_INSTR macro should be used (except for smap_enable() and
97 - * smap_disable()). If the number of these is changed, you must update the
98 - * constants SMAP_ENABLE_COUNT and SMAP_DISABLE_COUNT below.
94 + * SMAP_DISABLE_INSTR macro should be used. If the number of these is changed,
95 + * you must update the constants SMAP_ENABLE_COUNT and SMAP_DISABLE_COUNT below.
99 96 *
100 - * 7. Note, at this time SMAP is not implemented for the 32-bit kernel. There is
101 - * no known technical reason preventing it from being enabled.
102 - *
103 - * 8. Generally this .s file is processed by a K&R style cpp. This means that it
97 + * 7. Generally this .s file is processed by a K&R style cpp. This means that it
104 98 * really has a lot of feelings about whitespace. In particular, if you have a
105 99 * macro FOO with the arguments FOO(1, 3), the second argument is in fact ' 3'.
106 100 *
107 - * 9. The smap_enable and smap_disable functions should not generally be called.
108 - * They exist such that DTrace and on_trap() may use them, that's it.
109 - *
110 - * 10. In general, the kernel has its own value for rflags that gets used. This
101 + * 8. In general, the kernel has its own value for rflags that gets used. This
111 102 * is maintained in a few different places which vary based on how the thread
112 103 * comes into existence and whether it's a user thread. In general, when the
 113  104  * kernel takes a trap, it will always set the flags to a known state,
114 105 * mainly as part of ENABLE_INTR_FLAGS and F_OFF and F_ON. These ensure that
115 106 * PS_ACHK is cleared for us. In addition, when using the sysenter instruction,
 116  107  * we mask PS_ACHK off via the AMD_SFMASK MSR. See init_cpu_syscall() for
117 108 * where that gets masked off.
118 109 */
119 110
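
To make rule 5 concrete: a `call smap_disable' emitted elsewhere in the kernel (see rule 1)
assembles to a 5-byte call carrying an unresolved relocation, and krtld's hotinline pass
rewrites that call site into the real instruction. The C sketch below shows the idea only;
the real hotinline_smap() lives in krtld, and its interface, padding choice, and
text-patching primitive may differ from this stand-alone illustration.

	#include <stdint.h>
	#include <string.h>

	#define	CALL_INSN_LEN	5	/* e8 xx xx xx xx: call rel32 */

	/*
	 * Illustrative rewrite of a "call smap_enable"/"call smap_disable"
	 * site: drop in the 3-byte CLAC/STAC plus two 1-byte NOPs of padding.
	 * smap_enable() re-arms the protection (CLAC, clear rflags.AC);
	 * smap_disable() opens the window (STAC, set rflags.AC).
	 */
	static void
	hotinline_smap_sketch(uint8_t *instp, int is_enable)
	{
		static const uint8_t clac[CALL_INSN_LEN] =
		    { 0x0f, 0x01, 0xca, 0x90, 0x90 };
		static const uint8_t stac[CALL_INSN_LEN] =
		    { 0x0f, 0x01, 0xcb, 0x90, 0x90 };

		memcpy(instp, is_enable ? clac : stac, CALL_INSN_LEN);
	}

A real patcher would of course go through a kernel-text patching primitive rather than a
bare memcpy() into the text segment.
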
120 111 /*
121 112 * The optimal 64-bit bcopy and kcopy for modern x86 processors uses
122 113 * "rep smovq" for large sizes. Performance data shows that many calls to
123 114 * bcopy/kcopy/bzero/kzero operate on small buffers. For best performance for
124 115 * these small sizes unrolled code is used. For medium sizes loops writing
125 116 * 64-bytes per loop are used. Transition points were determined experimentally.
126 117 */
127 118 #define BZERO_USE_REP (1024)
128 119 #define BCOPY_DFLT_REP (128)
129 120 #define BCOPY_NHM_REP (768)
130 121
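
As a plain-C restatement of how the bcopy thresholds are used below (default, unpatched
values; the function is illustrative only and mirrors the comparisons in bcopy_altentry
and bcopy_ck_size):

	#include <stddef.h>

	#define	BCOPY_DFLT_REP	128	/* default "rep smovq" threshold */

	enum bcopy_strategy { UNROLLED_TABLE, LOOP_64_BYTES, REP_SMOVQ };

	/* Which path a copy of 'count' bytes takes (unpatched thresholds). */
	static enum bcopy_strategy
	bcopy_pick(size_t count)
	{
		if (count < 0x50)		/* 80 bytes: fwdPxQx jump table */
			return (UNROLLED_TABLE);
		if (count < BCOPY_DFLT_REP)	/* 64 bytes per loop iteration */
			return (LOOP_64_BYTES);
		return (REP_SMOVQ);		/* remainder via the jump table */
	}
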
131 122 /*
132 123 * Copy a block of storage, returning an error code if `from' or
133 124 * `to' takes a kernel pagefault which cannot be resolved.
134 125 * Returns errno value on pagefault error, 0 if all ok
135 126 */
136 127
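
A caller-side view of that contract (a minimal sketch; the prototype follows the argument
order used above -- %rdi = from, %rsi = to, %rdx = count -- and the wrapper name is made
up for illustration):

	#include <sys/types.h>

	/* from, to, count -- same argument order as the assembly above. */
	extern int kcopy(const void *, void *, size_t);

	/*
	 * Copy where either side might fault (e.g. pageable or disappearing
	 * memory): unlike bcopy(), the fault comes back as an errno value
	 * instead of a panic.
	 */
	static int
	copy_checked(const void *from, void *to, size_t len)
	{
		int err = kcopy(from, to, len);	/* 0, or e.g. EFAULT */

		return (err);
	}
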
137 128 /*
138 129 * I'm sorry about these macros, but copy.s is unsurprisingly sensitive to
139 130 * additional call instructions.
140 131 */
141 132 #define SMAP_DISABLE_COUNT 16
142 133 #define SMAP_ENABLE_COUNT 26
143 134
144 135 #define SMAP_DISABLE_INSTR(ITER) \
145 136 .globl _smap_disable_patch_/**/ITER; \
146 137 _smap_disable_patch_/**/ITER/**/:; \
147 138 nop; nop; nop;
148 139
149 140 #define SMAP_ENABLE_INSTR(ITER) \
150 141 .globl _smap_enable_patch_/**/ITER; \
151 142 _smap_enable_patch_/**/ITER/**/:; \
152 143 nop; nop; nop;
153 144
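
Each of these macros reserves a three-byte run of NOPs under a numbered global label --
exactly the size of CLAC/STAC -- so boot code can later stamp the real instruction over
the pad. A hedged sketch of that walk is below; lookup_patch_label() stands in for the
kernel's symbol lookup and the bare memcpy() for its text-patching primitive:

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>

	extern uint32_t _smap_enable_patch_count;	/* emitted at end of file */

	/* Hypothetical: resolve "_smap_enable_patch_<n>" to its address. */
	extern uint8_t *lookup_patch_label(const char *);

	static void
	patch_smap_enable_sites(void)
	{
		const uint8_t clac[3] = { 0x0f, 0x01, 0xca };
		char name[64];
		uint32_t i;

		/* Labels are numbered 0 .. SMAP_ENABLE_COUNT - 1. */
		for (i = 0; i < _smap_enable_patch_count; i++) {
			(void) snprintf(name, sizeof (name),
			    "_smap_enable_patch_%u", i);
			memcpy(lookup_patch_label(name), clac, sizeof (clac));
		}
	}

The disable sites are handled the same way with the STAC encoding and
_smap_disable_patch_count.
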
154 145 .globl kernelbase
155 146 .globl postbootkernelbase
156 147
157 148 ENTRY(kcopy)
158 149 pushq %rbp
159 150 movq %rsp, %rbp
160 151 #ifdef DEBUG
161 152 cmpq postbootkernelbase(%rip), %rdi /* %rdi = from */
162 153 jb 0f
163 154 cmpq postbootkernelbase(%rip), %rsi /* %rsi = to */
164 155 jnb 1f
165 156 0: leaq .kcopy_panic_msg(%rip), %rdi
166 157 xorl %eax, %eax
167 158 call panic
168 159 1:
169 160 #endif
170 161 /*
171 162 * pass lofault value as 4th argument to do_copy_fault
172 163 */
173 164 leaq _kcopy_copyerr(%rip), %rcx
174 165 movq %gs:CPU_THREAD, %r9 /* %r9 = thread addr */
175 166
176 167 do_copy_fault:
177 168 movq T_LOFAULT(%r9), %r11 /* save the current lofault */
178 169 movq %rcx, T_LOFAULT(%r9) /* new lofault */
179 170 call bcopy_altentry
180 171 xorl %eax, %eax /* return 0 (success) */
181 172 SMAP_ENABLE_INSTR(0)
182 173
183 174 /*
184 175 * A fault during do_copy_fault is indicated through an errno value
185 176 * in %rax and we iretq from the trap handler to here.
186 177 */
187 178 _kcopy_copyerr:
188 179 movq %r11, T_LOFAULT(%r9) /* restore original lofault */
189 180 leave
190 181 ret
191 182 SET_SIZE(kcopy)
192 183
193 184 #undef ARG_FROM
194 185 #undef ARG_TO
195 186 #undef ARG_COUNT
196 187
197 188 #define COPY_LOOP_INIT(src, dst, cnt) \
198 189 addq cnt, src; \
199 190 addq cnt, dst; \
200 191 shrq $3, cnt; \
201 192 neg cnt
202 193
203 194 /* Copy 16 bytes per loop. Uses %rax and %r8 */
204 195 #define COPY_LOOP_BODY(src, dst, cnt) \
205 196 prefetchnta 0x100(src, cnt, 8); \
206 197 movq (src, cnt, 8), %rax; \
207 198 movq 0x8(src, cnt, 8), %r8; \
208 199 movnti %rax, (dst, cnt, 8); \
209 200 movnti %r8, 0x8(dst, cnt, 8); \
210 201 addq $2, cnt
211 202
212 203 ENTRY(kcopy_nta)
213 204 pushq %rbp
214 205 movq %rsp, %rbp
215 206 #ifdef DEBUG
216 207 cmpq postbootkernelbase(%rip), %rdi /* %rdi = from */
217 208 jb 0f
218 209 cmpq postbootkernelbase(%rip), %rsi /* %rsi = to */
219 210 jnb 1f
220 211 0: leaq .kcopy_panic_msg(%rip), %rdi
221 212 xorl %eax, %eax
222 213 call panic
223 214 1:
224 215 #endif
225 216
226 217 movq %gs:CPU_THREAD, %r9
227 218 cmpq $0, %rcx /* No non-temporal access? */
228 219 /*
229 220 * pass lofault value as 4th argument to do_copy_fault
230 221 */
231 222 leaq _kcopy_nta_copyerr(%rip), %rcx /* doesn't set rflags */
232 223 jnz do_copy_fault /* use regular access */
233 224 /*
234 225 * Make sure cnt is >= KCOPY_MIN_SIZE
235 226 */
236 227 cmpq $KCOPY_MIN_SIZE, %rdx
237 228 jb do_copy_fault
238 229
239 230 /*
240 231 * Make sure src and dst are NTA_ALIGN_SIZE aligned,
241 232 * count is COUNT_ALIGN_SIZE aligned.
242 233 */
243 234 movq %rdi, %r10
244 235 orq %rsi, %r10
245 236 andq $NTA_ALIGN_MASK, %r10
246 237 orq %rdx, %r10
247 238 andq $COUNT_ALIGN_MASK, %r10
248 239 jnz do_copy_fault
249 240
250 241 ALTENTRY(do_copy_fault_nta)
251 242 movq %gs:CPU_THREAD, %r9 /* %r9 = thread addr */
252 243 movq T_LOFAULT(%r9), %r11 /* save the current lofault */
253 244 movq %rcx, T_LOFAULT(%r9) /* new lofault */
254 245
255 246 /*
256 247 * COPY_LOOP_BODY uses %rax and %r8
257 248 */
258 249 COPY_LOOP_INIT(%rdi, %rsi, %rdx)
259 250 2: COPY_LOOP_BODY(%rdi, %rsi, %rdx)
260 251 jnz 2b
261 252
262 253 mfence
263 254 xorl %eax, %eax /* return 0 (success) */
264 255 SMAP_ENABLE_INSTR(1)
265 256
266 257 _kcopy_nta_copyerr:
267 258 movq %r11, T_LOFAULT(%r9) /* restore original lofault */
268 259 leave
269 260 ret
270 261 SET_SIZE(do_copy_fault_nta)
271 262 SET_SIZE(kcopy_nta)
272 263
273 264 ENTRY(bcopy)
274 265 #ifdef DEBUG
275 266 orq %rdx, %rdx /* %rdx = count */
276 267 jz 1f
277 268 cmpq postbootkernelbase(%rip), %rdi /* %rdi = from */
278 269 jb 0f
279 270 cmpq postbootkernelbase(%rip), %rsi /* %rsi = to */
280 271 jnb 1f
281 272 0: leaq .bcopy_panic_msg(%rip), %rdi
282 273 jmp call_panic /* setup stack and call panic */
283 274 1:
284 275 #endif
285 276 /*
286 277 * bcopy_altentry() is called from kcopy, i.e., do_copy_fault.
287 278 * kcopy assumes that bcopy doesn't touch %r9 and %r11. If bcopy
288 279 * uses these registers in future they must be saved and restored.
289 280 */
290 281 ALTENTRY(bcopy_altentry)
291 282 do_copy:
292 283 #define L(s) .bcopy/**/s
293 284 cmpq $0x50, %rdx /* 80 */
294 285 jae bcopy_ck_size
295 286
296 287 /*
 297  288  * Performance data shows many callers copy small buffers. So for
298 289 * best perf for these sizes unrolled code is used. Store data without
299 290 * worrying about alignment.
300 291 */
301 292 leaq L(fwdPxQx)(%rip), %r10
302 293 addq %rdx, %rdi
303 294 addq %rdx, %rsi
304 295 movslq (%r10,%rdx,4), %rcx
305 296 leaq (%rcx,%r10,1), %r10
306 297 INDIRECT_JMP_REG(r10)
307 298
308 299 .p2align 4
309 300 L(fwdPxQx):
310 301 .int L(P0Q0)-L(fwdPxQx) /* 0 */
311 302 .int L(P1Q0)-L(fwdPxQx)
312 303 .int L(P2Q0)-L(fwdPxQx)
313 304 .int L(P3Q0)-L(fwdPxQx)
314 305 .int L(P4Q0)-L(fwdPxQx)
315 306 .int L(P5Q0)-L(fwdPxQx)
316 307 .int L(P6Q0)-L(fwdPxQx)
317 308 .int L(P7Q0)-L(fwdPxQx)
318 309
319 310 .int L(P0Q1)-L(fwdPxQx) /* 8 */
320 311 .int L(P1Q1)-L(fwdPxQx)
321 312 .int L(P2Q1)-L(fwdPxQx)
322 313 .int L(P3Q1)-L(fwdPxQx)
323 314 .int L(P4Q1)-L(fwdPxQx)
324 315 .int L(P5Q1)-L(fwdPxQx)
325 316 .int L(P6Q1)-L(fwdPxQx)
326 317 .int L(P7Q1)-L(fwdPxQx)
327 318
328 319 .int L(P0Q2)-L(fwdPxQx) /* 16 */
329 320 .int L(P1Q2)-L(fwdPxQx)
330 321 .int L(P2Q2)-L(fwdPxQx)
331 322 .int L(P3Q2)-L(fwdPxQx)
332 323 .int L(P4Q2)-L(fwdPxQx)
333 324 .int L(P5Q2)-L(fwdPxQx)
334 325 .int L(P6Q2)-L(fwdPxQx)
335 326 .int L(P7Q2)-L(fwdPxQx)
336 327
337 328 .int L(P0Q3)-L(fwdPxQx) /* 24 */
338 329 .int L(P1Q3)-L(fwdPxQx)
339 330 .int L(P2Q3)-L(fwdPxQx)
340 331 .int L(P3Q3)-L(fwdPxQx)
341 332 .int L(P4Q3)-L(fwdPxQx)
342 333 .int L(P5Q3)-L(fwdPxQx)
343 334 .int L(P6Q3)-L(fwdPxQx)
344 335 .int L(P7Q3)-L(fwdPxQx)
345 336
346 337 .int L(P0Q4)-L(fwdPxQx) /* 32 */
347 338 .int L(P1Q4)-L(fwdPxQx)
348 339 .int L(P2Q4)-L(fwdPxQx)
349 340 .int L(P3Q4)-L(fwdPxQx)
350 341 .int L(P4Q4)-L(fwdPxQx)
351 342 .int L(P5Q4)-L(fwdPxQx)
352 343 .int L(P6Q4)-L(fwdPxQx)
353 344 .int L(P7Q4)-L(fwdPxQx)
354 345
355 346 .int L(P0Q5)-L(fwdPxQx) /* 40 */
356 347 .int L(P1Q5)-L(fwdPxQx)
357 348 .int L(P2Q5)-L(fwdPxQx)
358 349 .int L(P3Q5)-L(fwdPxQx)
359 350 .int L(P4Q5)-L(fwdPxQx)
360 351 .int L(P5Q5)-L(fwdPxQx)
361 352 .int L(P6Q5)-L(fwdPxQx)
362 353 .int L(P7Q5)-L(fwdPxQx)
363 354
364 355 .int L(P0Q6)-L(fwdPxQx) /* 48 */
365 356 .int L(P1Q6)-L(fwdPxQx)
366 357 .int L(P2Q6)-L(fwdPxQx)
367 358 .int L(P3Q6)-L(fwdPxQx)
368 359 .int L(P4Q6)-L(fwdPxQx)
369 360 .int L(P5Q6)-L(fwdPxQx)
370 361 .int L(P6Q6)-L(fwdPxQx)
371 362 .int L(P7Q6)-L(fwdPxQx)
372 363
373 364 .int L(P0Q7)-L(fwdPxQx) /* 56 */
374 365 .int L(P1Q7)-L(fwdPxQx)
375 366 .int L(P2Q7)-L(fwdPxQx)
376 367 .int L(P3Q7)-L(fwdPxQx)
377 368 .int L(P4Q7)-L(fwdPxQx)
378 369 .int L(P5Q7)-L(fwdPxQx)
379 370 .int L(P6Q7)-L(fwdPxQx)
380 371 .int L(P7Q7)-L(fwdPxQx)
381 372
382 373 .int L(P0Q8)-L(fwdPxQx) /* 64 */
383 374 .int L(P1Q8)-L(fwdPxQx)
384 375 .int L(P2Q8)-L(fwdPxQx)
385 376 .int L(P3Q8)-L(fwdPxQx)
386 377 .int L(P4Q8)-L(fwdPxQx)
387 378 .int L(P5Q8)-L(fwdPxQx)
388 379 .int L(P6Q8)-L(fwdPxQx)
389 380 .int L(P7Q8)-L(fwdPxQx)
390 381
391 382 .int L(P0Q9)-L(fwdPxQx) /* 72 */
392 383 .int L(P1Q9)-L(fwdPxQx)
393 384 .int L(P2Q9)-L(fwdPxQx)
394 385 .int L(P3Q9)-L(fwdPxQx)
395 386 .int L(P4Q9)-L(fwdPxQx)
396 387 .int L(P5Q9)-L(fwdPxQx)
397 388 .int L(P6Q9)-L(fwdPxQx)
398 389 .int L(P7Q9)-L(fwdPxQx) /* 79 */
399 390
400 391 .p2align 4
401 392 L(P0Q9):
402 393 mov -0x48(%rdi), %rcx
403 394 mov %rcx, -0x48(%rsi)
404 395 L(P0Q8):
405 396 mov -0x40(%rdi), %r10
406 397 mov %r10, -0x40(%rsi)
407 398 L(P0Q7):
408 399 mov -0x38(%rdi), %r8
409 400 mov %r8, -0x38(%rsi)
410 401 L(P0Q6):
411 402 mov -0x30(%rdi), %rcx
412 403 mov %rcx, -0x30(%rsi)
413 404 L(P0Q5):
414 405 mov -0x28(%rdi), %r10
415 406 mov %r10, -0x28(%rsi)
416 407 L(P0Q4):
417 408 mov -0x20(%rdi), %r8
418 409 mov %r8, -0x20(%rsi)
419 410 L(P0Q3):
420 411 mov -0x18(%rdi), %rcx
421 412 mov %rcx, -0x18(%rsi)
422 413 L(P0Q2):
423 414 mov -0x10(%rdi), %r10
424 415 mov %r10, -0x10(%rsi)
425 416 L(P0Q1):
426 417 mov -0x8(%rdi), %r8
427 418 mov %r8, -0x8(%rsi)
428 419 L(P0Q0):
429 420 ret
430 421
431 422 .p2align 4
432 423 L(P1Q9):
433 424 mov -0x49(%rdi), %r8
434 425 mov %r8, -0x49(%rsi)
435 426 L(P1Q8):
436 427 mov -0x41(%rdi), %rcx
437 428 mov %rcx, -0x41(%rsi)
438 429 L(P1Q7):
439 430 mov -0x39(%rdi), %r10
440 431 mov %r10, -0x39(%rsi)
441 432 L(P1Q6):
442 433 mov -0x31(%rdi), %r8
443 434 mov %r8, -0x31(%rsi)
444 435 L(P1Q5):
445 436 mov -0x29(%rdi), %rcx
446 437 mov %rcx, -0x29(%rsi)
447 438 L(P1Q4):
448 439 mov -0x21(%rdi), %r10
449 440 mov %r10, -0x21(%rsi)
450 441 L(P1Q3):
451 442 mov -0x19(%rdi), %r8
452 443 mov %r8, -0x19(%rsi)
453 444 L(P1Q2):
454 445 mov -0x11(%rdi), %rcx
455 446 mov %rcx, -0x11(%rsi)
456 447 L(P1Q1):
457 448 mov -0x9(%rdi), %r10
458 449 mov %r10, -0x9(%rsi)
459 450 L(P1Q0):
460 451 movzbq -0x1(%rdi), %r8
461 452 mov %r8b, -0x1(%rsi)
462 453 ret
463 454
464 455 .p2align 4
465 456 L(P2Q9):
466 457 mov -0x4a(%rdi), %r8
467 458 mov %r8, -0x4a(%rsi)
468 459 L(P2Q8):
469 460 mov -0x42(%rdi), %rcx
470 461 mov %rcx, -0x42(%rsi)
471 462 L(P2Q7):
472 463 mov -0x3a(%rdi), %r10
473 464 mov %r10, -0x3a(%rsi)
474 465 L(P2Q6):
475 466 mov -0x32(%rdi), %r8
476 467 mov %r8, -0x32(%rsi)
477 468 L(P2Q5):
478 469 mov -0x2a(%rdi), %rcx
479 470 mov %rcx, -0x2a(%rsi)
480 471 L(P2Q4):
481 472 mov -0x22(%rdi), %r10
482 473 mov %r10, -0x22(%rsi)
483 474 L(P2Q3):
484 475 mov -0x1a(%rdi), %r8
485 476 mov %r8, -0x1a(%rsi)
486 477 L(P2Q2):
487 478 mov -0x12(%rdi), %rcx
488 479 mov %rcx, -0x12(%rsi)
489 480 L(P2Q1):
490 481 mov -0xa(%rdi), %r10
491 482 mov %r10, -0xa(%rsi)
492 483 L(P2Q0):
493 484 movzwq -0x2(%rdi), %r8
494 485 mov %r8w, -0x2(%rsi)
495 486 ret
496 487
497 488 .p2align 4
498 489 L(P3Q9):
499 490 mov -0x4b(%rdi), %r8
500 491 mov %r8, -0x4b(%rsi)
501 492 L(P3Q8):
502 493 mov -0x43(%rdi), %rcx
503 494 mov %rcx, -0x43(%rsi)
504 495 L(P3Q7):
505 496 mov -0x3b(%rdi), %r10
506 497 mov %r10, -0x3b(%rsi)
507 498 L(P3Q6):
508 499 mov -0x33(%rdi), %r8
509 500 mov %r8, -0x33(%rsi)
510 501 L(P3Q5):
511 502 mov -0x2b(%rdi), %rcx
512 503 mov %rcx, -0x2b(%rsi)
513 504 L(P3Q4):
514 505 mov -0x23(%rdi), %r10
515 506 mov %r10, -0x23(%rsi)
516 507 L(P3Q3):
517 508 mov -0x1b(%rdi), %r8
518 509 mov %r8, -0x1b(%rsi)
519 510 L(P3Q2):
520 511 mov -0x13(%rdi), %rcx
521 512 mov %rcx, -0x13(%rsi)
522 513 L(P3Q1):
523 514 mov -0xb(%rdi), %r10
524 515 mov %r10, -0xb(%rsi)
525 516 /*
526 517 * These trailing loads/stores have to do all their loads 1st,
527 518 * then do the stores.
528 519 */
529 520 L(P3Q0):
530 521 movzwq -0x3(%rdi), %r8
531 522 movzbq -0x1(%rdi), %r10
532 523 mov %r8w, -0x3(%rsi)
533 524 mov %r10b, -0x1(%rsi)
534 525 ret
535 526
536 527 .p2align 4
537 528 L(P4Q9):
538 529 mov -0x4c(%rdi), %r8
539 530 mov %r8, -0x4c(%rsi)
540 531 L(P4Q8):
541 532 mov -0x44(%rdi), %rcx
542 533 mov %rcx, -0x44(%rsi)
543 534 L(P4Q7):
544 535 mov -0x3c(%rdi), %r10
545 536 mov %r10, -0x3c(%rsi)
546 537 L(P4Q6):
547 538 mov -0x34(%rdi), %r8
548 539 mov %r8, -0x34(%rsi)
549 540 L(P4Q5):
550 541 mov -0x2c(%rdi), %rcx
551 542 mov %rcx, -0x2c(%rsi)
552 543 L(P4Q4):
553 544 mov -0x24(%rdi), %r10
554 545 mov %r10, -0x24(%rsi)
555 546 L(P4Q3):
556 547 mov -0x1c(%rdi), %r8
557 548 mov %r8, -0x1c(%rsi)
558 549 L(P4Q2):
559 550 mov -0x14(%rdi), %rcx
560 551 mov %rcx, -0x14(%rsi)
561 552 L(P4Q1):
562 553 mov -0xc(%rdi), %r10
563 554 mov %r10, -0xc(%rsi)
564 555 L(P4Q0):
565 556 mov -0x4(%rdi), %r8d
566 557 mov %r8d, -0x4(%rsi)
567 558 ret
568 559
569 560 .p2align 4
570 561 L(P5Q9):
571 562 mov -0x4d(%rdi), %r8
572 563 mov %r8, -0x4d(%rsi)
573 564 L(P5Q8):
574 565 mov -0x45(%rdi), %rcx
575 566 mov %rcx, -0x45(%rsi)
576 567 L(P5Q7):
577 568 mov -0x3d(%rdi), %r10
578 569 mov %r10, -0x3d(%rsi)
579 570 L(P5Q6):
580 571 mov -0x35(%rdi), %r8
581 572 mov %r8, -0x35(%rsi)
582 573 L(P5Q5):
583 574 mov -0x2d(%rdi), %rcx
584 575 mov %rcx, -0x2d(%rsi)
585 576 L(P5Q4):
586 577 mov -0x25(%rdi), %r10
587 578 mov %r10, -0x25(%rsi)
588 579 L(P5Q3):
589 580 mov -0x1d(%rdi), %r8
590 581 mov %r8, -0x1d(%rsi)
591 582 L(P5Q2):
592 583 mov -0x15(%rdi), %rcx
593 584 mov %rcx, -0x15(%rsi)
594 585 L(P5Q1):
595 586 mov -0xd(%rdi), %r10
596 587 mov %r10, -0xd(%rsi)
597 588 L(P5Q0):
598 589 mov -0x5(%rdi), %r8d
599 590 movzbq -0x1(%rdi), %r10
600 591 mov %r8d, -0x5(%rsi)
601 592 mov %r10b, -0x1(%rsi)
602 593 ret
603 594
604 595 .p2align 4
605 596 L(P6Q9):
606 597 mov -0x4e(%rdi), %r8
607 598 mov %r8, -0x4e(%rsi)
608 599 L(P6Q8):
609 600 mov -0x46(%rdi), %rcx
610 601 mov %rcx, -0x46(%rsi)
611 602 L(P6Q7):
612 603 mov -0x3e(%rdi), %r10
613 604 mov %r10, -0x3e(%rsi)
614 605 L(P6Q6):
615 606 mov -0x36(%rdi), %r8
616 607 mov %r8, -0x36(%rsi)
617 608 L(P6Q5):
618 609 mov -0x2e(%rdi), %rcx
619 610 mov %rcx, -0x2e(%rsi)
620 611 L(P6Q4):
621 612 mov -0x26(%rdi), %r10
622 613 mov %r10, -0x26(%rsi)
623 614 L(P6Q3):
624 615 mov -0x1e(%rdi), %r8
625 616 mov %r8, -0x1e(%rsi)
626 617 L(P6Q2):
627 618 mov -0x16(%rdi), %rcx
628 619 mov %rcx, -0x16(%rsi)
629 620 L(P6Q1):
630 621 mov -0xe(%rdi), %r10
631 622 mov %r10, -0xe(%rsi)
632 623 L(P6Q0):
633 624 mov -0x6(%rdi), %r8d
634 625 movzwq -0x2(%rdi), %r10
635 626 mov %r8d, -0x6(%rsi)
636 627 mov %r10w, -0x2(%rsi)
637 628 ret
638 629
639 630 .p2align 4
640 631 L(P7Q9):
641 632 mov -0x4f(%rdi), %r8
642 633 mov %r8, -0x4f(%rsi)
643 634 L(P7Q8):
644 635 mov -0x47(%rdi), %rcx
645 636 mov %rcx, -0x47(%rsi)
646 637 L(P7Q7):
647 638 mov -0x3f(%rdi), %r10
648 639 mov %r10, -0x3f(%rsi)
649 640 L(P7Q6):
650 641 mov -0x37(%rdi), %r8
651 642 mov %r8, -0x37(%rsi)
652 643 L(P7Q5):
653 644 mov -0x2f(%rdi), %rcx
654 645 mov %rcx, -0x2f(%rsi)
655 646 L(P7Q4):
656 647 mov -0x27(%rdi), %r10
657 648 mov %r10, -0x27(%rsi)
658 649 L(P7Q3):
659 650 mov -0x1f(%rdi), %r8
660 651 mov %r8, -0x1f(%rsi)
661 652 L(P7Q2):
662 653 mov -0x17(%rdi), %rcx
663 654 mov %rcx, -0x17(%rsi)
664 655 L(P7Q1):
665 656 mov -0xf(%rdi), %r10
666 657 mov %r10, -0xf(%rsi)
667 658 L(P7Q0):
668 659 mov -0x7(%rdi), %r8d
669 660 movzwq -0x3(%rdi), %r10
670 661 movzbq -0x1(%rdi), %rcx
671 662 mov %r8d, -0x7(%rsi)
672 663 mov %r10w, -0x3(%rsi)
673 664 mov %cl, -0x1(%rsi)
674 665 ret
675 666
676 667 /*
677 668 * For large sizes rep smovq is fastest.
678 669 * Transition point determined experimentally as measured on
679 670 * Intel Xeon processors (incl. Nehalem and previous generations) and
680 671 * AMD Opteron. The transition value is patched at boot time to avoid
681 672 * memory reference hit.
682 673 */
683 674 .globl bcopy_patch_start
684 675 bcopy_patch_start:
685 676 cmpq $BCOPY_NHM_REP, %rdx
686 677 .globl bcopy_patch_end
687 678 bcopy_patch_end:
688 679
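
The bracketed cmpq is a template rather than code that runs in place: on CPUs where the
higher threshold wins, boot code can copy the bytes between bcopy_patch_start and
bcopy_patch_end over the default cmpq at the top of bcopy_ck_size, so the transition
point lives in an immediate instead of a memory load. A rough sketch of that idea (the
function name and the raw memcpy() are illustrative, not the kernel's actual patch
routine):

	#include <stddef.h>
	#include <string.h>

	/* Labels exported by the surrounding assembly. */
	extern char bcopy_patch_start[], bcopy_patch_end[], bcopy_ck_size[];

	/*
	 * Replace "cmpq $BCOPY_DFLT_REP, %rdx" at bcopy_ck_size with the
	 * bracketed "cmpq $BCOPY_NHM_REP, %rdx" template (both encode with
	 * a 32-bit immediate, so the instruction lengths match).
	 */
	static void
	patch_bcopy_threshold(void)
	{
		size_t len = (size_t)(bcopy_patch_end - bcopy_patch_start);

		memcpy(bcopy_ck_size, bcopy_patch_start, len);
	}
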
689 680 .p2align 4
690 681 ALTENTRY(bcopy_ck_size)
691 682
692 683 cmpq $BCOPY_DFLT_REP, %rdx
693 684 jae L(use_rep)
694 685
695 686 /*
696 687 * Align to a 8-byte boundary. Avoids penalties from unaligned stores
697 688 * as well as from stores spanning cachelines.
698 689 */
699 690 test $0x7, %rsi
700 691 jz L(aligned_loop)
701 692 test $0x1, %rsi
702 693 jz 2f
703 694 movzbq (%rdi), %r8
704 695 dec %rdx
705 696 inc %rdi
706 697 mov %r8b, (%rsi)
707 698 inc %rsi
708 699 2:
709 700 test $0x2, %rsi
710 701 jz 4f
711 702 movzwq (%rdi), %r8
712 703 sub $0x2, %rdx
713 704 add $0x2, %rdi
714 705 mov %r8w, (%rsi)
715 706 add $0x2, %rsi
716 707 4:
717 708 test $0x4, %rsi
718 709 jz L(aligned_loop)
719 710 mov (%rdi), %r8d
720 711 sub $0x4, %rdx
721 712 add $0x4, %rdi
722 713 mov %r8d, (%rsi)
723 714 add $0x4, %rsi
724 715
725 716 /*
726 717 * Copy 64-bytes per loop
727 718 */
728 719 .p2align 4
729 720 L(aligned_loop):
730 721 mov (%rdi), %r8
731 722 mov 0x8(%rdi), %r10
732 723 lea -0x40(%rdx), %rdx
733 724 mov %r8, (%rsi)
734 725 mov %r10, 0x8(%rsi)
735 726 mov 0x10(%rdi), %rcx
736 727 mov 0x18(%rdi), %r8
737 728 mov %rcx, 0x10(%rsi)
738 729 mov %r8, 0x18(%rsi)
739 730
740 731 cmp $0x40, %rdx
741 732 mov 0x20(%rdi), %r10
742 733 mov 0x28(%rdi), %rcx
743 734 mov %r10, 0x20(%rsi)
744 735 mov %rcx, 0x28(%rsi)
745 736 mov 0x30(%rdi), %r8
746 737 mov 0x38(%rdi), %r10
747 738 lea 0x40(%rdi), %rdi
748 739 mov %r8, 0x30(%rsi)
749 740 mov %r10, 0x38(%rsi)
750 741 lea 0x40(%rsi), %rsi
751 742 jae L(aligned_loop)
752 743
753 744 /*
754 745 * Copy remaining bytes (0-63)
755 746 */
756 747 L(do_remainder):
757 748 leaq L(fwdPxQx)(%rip), %r10
758 749 addq %rdx, %rdi
759 750 addq %rdx, %rsi
760 751 movslq (%r10,%rdx,4), %rcx
761 752 leaq (%rcx,%r10,1), %r10
762 753 INDIRECT_JMP_REG(r10)
763 754
764 755 /*
765 756 * Use rep smovq. Clear remainder via unrolled code
766 757 */
767 758 .p2align 4
768 759 L(use_rep):
769 760 xchgq %rdi, %rsi /* %rsi = source, %rdi = destination */
770 761 movq %rdx, %rcx /* %rcx = count */
771 762 shrq $3, %rcx /* 8-byte word count */
772 763 rep
773 764 smovq
774 765
775 766 xchgq %rsi, %rdi /* %rdi = src, %rsi = destination */
776 767 andq $7, %rdx /* remainder */
777 768 jnz L(do_remainder)
778 769 ret
779 770 #undef L
780 771 SET_SIZE(bcopy_ck_size)
781 772
782 773 #ifdef DEBUG
783 774 /*
784 775 * Setup frame on the run-time stack. The end of the input argument
785 776 * area must be aligned on a 16 byte boundary. The stack pointer %rsp,
786 777 * always points to the end of the latest allocated stack frame.
787 778 * panic(const char *format, ...) is a varargs function. When a
788 779 * function taking variable arguments is called, %rax must be set
789 780 * to eight times the number of floating point parameters passed
790 781 * to the function in SSE registers.
791 782 */
792 783 call_panic:
793 784 pushq %rbp /* align stack properly */
794 785 movq %rsp, %rbp
795 786 xorl %eax, %eax /* no variable arguments */
796 787 call panic /* %rdi = format string */
797 788 #endif
798 789 SET_SIZE(bcopy_altentry)
799 790 SET_SIZE(bcopy)
800 791
801 792
802 793 /*
803 794 * Zero a block of storage, returning an error code if we
804 795 * take a kernel pagefault which cannot be resolved.
805 796 * Returns errno value on pagefault error, 0 if all ok
806 797 */
807 798
808 799 ENTRY(kzero)
809 800 #ifdef DEBUG
810 801 cmpq postbootkernelbase(%rip), %rdi /* %rdi = addr */
811 802 jnb 0f
812 803 leaq .kzero_panic_msg(%rip), %rdi
813 804 jmp call_panic /* setup stack and call panic */
814 805 0:
815 806 #endif
816 807 /*
817 808 * pass lofault value as 3rd argument for fault return
818 809 */
819 810 leaq _kzeroerr(%rip), %rdx
820 811
821 812 movq %gs:CPU_THREAD, %r9 /* %r9 = thread addr */
822 813 movq T_LOFAULT(%r9), %r11 /* save the current lofault */
823 814 movq %rdx, T_LOFAULT(%r9) /* new lofault */
824 815 call bzero_altentry
825 816 xorl %eax, %eax
826 817 movq %r11, T_LOFAULT(%r9) /* restore the original lofault */
827 818 ret
828 819 /*
829 820 * A fault during bzero is indicated through an errno value
830 821 * in %rax when we iretq to here.
831 822 */
832 823 _kzeroerr:
833 824 addq $8, %rsp /* pop bzero_altentry call ret addr */
834 825 movq %r11, T_LOFAULT(%r9) /* restore the original lofault */
835 826 ret
836 827 SET_SIZE(kzero)
837 828
838 829 /*
839 830 * Zero a block of storage.
840 831 */
841 832
842 833 ENTRY(bzero)
843 834 #ifdef DEBUG
844 835 cmpq postbootkernelbase(%rip), %rdi /* %rdi = addr */
845 836 jnb 0f
846 837 leaq .bzero_panic_msg(%rip), %rdi
847 838 jmp call_panic /* setup stack and call panic */
848 839 0:
849 840 #endif
850 841 ALTENTRY(bzero_altentry)
851 842 do_zero:
852 843 #define L(s) .bzero/**/s
853 844 xorl %eax, %eax
854 845
855 846 cmpq $0x50, %rsi /* 80 */
856 847 jae L(ck_align)
857 848
858 849 /*
 859  850  * Performance data shows many callers are zeroing small buffers. So
860 851 * for best perf for these sizes unrolled code is used. Store zeros
861 852 * without worrying about alignment.
862 853 */
863 854 leaq L(setPxQx)(%rip), %r10
864 855 addq %rsi, %rdi
865 856 movslq (%r10,%rsi,4), %rcx
866 857 leaq (%rcx,%r10,1), %r10
867 858 INDIRECT_JMP_REG(r10)
868 859
869 860 .p2align 4
870 861 L(setPxQx):
871 862 .int L(P0Q0)-L(setPxQx) /* 0 */
872 863 .int L(P1Q0)-L(setPxQx)
873 864 .int L(P2Q0)-L(setPxQx)
874 865 .int L(P3Q0)-L(setPxQx)
875 866 .int L(P4Q0)-L(setPxQx)
876 867 .int L(P5Q0)-L(setPxQx)
877 868 .int L(P6Q0)-L(setPxQx)
878 869 .int L(P7Q0)-L(setPxQx)
879 870
880 871 .int L(P0Q1)-L(setPxQx) /* 8 */
881 872 .int L(P1Q1)-L(setPxQx)
882 873 .int L(P2Q1)-L(setPxQx)
883 874 .int L(P3Q1)-L(setPxQx)
884 875 .int L(P4Q1)-L(setPxQx)
885 876 .int L(P5Q1)-L(setPxQx)
886 877 .int L(P6Q1)-L(setPxQx)
887 878 .int L(P7Q1)-L(setPxQx)
888 879
889 880 .int L(P0Q2)-L(setPxQx) /* 16 */
890 881 .int L(P1Q2)-L(setPxQx)
891 882 .int L(P2Q2)-L(setPxQx)
892 883 .int L(P3Q2)-L(setPxQx)
893 884 .int L(P4Q2)-L(setPxQx)
894 885 .int L(P5Q2)-L(setPxQx)
895 886 .int L(P6Q2)-L(setPxQx)
896 887 .int L(P7Q2)-L(setPxQx)
897 888
898 889 .int L(P0Q3)-L(setPxQx) /* 24 */
899 890 .int L(P1Q3)-L(setPxQx)
900 891 .int L(P2Q3)-L(setPxQx)
901 892 .int L(P3Q3)-L(setPxQx)
902 893 .int L(P4Q3)-L(setPxQx)
903 894 .int L(P5Q3)-L(setPxQx)
904 895 .int L(P6Q3)-L(setPxQx)
905 896 .int L(P7Q3)-L(setPxQx)
906 897
907 898 .int L(P0Q4)-L(setPxQx) /* 32 */
908 899 .int L(P1Q4)-L(setPxQx)
909 900 .int L(P2Q4)-L(setPxQx)
910 901 .int L(P3Q4)-L(setPxQx)
911 902 .int L(P4Q4)-L(setPxQx)
912 903 .int L(P5Q4)-L(setPxQx)
913 904 .int L(P6Q4)-L(setPxQx)
914 905 .int L(P7Q4)-L(setPxQx)
915 906
916 907 .int L(P0Q5)-L(setPxQx) /* 40 */
917 908 .int L(P1Q5)-L(setPxQx)
918 909 .int L(P2Q5)-L(setPxQx)
919 910 .int L(P3Q5)-L(setPxQx)
920 911 .int L(P4Q5)-L(setPxQx)
921 912 .int L(P5Q5)-L(setPxQx)
922 913 .int L(P6Q5)-L(setPxQx)
923 914 .int L(P7Q5)-L(setPxQx)
924 915
925 916 .int L(P0Q6)-L(setPxQx) /* 48 */
926 917 .int L(P1Q6)-L(setPxQx)
927 918 .int L(P2Q6)-L(setPxQx)
928 919 .int L(P3Q6)-L(setPxQx)
929 920 .int L(P4Q6)-L(setPxQx)
930 921 .int L(P5Q6)-L(setPxQx)
931 922 .int L(P6Q6)-L(setPxQx)
932 923 .int L(P7Q6)-L(setPxQx)
933 924
934 925 .int L(P0Q7)-L(setPxQx) /* 56 */
935 926 .int L(P1Q7)-L(setPxQx)
936 927 .int L(P2Q7)-L(setPxQx)
937 928 .int L(P3Q7)-L(setPxQx)
938 929 .int L(P4Q7)-L(setPxQx)
939 930 .int L(P5Q7)-L(setPxQx)
940 931 .int L(P6Q7)-L(setPxQx)
941 932 .int L(P7Q7)-L(setPxQx)
942 933
943 934 .int L(P0Q8)-L(setPxQx) /* 64 */
944 935 .int L(P1Q8)-L(setPxQx)
945 936 .int L(P2Q8)-L(setPxQx)
946 937 .int L(P3Q8)-L(setPxQx)
947 938 .int L(P4Q8)-L(setPxQx)
948 939 .int L(P5Q8)-L(setPxQx)
949 940 .int L(P6Q8)-L(setPxQx)
950 941 .int L(P7Q8)-L(setPxQx)
951 942
952 943 .int L(P0Q9)-L(setPxQx) /* 72 */
953 944 .int L(P1Q9)-L(setPxQx)
954 945 .int L(P2Q9)-L(setPxQx)
955 946 .int L(P3Q9)-L(setPxQx)
956 947 .int L(P4Q9)-L(setPxQx)
957 948 .int L(P5Q9)-L(setPxQx)
958 949 .int L(P6Q9)-L(setPxQx)
959 950 .int L(P7Q9)-L(setPxQx) /* 79 */
960 951
961 952 .p2align 4
962 953 L(P0Q9): mov %rax, -0x48(%rdi)
963 954 L(P0Q8): mov %rax, -0x40(%rdi)
964 955 L(P0Q7): mov %rax, -0x38(%rdi)
965 956 L(P0Q6): mov %rax, -0x30(%rdi)
966 957 L(P0Q5): mov %rax, -0x28(%rdi)
967 958 L(P0Q4): mov %rax, -0x20(%rdi)
968 959 L(P0Q3): mov %rax, -0x18(%rdi)
969 960 L(P0Q2): mov %rax, -0x10(%rdi)
970 961 L(P0Q1): mov %rax, -0x8(%rdi)
971 962 L(P0Q0):
972 963 ret
973 964
974 965 .p2align 4
975 966 L(P1Q9): mov %rax, -0x49(%rdi)
976 967 L(P1Q8): mov %rax, -0x41(%rdi)
977 968 L(P1Q7): mov %rax, -0x39(%rdi)
978 969 L(P1Q6): mov %rax, -0x31(%rdi)
979 970 L(P1Q5): mov %rax, -0x29(%rdi)
980 971 L(P1Q4): mov %rax, -0x21(%rdi)
981 972 L(P1Q3): mov %rax, -0x19(%rdi)
982 973 L(P1Q2): mov %rax, -0x11(%rdi)
983 974 L(P1Q1): mov %rax, -0x9(%rdi)
984 975 L(P1Q0): mov %al, -0x1(%rdi)
985 976 ret
986 977
987 978 .p2align 4
988 979 L(P2Q9): mov %rax, -0x4a(%rdi)
989 980 L(P2Q8): mov %rax, -0x42(%rdi)
990 981 L(P2Q7): mov %rax, -0x3a(%rdi)
991 982 L(P2Q6): mov %rax, -0x32(%rdi)
992 983 L(P2Q5): mov %rax, -0x2a(%rdi)
993 984 L(P2Q4): mov %rax, -0x22(%rdi)
994 985 L(P2Q3): mov %rax, -0x1a(%rdi)
995 986 L(P2Q2): mov %rax, -0x12(%rdi)
996 987 L(P2Q1): mov %rax, -0xa(%rdi)
997 988 L(P2Q0): mov %ax, -0x2(%rdi)
998 989 ret
999 990
1000 991 .p2align 4
1001 992 L(P3Q9): mov %rax, -0x4b(%rdi)
1002 993 L(P3Q8): mov %rax, -0x43(%rdi)
1003 994 L(P3Q7): mov %rax, -0x3b(%rdi)
1004 995 L(P3Q6): mov %rax, -0x33(%rdi)
1005 996 L(P3Q5): mov %rax, -0x2b(%rdi)
1006 997 L(P3Q4): mov %rax, -0x23(%rdi)
1007 998 L(P3Q3): mov %rax, -0x1b(%rdi)
1008 999 L(P3Q2): mov %rax, -0x13(%rdi)
1009 1000 L(P3Q1): mov %rax, -0xb(%rdi)
1010 1001 L(P3Q0): mov %ax, -0x3(%rdi)
1011 1002 mov %al, -0x1(%rdi)
1012 1003 ret
1013 1004
1014 1005 .p2align 4
1015 1006 L(P4Q9): mov %rax, -0x4c(%rdi)
1016 1007 L(P4Q8): mov %rax, -0x44(%rdi)
1017 1008 L(P4Q7): mov %rax, -0x3c(%rdi)
1018 1009 L(P4Q6): mov %rax, -0x34(%rdi)
1019 1010 L(P4Q5): mov %rax, -0x2c(%rdi)
1020 1011 L(P4Q4): mov %rax, -0x24(%rdi)
1021 1012 L(P4Q3): mov %rax, -0x1c(%rdi)
1022 1013 L(P4Q2): mov %rax, -0x14(%rdi)
1023 1014 L(P4Q1): mov %rax, -0xc(%rdi)
1024 1015 L(P4Q0): mov %eax, -0x4(%rdi)
1025 1016 ret
1026 1017
1027 1018 .p2align 4
1028 1019 L(P5Q9): mov %rax, -0x4d(%rdi)
1029 1020 L(P5Q8): mov %rax, -0x45(%rdi)
1030 1021 L(P5Q7): mov %rax, -0x3d(%rdi)
1031 1022 L(P5Q6): mov %rax, -0x35(%rdi)
1032 1023 L(P5Q5): mov %rax, -0x2d(%rdi)
1033 1024 L(P5Q4): mov %rax, -0x25(%rdi)
1034 1025 L(P5Q3): mov %rax, -0x1d(%rdi)
1035 1026 L(P5Q2): mov %rax, -0x15(%rdi)
1036 1027 L(P5Q1): mov %rax, -0xd(%rdi)
1037 1028 L(P5Q0): mov %eax, -0x5(%rdi)
1038 1029 mov %al, -0x1(%rdi)
1039 1030 ret
1040 1031
1041 1032 .p2align 4
1042 1033 L(P6Q9): mov %rax, -0x4e(%rdi)
1043 1034 L(P6Q8): mov %rax, -0x46(%rdi)
1044 1035 L(P6Q7): mov %rax, -0x3e(%rdi)
1045 1036 L(P6Q6): mov %rax, -0x36(%rdi)
1046 1037 L(P6Q5): mov %rax, -0x2e(%rdi)
1047 1038 L(P6Q4): mov %rax, -0x26(%rdi)
1048 1039 L(P6Q3): mov %rax, -0x1e(%rdi)
1049 1040 L(P6Q2): mov %rax, -0x16(%rdi)
1050 1041 L(P6Q1): mov %rax, -0xe(%rdi)
1051 1042 L(P6Q0): mov %eax, -0x6(%rdi)
1052 1043 mov %ax, -0x2(%rdi)
1053 1044 ret
1054 1045
1055 1046 .p2align 4
1056 1047 L(P7Q9): mov %rax, -0x4f(%rdi)
1057 1048 L(P7Q8): mov %rax, -0x47(%rdi)
1058 1049 L(P7Q7): mov %rax, -0x3f(%rdi)
1059 1050 L(P7Q6): mov %rax, -0x37(%rdi)
1060 1051 L(P7Q5): mov %rax, -0x2f(%rdi)
1061 1052 L(P7Q4): mov %rax, -0x27(%rdi)
1062 1053 L(P7Q3): mov %rax, -0x1f(%rdi)
1063 1054 L(P7Q2): mov %rax, -0x17(%rdi)
1064 1055 L(P7Q1): mov %rax, -0xf(%rdi)
1065 1056 L(P7Q0): mov %eax, -0x7(%rdi)
1066 1057 mov %ax, -0x3(%rdi)
1067 1058 mov %al, -0x1(%rdi)
1068 1059 ret
1069 1060
1070 1061 /*
1071 1062 * Align to a 16-byte boundary. Avoids penalties from unaligned stores
1072 1063 * as well as from stores spanning cachelines. Note 16-byte alignment
1073 1064  * is better in the case where rep sstosq is used.
1074 1065 */
1075 1066 .p2align 4
1076 1067 L(ck_align):
1077 1068 test $0xf, %rdi
1078 1069 jz L(aligned_now)
1079 1070 test $1, %rdi
1080 1071 jz 2f
1081 1072 mov %al, (%rdi)
1082 1073 dec %rsi
1083 1074 lea 1(%rdi),%rdi
1084 1075 2:
1085 1076 test $2, %rdi
1086 1077 jz 4f
1087 1078 mov %ax, (%rdi)
1088 1079 sub $2, %rsi
1089 1080 lea 2(%rdi),%rdi
1090 1081 4:
1091 1082 test $4, %rdi
1092 1083 jz 8f
1093 1084 mov %eax, (%rdi)
1094 1085 sub $4, %rsi
1095 1086 lea 4(%rdi),%rdi
1096 1087 8:
1097 1088 test $8, %rdi
1098 1089 jz L(aligned_now)
1099 1090 mov %rax, (%rdi)
1100 1091 sub $8, %rsi
1101 1092 lea 8(%rdi),%rdi
1102 1093
1103 1094 /*
1104 1095 * For large sizes rep sstoq is fastest.
1105 1096 * Transition point determined experimentally as measured on
1106 1097 * Intel Xeon processors (incl. Nehalem) and AMD Opteron.
1107 1098 */
1108 1099 L(aligned_now):
1109 1100 cmp $BZERO_USE_REP, %rsi
1110 1101 ja L(use_rep)
1111 1102
1112 1103 /*
1113 1104 * zero 64-bytes per loop
1114 1105 */
1115 1106 .p2align 4
1116 1107 L(bzero_loop):
1117 1108 leaq -0x40(%rsi), %rsi
1118 1109 cmpq $0x40, %rsi
1119 1110 movq %rax, (%rdi)
1120 1111 movq %rax, 0x8(%rdi)
1121 1112 movq %rax, 0x10(%rdi)
1122 1113 movq %rax, 0x18(%rdi)
1123 1114 movq %rax, 0x20(%rdi)
1124 1115 movq %rax, 0x28(%rdi)
1125 1116 movq %rax, 0x30(%rdi)
1126 1117 movq %rax, 0x38(%rdi)
1127 1118 leaq 0x40(%rdi), %rdi
1128 1119 jae L(bzero_loop)
1129 1120
1130 1121 /*
1131 1122  * Clear any remaining bytes.
1132 1123 */
1133 1124 9:
1134 1125 leaq L(setPxQx)(%rip), %r10
1135 1126 addq %rsi, %rdi
1136 1127 movslq (%r10,%rsi,4), %rcx
1137 1128 leaq (%rcx,%r10,1), %r10
1138 1129 INDIRECT_JMP_REG(r10)
1139 1130
1140 1131 /*
1141 1132 * Use rep sstoq. Clear any remainder via unrolled code
1142 1133 */
1143 1134 .p2align 4
1144 1135 L(use_rep):
1145 1136 movq %rsi, %rcx /* get size in bytes */
1146 1137 shrq $3, %rcx /* count of 8-byte words to zero */
1147 1138 rep
1148 1139 sstoq /* %rcx = words to clear (%rax=0) */
1149 1140 andq $7, %rsi /* remaining bytes */
1150 1141 jnz 9b
1151 1142 ret
1152 1143 #undef L
1153 1144 SET_SIZE(bzero_altentry)
1154 1145 SET_SIZE(bzero)
1155 1146
1156 1147 /*
1157 1148 * Transfer data to and from user space -
1158 1149 * Note that these routines can cause faults
1159 1150  * Note that these routines can cause faults.
1160 1151  * It is assumed that the kernel has nothing
1161 1152  * below KERNELBASE in the virtual address space.
1162 1153 * Note that copyin(9F) and copyout(9F) are part of the
1163 1154 * DDI/DKI which specifies that they return '-1' on "errors."
1164 1155 *
1165 1156 * Sigh.
1166 1157 *
1167 1158  * So there are two extremely similar routines, xcopyin_nta() and
1168 1159  * xcopyout_nta(), which return the errno that we've faithfully computed.
1169 1160 * This allows other callers (e.g. uiomove(9F)) to work correctly.
1170 1161 * Given that these are used pretty heavily, we expand the calling
1171 1162 * sequences inline for all flavours (rather than making wrappers).
1172 1163 */
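
A caller-side contrast of the two conventions (a sketch: the prototypes follow the
argument order used below, the helpers are invented for illustration, and the last
xcopyin_nta() argument is zero to permit the non-temporal path):

	#include <sys/types.h>
	#include <sys/errno.h>

	extern int copyin(const void *, void *, size_t);
	extern int xcopyin_nta(const void *, void *, size_t, int);

	/* DDI-style: copyin() only says whether it failed. */
	static int
	fetch_ddi(const void *uaddr, void *kaddr, size_t len)
	{
		return (copyin(uaddr, kaddr, len) == 0 ? 0 : EFAULT);
	}

	/*
	 * errno-style: xcopyin_nta() hands back the fault code it computed
	 * (0 on success), which uiomove()-like callers can pass straight on.
	 */
	static int
	fetch_errno(const void *uaddr, void *kaddr, size_t len)
	{
		return (xcopyin_nta(uaddr, kaddr, len, 0));
	}
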
1173 1164
1174 1165 /*
1175 1166 * Copy user data to kernel space.
1176 1167 */
1177 1168
1178 1169 ENTRY(copyin)
1179 1170 pushq %rbp
1180 1171 movq %rsp, %rbp
1181 1172 subq $24, %rsp
1182 1173
1183 1174 /*
1184 1175 * save args in case we trap and need to rerun as a copyop
1185 1176 */
1186 1177 movq %rdi, (%rsp)
1187 1178 movq %rsi, 0x8(%rsp)
1188 1179 movq %rdx, 0x10(%rsp)
1189 1180
1190 1181 movq kernelbase(%rip), %rax
1191 1182 #ifdef DEBUG
1192 1183 cmpq %rax, %rsi /* %rsi = kaddr */
1193 1184 jnb 1f
1194 1185 leaq .copyin_panic_msg(%rip), %rdi
1195 1186 xorl %eax, %eax
1196 1187 call panic
1197 1188 1:
1198 1189 #endif
1199 1190 /*
1200 1191 * pass lofault value as 4th argument to do_copy_fault
1201 1192 */
1202 1193 leaq _copyin_err(%rip), %rcx
1203 1194
1204 1195 movq %gs:CPU_THREAD, %r9
1205 1196 cmpq %rax, %rdi /* test uaddr < kernelbase */
1206 1197 jae 3f /* take copyop if uaddr > kernelbase */
1207 1198 SMAP_DISABLE_INSTR(0)
1208 1199 jmp do_copy_fault /* Takes care of leave for us */
1209 1200
1210 1201 _copyin_err:
1211 1202 SMAP_ENABLE_INSTR(2)
1212 1203 movq %r11, T_LOFAULT(%r9) /* restore original lofault */
1213 1204 addq $8, %rsp /* pop bcopy_altentry call ret addr */
1214 1205 3:
1215 1206 movq T_COPYOPS(%r9), %rax
1216 1207 cmpq $0, %rax
1217 1208 jz 2f
1218 1209 /*
1219 1210 * reload args for the copyop
1220 1211 */
1221 1212 movq (%rsp), %rdi
1222 1213 movq 0x8(%rsp), %rsi
1223 1214 movq 0x10(%rsp), %rdx
1224 1215 leave
1225 1216 movq CP_COPYIN(%rax), %rax
1226 1217 INDIRECT_JMP_REG(rax)
1227 1218
1228 1219 2: movl $-1, %eax
1229 1220 leave
1230 1221 ret
1231 1222 SET_SIZE(copyin)
1232 1223
1233 1224 ENTRY(xcopyin_nta)
1234 1225 pushq %rbp
1235 1226 movq %rsp, %rbp
1236 1227 subq $24, %rsp
1237 1228
1238 1229 /*
1239 1230 * save args in case we trap and need to rerun as a copyop
1240 1231 * %rcx is consumed in this routine so we don't need to save
1241 1232 * it.
1242 1233 */
1243 1234 movq %rdi, (%rsp)
1244 1235 movq %rsi, 0x8(%rsp)
1245 1236 movq %rdx, 0x10(%rsp)
1246 1237
1247 1238 movq kernelbase(%rip), %rax
1248 1239 #ifdef DEBUG
1249 1240 cmpq %rax, %rsi /* %rsi = kaddr */
1250 1241 jnb 1f
1251 1242 leaq .xcopyin_panic_msg(%rip), %rdi
1252 1243 xorl %eax, %eax
1253 1244 call panic
1254 1245 1:
1255 1246 #endif
1256 1247 movq %gs:CPU_THREAD, %r9
1257 1248 cmpq %rax, %rdi /* test uaddr < kernelbase */
1258 1249 jae 4f
1259 1250 cmpq $0, %rcx /* No non-temporal access? */
1260 1251 /*
1261 1252 * pass lofault value as 4th argument to do_copy_fault
1262 1253 */
1263 1254 leaq _xcopyin_err(%rip), %rcx /* doesn't set rflags */
1264 1255 jnz 6f /* use regular access */
1265 1256 /*
1266 1257 * Make sure cnt is >= XCOPY_MIN_SIZE bytes
1267 1258 */
1268 1259 cmpq $XCOPY_MIN_SIZE, %rdx
1269 1260 jae 5f
1270 1261 6:
1271 1262 SMAP_DISABLE_INSTR(1)
1272 1263 jmp do_copy_fault
1273 1264
1274 1265 /*
1275 1266 * Make sure src and dst are NTA_ALIGN_SIZE aligned,
1276 1267 * count is COUNT_ALIGN_SIZE aligned.
1277 1268 */
1278 1269 5:
1279 1270 movq %rdi, %r10
1280 1271 orq %rsi, %r10
1281 1272 andq $NTA_ALIGN_MASK, %r10
1282 1273 orq %rdx, %r10
1283 1274 andq $COUNT_ALIGN_MASK, %r10
1284 1275 jnz 6b
1285 1276 leaq _xcopyin_nta_err(%rip), %rcx /* doesn't set rflags */
1286 1277 SMAP_DISABLE_INSTR(2)
1287 1278 jmp do_copy_fault_nta /* use non-temporal access */
1288 1279
1289 1280 4:
1290 1281 movl $EFAULT, %eax
1291 1282 jmp 3f
1292 1283
1293 1284 /*
1294 1285 * A fault during do_copy_fault or do_copy_fault_nta is
1295 1286 * indicated through an errno value in %rax and we iret from the
1296 1287 * trap handler to here.
1297 1288 */
1298 1289 _xcopyin_err:
1299 1290 addq $8, %rsp /* pop bcopy_altentry call ret addr */
1300 1291 _xcopyin_nta_err:
1301 1292 SMAP_ENABLE_INSTR(3)
1302 1293 movq %r11, T_LOFAULT(%r9) /* restore original lofault */
1303 1294 3:
1304 1295 movq T_COPYOPS(%r9), %r8
1305 1296 cmpq $0, %r8
1306 1297 jz 2f
1307 1298
1308 1299 /*
1309 1300 * reload args for the copyop
1310 1301 */
1311 1302 movq (%rsp), %rdi
1312 1303 movq 0x8(%rsp), %rsi
1313 1304 movq 0x10(%rsp), %rdx
1314 1305 leave
1315 1306 movq CP_XCOPYIN(%r8), %r8
1316 1307 INDIRECT_JMP_REG(r8)
1317 1308
1318 1309 2: leave
1319 1310 ret
1320 1311 SET_SIZE(xcopyin_nta)
1321 1312
1322 1313 /*
1323 1314 * Copy kernel data to user space.
1324 1315 */
1325 1316
1326 1317 ENTRY(copyout)
1327 1318 pushq %rbp
1328 1319 movq %rsp, %rbp
1329 1320 subq $24, %rsp
1330 1321
1331 1322 /*
1332 1323 * save args in case we trap and need to rerun as a copyop
1333 1324 */
1334 1325 movq %rdi, (%rsp)
1335 1326 movq %rsi, 0x8(%rsp)
1336 1327 movq %rdx, 0x10(%rsp)
1337 1328
1338 1329 movq kernelbase(%rip), %rax
1339 1330 #ifdef DEBUG
1340 1331 cmpq %rax, %rdi /* %rdi = kaddr */
1341 1332 jnb 1f
1342 1333 leaq .copyout_panic_msg(%rip), %rdi
1343 1334 xorl %eax, %eax
1344 1335 call panic
1345 1336 1:
1346 1337 #endif
1347 1338 /*
1348 1339 * pass lofault value as 4th argument to do_copy_fault
1349 1340 */
1350 1341 leaq _copyout_err(%rip), %rcx
1351 1342
1352 1343 movq %gs:CPU_THREAD, %r9
1353 1344 cmpq %rax, %rsi /* test uaddr < kernelbase */
1354 1345 jae 3f /* take copyop if uaddr > kernelbase */
1355 1346 SMAP_DISABLE_INSTR(3)
1356 1347 jmp do_copy_fault /* Calls leave for us */
1357 1348
1358 1349 _copyout_err:
1359 1350 SMAP_ENABLE_INSTR(4)
1360 1351 movq %r11, T_LOFAULT(%r9) /* restore original lofault */
1361 1352 addq $8, %rsp /* pop bcopy_altentry call ret addr */
1362 1353 3:
1363 1354 movq T_COPYOPS(%r9), %rax
1364 1355 cmpq $0, %rax
1365 1356 jz 2f
1366 1357
1367 1358 /*
1368 1359 * reload args for the copyop
1369 1360 */
1370 1361 movq (%rsp), %rdi
1371 1362 movq 0x8(%rsp), %rsi
1372 1363 movq 0x10(%rsp), %rdx
1373 1364 leave
1374 1365 movq CP_COPYOUT(%rax), %rax
1375 1366 INDIRECT_JMP_REG(rax)
1376 1367
1377 1368 2: movl $-1, %eax
1378 1369 leave
1379 1370 ret
1380 1371 SET_SIZE(copyout)
1381 1372
1382 1373 ENTRY(xcopyout_nta)
1383 1374 pushq %rbp
1384 1375 movq %rsp, %rbp
1385 1376 subq $24, %rsp
1386 1377
1387 1378 /*
1388 1379 * save args in case we trap and need to rerun as a copyop
1389 1380 */
1390 1381 movq %rdi, (%rsp)
1391 1382 movq %rsi, 0x8(%rsp)
1392 1383 movq %rdx, 0x10(%rsp)
1393 1384
1394 1385 movq kernelbase(%rip), %rax
1395 1386 #ifdef DEBUG
1396 1387 cmpq %rax, %rdi /* %rdi = kaddr */
1397 1388 jnb 1f
1398 1389 leaq .xcopyout_panic_msg(%rip), %rdi
1399 1390 xorl %eax, %eax
1400 1391 call panic
1401 1392 1:
1402 1393 #endif
1403 1394 movq %gs:CPU_THREAD, %r9
1404 1395 cmpq %rax, %rsi /* test uaddr < kernelbase */
1405 1396 jae 4f
1406 1397
1407 1398 cmpq $0, %rcx /* No non-temporal access? */
1408 1399 /*
1409 1400 * pass lofault value as 4th argument to do_copy_fault
1410 1401 */
1411 1402 leaq _xcopyout_err(%rip), %rcx
1412 1403 jnz 6f
1413 1404 /*
1414 1405 * Make sure cnt is >= XCOPY_MIN_SIZE bytes
1415 1406 */
1416 1407 cmpq $XCOPY_MIN_SIZE, %rdx
1417 1408 jae 5f
1418 1409 6:
1419 1410 SMAP_DISABLE_INSTR(4)
1420 1411 jmp do_copy_fault
1421 1412
1422 1413 /*
1423 1414 * Make sure src and dst are NTA_ALIGN_SIZE aligned,
1424 1415 * count is COUNT_ALIGN_SIZE aligned.
1425 1416 */
1426 1417 5:
1427 1418 movq %rdi, %r10
1428 1419 orq %rsi, %r10
1429 1420 andq $NTA_ALIGN_MASK, %r10
1430 1421 orq %rdx, %r10
1431 1422 andq $COUNT_ALIGN_MASK, %r10
1432 1423 jnz 6b
1433 1424 leaq _xcopyout_nta_err(%rip), %rcx
1434 1425 SMAP_DISABLE_INSTR(5)
1435 1426 call do_copy_fault_nta
1436 1427 SMAP_ENABLE_INSTR(5)
1437 1428 ret
1438 1429
1439 1430 4:
1440 1431 movl $EFAULT, %eax
1441 1432 jmp 3f
1442 1433
1443 1434 /*
1444 1435 * A fault during do_copy_fault or do_copy_fault_nta is
1445 1436 * indicated through an errno value in %rax and we iret from the
1446 1437 * trap handler to here.
1447 1438 */
1448 1439 _xcopyout_err:
1449 1440 addq $8, %rsp /* pop bcopy_altentry call ret addr */
1450 1441 _xcopyout_nta_err:
1451 1442 SMAP_ENABLE_INSTR(6)
1452 1443 movq %r11, T_LOFAULT(%r9) /* restore original lofault */
1453 1444 3:
1454 1445 movq T_COPYOPS(%r9), %r8
1455 1446 cmpq $0, %r8
1456 1447 jz 2f
1457 1448
1458 1449 /*
1459 1450 * reload args for the copyop
1460 1451 */
1461 1452 movq (%rsp), %rdi
1462 1453 movq 0x8(%rsp), %rsi
1463 1454 movq 0x10(%rsp), %rdx
1464 1455 leave
1465 1456 movq CP_XCOPYOUT(%r8), %r8
1466 1457 INDIRECT_JMP_REG(r8)
1467 1458
1468 1459 2: leave
1469 1460 ret
1470 1461 SET_SIZE(xcopyout_nta)
1471 1462
1472 1463 /*
1473 1464 * Copy a null terminated string from one point to another in
1474 1465 * the kernel address space.
1475 1466 */
1476 1467
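
The contract in C terms (a sketch of a typical call; the prototype follows the register
usage below: %rdi = from, %rsi = to, %rdx = maxlength, %rcx = optional length pointer):

	#include <sys/types.h>

	/* from, to, maxlength, lencopied -- as used by the assembly below. */
	extern int copystr(const char *, char *, size_t, size_t *);

	static int
	dup_kernel_string(const char *src, char *dst, size_t dstlen)
	{
		size_t copied;

		/*
		 * Returns 0 on success or ENAMETOOLONG if 'src' (including
		 * its terminating NUL) does not fit in 'dstlen' bytes;
		 * 'copied' includes the NUL when it is written.
		 */
		return (copystr(src, dst, dstlen, &copied));
	}
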
1477 1468 ENTRY(copystr)
1478 1469 pushq %rbp
1479 1470 movq %rsp, %rbp
1480 1471 #ifdef DEBUG
1481 1472 movq kernelbase(%rip), %rax
1482 1473 cmpq %rax, %rdi /* %rdi = from */
1483 1474 jb 0f
1484 1475 cmpq %rax, %rsi /* %rsi = to */
1485 1476 jnb 1f
1486 1477 0: leaq .copystr_panic_msg(%rip), %rdi
1487 1478 xorl %eax, %eax
1488 1479 call panic
1489 1480 1:
1490 1481 #endif
1491 1482 movq %gs:CPU_THREAD, %r9
1492 1483 movq T_LOFAULT(%r9), %r8 /* pass current lofault value as */
1493 1484 /* 5th argument to do_copystr */
1494 1485 xorl %r10d,%r10d /* pass smap restore need in %r10d */
1495 1486 /* as a non-ABI 6th arg */
1496 1487 do_copystr:
1497 1488 movq %gs:CPU_THREAD, %r9 /* %r9 = thread addr */
1498 1489 movq T_LOFAULT(%r9), %r11 /* save the current lofault */
1499 1490 movq %r8, T_LOFAULT(%r9) /* new lofault */
1500 1491
1501 1492 movq %rdx, %r8 /* save maxlength */
1502 1493
1503 1494 cmpq $0, %rdx /* %rdx = maxlength */
1504 1495 je copystr_enametoolong /* maxlength == 0 */
1505 1496
1506 1497 copystr_loop:
1507 1498 decq %r8
1508 1499 movb (%rdi), %al
1509 1500 incq %rdi
1510 1501 movb %al, (%rsi)
1511 1502 incq %rsi
1512 1503 cmpb $0, %al
1513 1504 je copystr_null /* null char */
1514 1505 cmpq $0, %r8
1515 1506 jne copystr_loop
1516 1507
1517 1508 copystr_enametoolong:
1518 1509 movl $ENAMETOOLONG, %eax
1519 1510 jmp copystr_out
1520 1511
1521 1512 copystr_null:
1522 1513 xorl %eax, %eax /* no error */
1523 1514
1524 1515 copystr_out:
1525 1516 cmpq $0, %rcx /* want length? */
1526 1517 je copystr_smap /* no */
1527 1518 subq %r8, %rdx /* compute length and store it */
1528 1519 movq %rdx, (%rcx)
1529 1520
1530 1521 copystr_smap:
1531 1522 cmpl $0, %r10d
1532 1523 jz copystr_done
1533 1524 SMAP_ENABLE_INSTR(7)
1534 1525
1535 1526 copystr_done:
1536 1527 movq %r11, T_LOFAULT(%r9) /* restore the original lofault */
1537 1528 leave
1538 1529 ret
1539 1530 SET_SIZE(copystr)
1540 1531
1541 1532 /*
1542 1533 * Copy a null terminated string from the user address space into
1543 1534 * the kernel address space.
1544 1535 */
1545 1536
1546 1537 ENTRY(copyinstr)
1547 1538 pushq %rbp
1548 1539 movq %rsp, %rbp
1549 1540 subq $32, %rsp
1550 1541
1551 1542 /*
1552 1543 * save args in case we trap and need to rerun as a copyop
1553 1544 */
1554 1545 movq %rdi, (%rsp)
1555 1546 movq %rsi, 0x8(%rsp)
1556 1547 movq %rdx, 0x10(%rsp)
1557 1548 movq %rcx, 0x18(%rsp)
1558 1549
1559 1550 movq kernelbase(%rip), %rax
1560 1551 #ifdef DEBUG
1561 1552 cmpq %rax, %rsi /* %rsi = kaddr */
1562 1553 jnb 1f
1563 1554 leaq .copyinstr_panic_msg(%rip), %rdi
1564 1555 xorl %eax, %eax
1565 1556 call panic
1566 1557 1:
1567 1558 #endif
1568 1559 /*
1569 1560 * pass lofault value as 5th argument to do_copystr
1570 1561 * do_copystr expects whether or not we need smap in %r10d
1571 1562 */
1572 1563 leaq _copyinstr_error(%rip), %r8
1573 1564 movl $1, %r10d
1574 1565
1575 1566 cmpq %rax, %rdi /* test uaddr < kernelbase */
1576 1567 jae 4f
1577 1568 SMAP_DISABLE_INSTR(6)
1578 1569 jmp do_copystr
1579 1570 4:
1580 1571 movq %gs:CPU_THREAD, %r9
1581 1572 jmp 3f
1582 1573
1583 1574 _copyinstr_error:
1584 1575 SMAP_ENABLE_INSTR(8)
1585 1576 movq %r11, T_LOFAULT(%r9) /* restore original lofault */
1586 1577 3:
1587 1578 movq T_COPYOPS(%r9), %rax
1588 1579 cmpq $0, %rax
1589 1580 jz 2f
1590 1581
1591 1582 /*
1592 1583 * reload args for the copyop
1593 1584 */
1594 1585 movq (%rsp), %rdi
1595 1586 movq 0x8(%rsp), %rsi
1596 1587 movq 0x10(%rsp), %rdx
1597 1588 movq 0x18(%rsp), %rcx
1598 1589 leave
1599 1590 movq CP_COPYINSTR(%rax), %rax
1600 1591 INDIRECT_JMP_REG(rax)
1601 1592
1602 1593 2: movl $EFAULT, %eax /* return EFAULT */
1603 1594 leave
1604 1595 ret
1605 1596 SET_SIZE(copyinstr)
1606 1597
1607 1598 /*
1608 1599 * Copy a null terminated string from the kernel
1609 1600 * address space to the user address space.
1610 1601 */
1611 1602
1612 1603 ENTRY(copyoutstr)
1613 1604 pushq %rbp
1614 1605 movq %rsp, %rbp
1615 1606 subq $32, %rsp
1616 1607
1617 1608 /*
1618 1609 * save args in case we trap and need to rerun as a copyop
1619 1610 */
1620 1611 movq %rdi, (%rsp)
1621 1612 movq %rsi, 0x8(%rsp)
1622 1613 movq %rdx, 0x10(%rsp)
1623 1614 movq %rcx, 0x18(%rsp)
1624 1615
1625 1616 movq kernelbase(%rip), %rax
1626 1617 #ifdef DEBUG
1627 1618 cmpq %rax, %rdi /* %rdi = kaddr */
1628 1619 jnb 1f
1629 1620 leaq .copyoutstr_panic_msg(%rip), %rdi
1630 1621 jmp call_panic /* setup stack and call panic */
1631 1622 1:
1632 1623 #endif
1633 1624 /*
1634 1625 * pass lofault value as 5th argument to do_copystr
1635 1626 * pass one as 6th argument to do_copystr in %r10d
1636 1627 */
1637 1628 leaq _copyoutstr_error(%rip), %r8
1638 1629 movl $1, %r10d
1639 1630
1640 1631 cmpq %rax, %rsi /* test uaddr < kernelbase */
1641 1632 jae 4f
1642 1633 SMAP_DISABLE_INSTR(7)
1643 1634 jmp do_copystr
1644 1635 4:
1645 1636 movq %gs:CPU_THREAD, %r9
1646 1637 jmp 3f
1647 1638
1648 1639 _copyoutstr_error:
1649 1640 SMAP_ENABLE_INSTR(9)
1650 1641 movq %r11, T_LOFAULT(%r9) /* restore the original lofault */
1651 1642 3:
1652 1643 movq T_COPYOPS(%r9), %rax
1653 1644 cmpq $0, %rax
1654 1645 jz 2f
1655 1646
1656 1647 /*
1657 1648 * reload args for the copyop
1658 1649 */
1659 1650 movq (%rsp), %rdi
1660 1651 movq 0x8(%rsp), %rsi
1661 1652 movq 0x10(%rsp), %rdx
1662 1653 movq 0x18(%rsp), %rcx
1663 1654 leave
1664 1655 movq CP_COPYOUTSTR(%rax), %rax
1665 1656 INDIRECT_JMP_REG(rax)
1666 1657
1667 1658 2: movl $EFAULT, %eax /* return EFAULT */
1668 1659 leave
1669 1660 ret
1670 1661 SET_SIZE(copyoutstr)
1671 1662
1672 1663 /*
1673 1664 * Since all of the fuword() variants are so similar, we have a macro to spit
1674 1665 * them out. This allows us to create DTrace-unobservable functions easily.
1675 1666 */
1676 1667
1677 1668 /*
1678 1669 * Note that we don't save and reload the arguments here
1679 1670 * because their values are not altered in the copy path.
1680 1671 * Additionally, when successful, the smap_enable jmp will
1681 1672 * actually return us to our original caller.
1682 1673 */
1683 1674
1684 1675 #define FUWORD(NAME, INSTR, REG, COPYOP, DISNUM, EN1, EN2) \
1685 1676 ENTRY(NAME) \
1686 1677 movq %gs:CPU_THREAD, %r9; \
1687 1678 cmpq kernelbase(%rip), %rdi; \
1688 1679 jae 1f; \
1689 1680 leaq _flt_/**/NAME, %rdx; \
1690 1681 movq %rdx, T_LOFAULT(%r9); \
1691 1682 SMAP_DISABLE_INSTR(DISNUM) \
1692 1683 INSTR (%rdi), REG; \
1693 1684 movq $0, T_LOFAULT(%r9); \
1694 1685 INSTR REG, (%rsi); \
1695 1686 xorl %eax, %eax; \
1696 1687 SMAP_ENABLE_INSTR(EN1) \
1697 1688 ret; \
1698 1689 _flt_/**/NAME: \
1699 1690 SMAP_ENABLE_INSTR(EN2) \
1700 1691 movq $0, T_LOFAULT(%r9); \
1701 1692 1: \
1702 1693 movq T_COPYOPS(%r9), %rax; \
1703 1694 cmpq $0, %rax; \
1704 1695 jz 2f; \
1705 1696 movq COPYOP(%rax), %rax; \
1706 1697 INDIRECT_JMP_REG(rax); \
1707 1698 2: \
1708 1699 movl $-1, %eax; \
1709 1700 ret; \
1710 1701 SET_SIZE(NAME)
1711 1702
1712 1703 FUWORD(fuword64, movq, %rax, CP_FUWORD64,8,10,11)
1713 1704 FUWORD(fuword32, movl, %eax, CP_FUWORD32,9,12,13)
1714 1705 FUWORD(fuword16, movw, %ax, CP_FUWORD16,10,14,15)
1715 1706 FUWORD(fuword8, movb, %al, CP_FUWORD8,11,16,17)
1716 1707
1717 1708 #undef FUWORD
1718 1709
1719 1710 /*
1720 1711 * Set user word.
1721 1712 */
1722 1713
1723 1714 /*
1724 1715 * Note that we don't save and reload the arguments here
1725 1716 * because their values are not altered in the copy path.
1726 1717 */
1727 1718
1728 1719 #define SUWORD(NAME, INSTR, REG, COPYOP, DISNUM, EN1, EN2) \
1729 1720 ENTRY(NAME) \
1730 1721 movq %gs:CPU_THREAD, %r9; \
1731 1722 cmpq kernelbase(%rip), %rdi; \
1732 1723 jae 1f; \
1733 1724 leaq _flt_/**/NAME, %rdx; \
1734 1725 SMAP_DISABLE_INSTR(DISNUM) \
1735 1726 movq %rdx, T_LOFAULT(%r9); \
1736 1727 INSTR REG, (%rdi); \
1737 1728 movq $0, T_LOFAULT(%r9); \
1738 1729 xorl %eax, %eax; \
1739 1730 SMAP_ENABLE_INSTR(EN1) \
1740 1731 ret; \
1741 1732 _flt_/**/NAME: \
1742 1733 SMAP_ENABLE_INSTR(EN2) \
1743 1734 movq $0, T_LOFAULT(%r9); \
1744 1735 1: \
1745 1736 movq T_COPYOPS(%r9), %rax; \
1746 1737 cmpq $0, %rax; \
1747 1738 jz 3f; \
1748 1739 movq COPYOP(%rax), %rax; \
1749 1740 INDIRECT_JMP_REG(rax); \
1750 1741 3: \
1751 1742 movl $-1, %eax; \
1752 1743 ret; \
1753 1744 SET_SIZE(NAME)
1754 1745
1755 1746 SUWORD(suword64, movq, %rsi, CP_SUWORD64,12,18,19)
1756 1747 SUWORD(suword32, movl, %esi, CP_SUWORD32,13,20,21)
1757 1748 SUWORD(suword16, movw, %si, CP_SUWORD16,14,22,23)
1758 1749 SUWORD(suword8, movb, %sil, CP_SUWORD8,15,24,25)
1759 1750
1760 1751 #undef SUWORD
1761 1752
1762 1753 #define FUWORD_NOERR(NAME, INSTR, REG) \
1763 1754 ENTRY(NAME) \
1764 1755 cmpq kernelbase(%rip), %rdi; \
1765 1756 cmovnbq kernelbase(%rip), %rdi; \
1766 1757 INSTR (%rdi), REG; \
1767 1758 INSTR REG, (%rsi); \
1768 1759 ret; \
1769 1760 SET_SIZE(NAME)
1770 1761
1771 1762 FUWORD_NOERR(fuword64_noerr, movq, %rax)
1772 1763 FUWORD_NOERR(fuword32_noerr, movl, %eax)
1773 1764 FUWORD_NOERR(fuword16_noerr, movw, %ax)
1774 1765 FUWORD_NOERR(fuword8_noerr, movb, %al)
1775 1766
1776 1767 #undef FUWORD_NOERR
1777 1768
1778 1769 #define SUWORD_NOERR(NAME, INSTR, REG) \
1779 1770 ENTRY(NAME) \
1780 1771 cmpq kernelbase(%rip), %rdi; \
1781 1772 cmovnbq kernelbase(%rip), %rdi; \
1782 1773 INSTR REG, (%rdi); \
1783 1774 ret; \
1784 1775 SET_SIZE(NAME)
1785 1776
1786 1777 SUWORD_NOERR(suword64_noerr, movq, %rsi)
1787 1778 SUWORD_NOERR(suword32_noerr, movl, %esi)
1788 1779 SUWORD_NOERR(suword16_noerr, movw, %si)
1789 1780 SUWORD_NOERR(suword8_noerr, movb, %sil)
1790 1781
1791 1782 #undef SUWORD_NOERR
1792 1783
1793 1784
1794 1785 .weak subyte
1795 1786 subyte=suword8
1796 1787 .weak subyte_noerr
1797 1788 subyte_noerr=suword8_noerr
1798 1789
1799 1790 .weak fulword
1800 1791 fulword=fuword64
1801 1792 .weak fulword_noerr
1802 1793 fulword_noerr=fuword64_noerr
1803 1794 .weak sulword
1804 1795 sulword=suword64
1805 1796 .weak sulword_noerr
1806 1797 sulword_noerr=suword64_noerr
1807 1798
1808 1799 ENTRY(copyin_noerr)
1809 1800 movq kernelbase(%rip), %rax
1810 1801 #ifdef DEBUG
1811 1802 cmpq %rax, %rsi /* %rsi = kto */
1812 1803 jae 1f
1813 1804 leaq .cpyin_ne_pmsg(%rip), %rdi
1814 1805 jmp call_panic /* setup stack and call panic */
1815 1806 1:
1816 1807 #endif
1817 1808 cmpq %rax, %rdi /* ufrom < kernelbase */
1818 1809 jb do_copy
1819 1810 movq %rax, %rdi /* force fault at kernelbase */
1820 1811 jmp do_copy
1821 1812 SET_SIZE(copyin_noerr)
1822 1813
1823 1814 ENTRY(copyout_noerr)
1824 1815 movq kernelbase(%rip), %rax
1825 1816 #ifdef DEBUG
1826 1817 cmpq %rax, %rdi /* %rdi = kfrom */
1827 1818 jae 1f
1828 1819 leaq .cpyout_ne_pmsg(%rip), %rdi
1829 1820 jmp call_panic /* setup stack and call panic */
1830 1821 1:
1831 1822 #endif
1832 1823 cmpq %rax, %rsi /* uto < kernelbase */
1833 1824 jb do_copy
1834 1825 movq %rax, %rsi /* force fault at kernelbase */
1835 1826 jmp do_copy
1836 1827 SET_SIZE(copyout_noerr)
1837 1828
1838 1829 ENTRY(uzero)
1839 1830 movq kernelbase(%rip), %rax
1840 1831 cmpq %rax, %rdi
1841 1832 jb do_zero
1842 1833 movq %rax, %rdi /* force fault at kernelbase */
1843 1834 jmp do_zero
1844 1835 SET_SIZE(uzero)
1845 1836
1846 1837 ENTRY(ucopy)
1847 1838 movq kernelbase(%rip), %rax
1848 1839 cmpq %rax, %rdi
1849 1840 cmovaeq %rax, %rdi /* force fault at kernelbase */
1850 1841 cmpq %rax, %rsi
1851 1842 cmovaeq %rax, %rsi /* force fault at kernelbase */
1852 1843 jmp do_copy
1853 1844 SET_SIZE(ucopy)
1854 1845
1855 1846 /*
1856 1847  * Note, the frame pointer is required here because do_copystr expects
1857 1848 * to be able to pop it off!
1858 1849 */
1859 1850 ENTRY(ucopystr)
1860 1851 pushq %rbp
1861 1852 movq %rsp, %rbp
1862 1853 movq kernelbase(%rip), %rax
1863 1854 cmpq %rax, %rdi
1864 1855 cmovaeq %rax, %rdi /* force fault at kernelbase */
1865 1856 cmpq %rax, %rsi
1866 1857 cmovaeq %rax, %rsi /* force fault at kernelbase */
1867 1858 /* do_copystr expects lofault address in %r8 */
1868 1859 /* do_copystr expects whether or not we need smap in %r10 */
1869 1860 xorl %r10d, %r10d
1870 1861 movq %gs:CPU_THREAD, %r8
1871 1862 movq T_LOFAULT(%r8), %r8
1872 1863 jmp do_copystr
1873 1864 SET_SIZE(ucopystr)
1874 1865
1875 1866 #ifdef DEBUG
1876 1867 .data
1877 1868 .kcopy_panic_msg:
1878 1869 .string "kcopy: arguments below kernelbase"
1879 1870 .bcopy_panic_msg:
1880 1871 .string "bcopy: arguments below kernelbase"
1881 1872 .kzero_panic_msg:
1882 1873 .string "kzero: arguments below kernelbase"
1883 1874 .bzero_panic_msg:
1884 1875 .string "bzero: arguments below kernelbase"
1885 1876 .copyin_panic_msg:
1886 1877 .string "copyin: kaddr argument below kernelbase"
1887 1878 .xcopyin_panic_msg:
1888 1879 .string "xcopyin: kaddr argument below kernelbase"
1889 1880 .copyout_panic_msg:
1890 1881 .string "copyout: kaddr argument below kernelbase"
1891 1882 .xcopyout_panic_msg:
1892 1883 .string "xcopyout: kaddr argument below kernelbase"
1893 1884 .copystr_panic_msg:
1894 1885 .string "copystr: arguments in user space"
1895 1886 .copyinstr_panic_msg:
1896 1887 .string "copyinstr: kaddr argument not in kernel address space"
1897 1888 .copyoutstr_panic_msg:
1898 1889 .string "copyoutstr: kaddr argument not in kernel address space"
1899 1890 .cpyin_ne_pmsg:
1900 1891 .string "copyin_noerr: argument not in kernel address space"
1901 1892 .cpyout_ne_pmsg:
1902 1893 .string "copyout_noerr: argument not in kernel address space"
1903 1894 #endif
1904 1895
1905 -/*
1906 - * These functions are used for SMAP, supervisor mode access protection. They
1907 - * are hotpatched to become real instructions when the system starts up which is
1908 - * done in mlsetup() as a part of enabling the other CR4 related features.
1909 - *
1910 - * Generally speaking, smap_disable() is a stac instruction and smap_enable is a
1911 - * clac instruction. It's safe to call these any number of times, and in fact,
1912 - * out of paranoia, the kernel will likely call it at several points.
1913 - */
1914 -
1915 - ENTRY(smap_disable)
1916 - nop
1917 - nop
1918 - nop
1919 - ret
1920 - SET_SIZE(smap_disable)
1921 -
1922 - ENTRY(smap_enable)
1923 - nop
1924 - nop
1925 - nop
1926 - ret
1927 - SET_SIZE(smap_enable)
1928 -
1929 1896 .data
1930 1897 .align 4
1931 1898 .globl _smap_enable_patch_count
1932 1899 .type _smap_enable_patch_count,@object
1933 1900 .size _smap_enable_patch_count, 4
1934 1901 _smap_enable_patch_count:
1935 1902 .long SMAP_ENABLE_COUNT
1936 1903
1937 1904 .globl _smap_disable_patch_count
1938 1905 .type _smap_disable_patch_count,@object
1939 1906 .size _smap_disable_patch_count, 4
1940 1907 _smap_disable_patch_count:
1941 1908 .long SMAP_DISABLE_COUNT