11787 Kernel needs to be built with retpolines
11788 Kernel needs to generally use RSB stuffing
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: John Levon <john.levon@joyent.com>

*** 895,904 **** --- 895,1179 ---- * topology information, etc. Some of these subsystems include processor groups * (uts/common/os/pg.c.), CPU Module Interface (uts/i86pc/os/cmi.c), ACPI, * microcode, and performance monitoring. These functions all ASSERT that the * CPU they're being called on has reached a certain cpuid pass. If the passes * are rearranged, then this needs to be adjusted. + * + * ----------------------------------------------- + * Speculative Execution CPU Side Channel Security + * ----------------------------------------------- + * + * With the advent of the Spectre and Meltdown attacks which exploit speculative + * execution in the CPU to create side channels there have been a number of + * different attacks and corresponding issues that the operating system needs to + * mitigate against. The following list is some of the common, but not + * exhaustive, set of issues that we know about and have done some or need to do + * more work in the system to mitigate against: + * + * - Spectre v1 + * - Spectre v2 + * - Meltdown (Spectre v3) + * - Rogue Register Read (Spectre v3a) + * - Speculative Store Bypass (Spectre v4) + * - ret2spec, SpectreRSB + * - L1 Terminal Fault (L1TF) + * - Microarchitectural Data Sampling (MDS) + * + * Each of these requires different sets of mitigations and has different attack + * surfaces. For the most part, this discussion is about protecting the kernel + * from non-kernel executing environments such as user processes and hardware + * virtual machines. Unfortunately, there are a number of user vs. user + * scenarios that exist with these. The rest of this section will describe the + * overall approach that the system has taken to address these as well as their + * shortcomings. Unfortunately, not all of the above have been handled today. + * + * SPECTRE FAMILY (Spectre v2, ret2spec, SpectreRSB) + * + * The second variant of the spectre attack focuses on performing branch target + * injection. This generally impacts indirect call instructions in the system. + * There are three different ways to mitigate this issue that are commonly + * described today: + * + * 1. Using Indirect Branch Restricted Speculation (IBRS). + * 2. Using Retpolines and RSB Stuffing + * 3. Using Enhanced Indirect Branch Restricted Speculation (EIBRS) + * + * IBRS uses a feature added to microcode to restrict speculation, among other + * things. This form of mitigation has not been used as it has been generally + * seen as too expensive and requires reactivation upon various transitions in + * the system. + * + * As a less impactful alternative to IBRS, retpolines were developed by + * Google. These basically require one to replace indirect calls with a specific + * trampoline that will cause speculation to fail and break the attack. + * Retpolines require compiler support. We always build with retpolines in the + * external thunk mode. This means that a traditional indirect call is replaced + * with a call to one of the __x86_indirect_thunk_<reg> functions. A side effect + * of this is that all indirect function calls are performed through a register. + * + * We have to use a common external location of the thunk and not inline it into + * the callsite so that way we can have a single place to patch these functions. + * As it turns out, we actually have three different forms of retpolines that + * exist in the system: + * + * 1. A full retpoline + * 2. An AMD-specific optimized retpoline + * 3. A no-op version + * + * The first one is used in the general case. 
The second one is used if we can + * determine that we're on an AMD system and we can successfully toggle the + * lfence serializing MSR that exists on the platform. Basically with this + * present, an lfence is sufficient and we don't need to do anywhere near as + * complicated a dance to successfully use retpolines. + * + * The third form described above is the most curious. It turns out that the way + * that retpolines are implemented is that they rely on how speculation is + * performed on a 'ret' instruction. Intel has continued to optimize this + * process (which is partly why we need to have return stack buffer stuffing, + * but more on that in a bit) and in processors starting with Cascade Lake + * on the server side, it's dangerous to rely on retpolines. Instead, a new + * mechanism has been introduced called Enhanced IBRS (EIBRS). + * + * Unlike IBRS, EIBRS is designed to be enabled once at boot and left on each + * physical core. However, if this is the case, we don't want to use retpolines + * any more. Therefore if EIBRS is present, we end up turning each retpoline + * function (called a thunk) into a jmp instruction. This means that we're still + * paying the cost of an extra jump to the external thunk, but it gives us + * flexibility and the ability to have a single kernel image that works across a + * wide variety of systems and hardware features. + * + * Unfortunately, this alone is insufficient. First, Skylake systems have + * additional speculation for the Return Stack Buffer (RSB) which is used to + * return from call instructions which retpolines take advantage of. However, + * this problem is not just limited to Skylake and is actually more pernicious. + * The SpectreRSB paper introduces several more problems that can arise with + * dealing with this. The RSB can be poisoned just like the indirect branch + * predictor. This means that one needs to clear the RSB when transitioning + * between two different privilege domains. Some examples include: + * + * - Switching between two different user processes + * - Going between user land and the kernel + * - Returning to the kernel from a hardware virtual machine + * + * Mitigating this involves combining a couple of different things. The first is + * SMEP (supervisor mode execution protection) which was introduced in Ivy + * Bridge. When an RSB entry refers to a user address and we're executing in the + * kernel, speculation through it will be stopped when SMEP is enabled. This + * protects against a number of the different cases that we would normally be + * worried about such as when we enter the kernel from user land. + * + * To prevent against additional manipulation of the RSB from other contexts + * such as a non-root VMX context attacking the kernel we first look to enhanced + * IBRS. When EIBRS is present and enabled, then there is nothing else that we + * need to do to protect the kernel at this time. + * + * On CPUs without EIBRS we need to manually overwrite the contents of the + * return stack buffer. We do this through the x86_rsb_stuff() function. + * Currently this is employed on context switch. The x86_rsb_stuff() function is + * disabled when enhanced IBRS is present because Intel claims on such systems + * it will be ineffective. Stuffing the RSB in context switch helps prevent user + * to user attacks via the RSB. + * + * If SMEP is not present, then we would have to stuff the RSB every time we + * transitioned from user mode to the kernel, which isn't very practical right + * now. 
+ * + * To fully protect user to user and vmx to vmx attacks from these classes of + * issues, we would also need to allow them to opt into performing an Indirect + * Branch Prediction Barrier (IBPB) on switch. This is not currently wired up. + * + * By default, the system will enable RSB stuffing and the required variant of + * retpolines and store that information in the x86_spectrev2_mitigation value. + * This will be evaluated after a microcode update as well, though it is + * expected that microcode updates will not take away features. This may mean + * that a late loaded microcode may not end up in the optimal configuration + * (though this should be rare). + * + * Currently we do not build kmdb with retpolines or perform any additional side + * channel security mitigations for it. One complication with kmdb is that it + * requires its own retpoline thunks and it would need to adjust itself based on + * what the kernel does. The threat model of kmdb is more limited and therefore + * it may make more sense to investigate using prediction barriers as the whole + * system is only executing a single instruction at a time while in kmdb. + * + * SPECTRE FAMILY (v1, v4) + * + * The v1 and v4 variants of spectre are not currently mitigated in the + * system and require other classes of changes to occur in the code. + * + * MELTDOWN + * + * Meltdown, or spectre v3, allowed a user process to read any data in their + * address space regardless of whether or not the page tables in question + * allowed the user to have the ability to read them. The solution to meltdown + * is kernel page table isolation. In this world, there are two page tables that + * are used for a process, one in user land and one in the kernel. To implement + * this we use per-CPU page tables and switch between the user and kernel + * variants when entering and exiting the kernel. For more information about + * this process and how the trampolines work, please see the big theory + * statements and additional comments in: + * + * - uts/i86pc/ml/kpti_trampolines.s + * - uts/i86pc/vm/hat_i86.c + * + * While Meltdown only impacted Intel systems and there are also Intel systems + * that have Meltdown fixed (called Rogue Data Cache Load), we always have + * kernel page table isolation enabled. While this may at first seem weird, an + * important thing to remember is that you can't speculatively read an address + * if it's never in your page table at all. Having user processes without kernel + * pages present provides us with an important layer of defense in the kernel + * against any other side channel attacks that exist and have yet to be + * discovered. As such, kernel page table isolation (KPTI) is always enabled by + * default, no matter the x86 system. + * + * L1 TERMINAL FAULT + * + * L1 Terminal Fault (L1TF) takes advantage of an issue in how speculative + * execution uses page table entries. Effectively, it is two different problems. + * The first is that it ignores the not present bit in the page table entries + * when performing speculative execution. This means that something can + * speculatively read the listed physical address if it's present in the L1 + * cache under certain conditions (see Intel's documentation for the full set of + * conditions). Secondly, this can be used to bypass hardware virtualization + * extended page tables (EPT) that are part of Intel's hardware virtual machine + * instructions. + * + * For the non-hardware virtualized case, this is relatively easy to deal with. 
+ * We must make sure that all unmapped pages have an address of zero. This means + * that they could read the first 4k of physical memory; however, we never use + * that first page in the operating system and always skip putting it in our + * memory map, even if firmware tells us we can use it in our memory map. While + * other systems try to put extra metadata in the address and reserved bits, + * which led to this being problematic in those cases, we do not. + * + * For hardware virtual machines things are more complicated. Because they can + * construct their own page tables, it isn't hard for them to perform this + * attack against any physical address. The one wrinkle is that this physical + * address must be in the L1 data cache. Thus Intel added an MSR that we can use + * to flush the L1 data cache. We wrap this up in the function + * spec_uarch_flush(). This function is also used in the mitigation of + * microarchitectural data sampling (MDS) discussed later on. Kernel based + * hypervisors such as KVM or bhyve are responsible for performing this before + * entering the guest. + * + * Because this attack takes place in the L1 cache, there's another wrinkle + * here. The L1 cache is shared between all logical CPUs in a core in most Intel + * designs. This means that when a thread enters a hardware virtualized context + * and flushes the L1 data cache, the other thread on the processor may then go + * ahead and put new data in it that can be potentially attacked. While one + * solution is to disable SMT on the system, another option that is available is + * to use a feature for hardware virtualization called 'SMT exclusion'. This + * goes through and makes sure that if a HVM is being scheduled on one thread, + * then the thing on the other thread is from the same hardware virtual machine. + * If an interrupt comes in or the guest exits to the broader system, then the + * other SMT thread will be kicked out. + * + * L1TF can be fully mitigated by hardware. If the RDCL_NO feature is set in the + * architecture capabilities MSR (MSR_IA32_ARCH_CAPABILITIES), then we will not + * perform L1TF related mitigations. + * + * MICROARCHITECTURAL DATA SAMPLING + * + * Microarchitectural data sampling (MDS) is a combination of four discrete + * vulnerabilities that are similar issues affecting various parts of the CPU's + * microarchitectural implementation around load, store, and fill buffers. + * Specifically it is made up of the following subcomponents: + * + * 1. Microarchitectural Store Buffer Data Sampling (MSBDS) + * 2. Microarchitectural Fill Buffer Data Sampling (MFBDS) + * 3. Microarchitectural Load Port Data Sampling (MLPDS) + * 4. Microarchitectural Data Sampling Uncacheable Memory (MDSUM) + * + * To begin addressing these, Intel has introduced another feature in microcode + * called MD_CLEAR. This changes the verw instruction to operate in a different + * way. This allows us to execute the verw instruction in a particular way to + * flush the state of the affected parts. The L1TF L1D flush mechanism is also + * updated when this microcode is present to flush this state. + * + * Primarily we need to flush this state whenever we transition from the kernel + * to a less privileged context such as user mode or an HVM guest. MSBDS is a + * little bit different. Here the structures are statically sized when a logical + * CPU is in use and resized when it goes to sleep. 
Therefore, we also need to + * flush the microarchitectural state before the CPU goes idle by calling hlt, + * mwait, or another ACPI method. To perform these flushes, we call + * x86_md_clear() at all of these transition points. + * + * If hardware enumerates RDCL_NO, indicating that it is not vulnerable to L1TF, + * then we change the spec_uarch_flush() function to point to x86_md_clear(). If + * MDS_NO has been set, then this is fully mitigated and x86_md_clear() becomes + * a no-op. + * + * Unfortunately, with this issue hyperthreading rears its ugly head. In + * particular, everything we've discussed above is only valid for a single + * thread executing on a core. In the case where you have hyper-threading + * present, this attack can be performed between threads. The theoretical fix + * for this is to ensure that both threads are always in the same security + * domain. This means that they are executing in the same ring and mutually + * trust each other. Practically speaking, this would mean that a system call + * would have to issue an inter-processor interrupt (IPI) to the other thread. + * Rather than implement this, we recommend that one disables hyper-threading + * through the use of psradm -aS. + * + * SUMMARY + * + * The following table attempts to summarize the mitigations for various issues + * and what's done in various places: + * + * - Spectre v1: Not currently mitigated + * - Spectre v2: Retpolines/RSB Stuffing or EIBRS if HW support + * - Meltdown: Kernel Page Table Isolation + * - Spectre v3a: Updated CPU microcode + * - Spectre v4: Not currently mitigated + * - SpectreRSB: SMEP and RSB Stuffing + * - L1TF: spec_uarch_flush, smt exclusion, requires microcode + * - MDS: x86_md_clear, requires microcode, disabling hyper threading + * + * The following table lists the x86 feature set bits that indicate that a + * given problem has been solved or a notable feature is present: + * + * - RDCL_NO: Meltdown, L1TF, MSBDS subset of MDS + * - MDS_NO: All forms of MDS */ #include <sys/types.h> #include <sys/archsystm.h> #include <sys/x86_archext.h>
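The retpoline description in the block comment above is easiest to see in code. Below is a minimal userland sketch, not the kernel's implementation: the symbols my_indirect_thunk_rax and call_via_retpoline are invented for illustration, while the kernel's real thunks are the __x86_indirect_thunk_<reg> assembly routines that cpuid_patch_retpolines() rewrites later in this change. It assumes GCC or Clang on x86-64, and that CET shadow stacks are not enforced, since a retpoline deliberately mismatches call and ret.

#include <stdio.h>

/*
 * A generic retpoline thunk for a target held in %rax. The 'call 1f'
 * pushes the address of the speculation trap; the store to (%rsp)
 * replaces it with the real target, so the architectural 'ret' goes to
 * the target while any speculative return spins harmlessly in the
 * pause/lfence loop.
 */
__asm__(
    ".text\n"
    ".globl	my_indirect_thunk_rax\n"
    "my_indirect_thunk_rax:\n"
    "	call	1f\n"
    "2:\n"
    "	pause\n"
    "	lfence\n"
    "	jmp	2b\n"
    "1:\n"
    "	movq	%rax, (%rsp)\n"
    "	ret\n"
    "\n"
    /*
     * What the compiler's external-thunk mode turns an indirect call
     * into: load the target into a register, then transfer to the thunk
     * instead of issuing a plain indirect call.
     */
    ".globl	call_via_retpoline\n"
    "call_via_retpoline:\n"
    "	movq	%rdi, %rax\n"
    "	jmp	my_indirect_thunk_rax\n");

extern int call_via_retpoline(int (*)(void));

static int
forty_two(void)
{
	return (42);
}

int
main(void)
{
	(void) printf("indirect call returned %d\n",
	    call_via_retpoline(forty_two));
	return (0);
}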
*** 919,928 **** --- 1194,1205 ---- #include <sys/pci_cfgspace.h> #include <sys/comm_page.h> #include <sys/mach_mmu.h> #include <sys/ucode.h> #include <sys/tsc.h> + #include <sys/kobj.h> + #include <sys/asm_misc.h> #ifdef __xpv #include <sys/hypervisor.h> #else #include <sys/ontrap.h>
*** 938,947 **** --- 1215,1235 ---- #else int x86_use_pcid = -1; int x86_use_invpcid = -1; #endif + typedef enum { + X86_SPECTREV2_RETPOLINE, + X86_SPECTREV2_RETPOLINE_AMD, + X86_SPECTREV2_ENHANCED_IBRS, + X86_SPECTREV2_DISABLED + } x86_spectrev2_mitigation_t; + + uint_t x86_disable_spectrev2 = 0; + static x86_spectrev2_mitigation_t x86_spectrev2_mitigation = + X86_SPECTREV2_RETPOLINE; + uint_t pentiumpro_bug4046376; uchar_t x86_featureset[BT_SIZEOFMAP(NUM_X86_FEATURES)]; static char *x86_feature_names[NUM_X86_FEATURES] = {
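The new x86_spectrev2_mitigation_t state and the x86_disable_spectrev2 tunable drive the patching that follows. As a preview of the selection performed later in cpuid_scan_security(), here is a simplified userland mock in which the integer arguments stand in for the tunable, the X86FSET_IBRS_ALL feature check, and the cpuid_use_amd_retpoline() result:

#include <stdio.h>

typedef enum {
	X86_SPECTREV2_RETPOLINE,
	X86_SPECTREV2_RETPOLINE_AMD,
	X86_SPECTREV2_ENHANCED_IBRS,
	X86_SPECTREV2_DISABLED
} x86_spectrev2_mitigation_t;

/*
 * Collapse the administrator tunable and the relevant CPU features into a
 * single choice, in the same priority order the boot CPU uses: an explicit
 * disable wins, then enhanced IBRS, then the AMD lfence-based retpoline,
 * then the generic retpoline.
 */
static x86_spectrev2_mitigation_t
select_v2_mitigation(unsigned int disable, int has_ibrs_all, int amd_lfence)
{
	if (disable != 0)
		return (X86_SPECTREV2_DISABLED);
	if (has_ibrs_all)
		return (X86_SPECTREV2_ENHANCED_IBRS);
	if (amd_lfence)
		return (X86_SPECTREV2_RETPOLINE_AMD);
	return (X86_SPECTREV2_RETPOLINE);
}

int
main(void)
{
	static const char *names[] = {
		"retpoline", "retpoline-amd", "enhanced-ibrs", "disabled"
	};

	(void) printf("%s\n", names[select_v2_mitigation(0, 1, 0)]);
	return (0);
}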
*** 2168,2179 **** * have a processor that is vulnerable to MDS, but is not vulnerable to L1TF * (RDCL_NO is set). */ void (*spec_uarch_flush)(void) = spec_uarch_flush_noop; - void (*x86_md_clear)(void) = x86_md_clear_noop; - static void cpuid_update_md_clear(cpu_t *cpu, uchar_t *featureset) { struct cpuid_info *cpi = cpu->cpu_m.mcpu_cpi; --- 2456,2465 ----
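With this change x86_md_clear stops being a function pointer (its text is patched instead, as the next hunk shows), while spec_uarch_flush stays a pointer that is retargeted once the boot CPU knows what the hardware requires. A simplified userland mock of that pointer-retargeting pattern follows; the flush bodies and the update_uarch_flush() conditions are invented stand-ins and deliberately cruder than the kernel's real logic:

#include <stdio.h>

/* Invented stand-ins for the kernel's flush routines. */
static void
flush_noop(void)
{
}

static void
flush_l1d_msr(void)
{
	(void) printf("would write the L1D flush command MSR\n");
}

static void
flush_verw(void)
{
	(void) printf("would verw a writable segment selector\n");
}

/* One pointer, retargeted once the security scan has run. */
static void (*spec_uarch_flush)(void) = flush_noop;

static void
update_uarch_flush(int need_l1d, int need_mds, int has_md_clear)
{
	if (need_l1d) {
		/* The MSR-based flush also clears the MDS-affected buffers. */
		spec_uarch_flush = flush_l1d_msr;
	} else if (need_mds && has_md_clear) {
		spec_uarch_flush = flush_verw;
	} else {
		spec_uarch_flush = flush_noop;
	}
}

int
main(void)
{
	update_uarch_flush(0, 1, 1);
	spec_uarch_flush();	/* e.g. before entering an HVM guest */
	return (0);
}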
*** 2183,2199 **** * MDS. Therefore we can only rely on MDS_NO to determine that we don't * need to mitigate this. */ if (cpi->cpi_vendor != X86_VENDOR_Intel || is_x86_feature(featureset, X86FSET_MDS_NO)) { - x86_md_clear = x86_md_clear_noop; - membar_producer(); return; } if (is_x86_feature(featureset, X86FSET_MD_CLEAR)) { ! x86_md_clear = x86_md_clear_verw; } membar_producer(); } --- 2469,2486 ---- * MDS. Therefore we can only rely on MDS_NO to determine that we don't * need to mitigate this. */ if (cpi->cpi_vendor != X86_VENDOR_Intel || is_x86_feature(featureset, X86FSET_MDS_NO)) { return; } if (is_x86_feature(featureset, X86FSET_MD_CLEAR)) { ! const uint8_t nop = NOP_INSTR; ! uint8_t *md = (uint8_t *)x86_md_clear; ! ! *md = nop; } membar_producer(); }
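cpuid_update_md_clear() now patches text directly: x86_md_clear starts with a ret, so it is a no-op until that first byte is replaced with NOP_INSTR and execution can fall through into the verw-based body. The userland sketch below shows the same trick with an invented maybe_flush() routine, using mprotect() in place of the kernel's ability to write its own text and assuming the platform permits a writable and executable mapping:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

#define	NOP_INSTR	0x90

/*
 * Same shape as the patched kernel routine: a leading 'ret' makes it a
 * no-op until that byte becomes a 'nop', after which the verw body runs.
 */
__asm__(
    ".text\n"
    ".globl	maybe_flush\n"
    "maybe_flush:\n"
    "	ret\n"
    "	subq	$8, %rsp\n"
    "	movw	%ds, (%rsp)\n"
    "	verw	(%rsp)\n"
    "	addq	$8, %rsp\n"
    "	ret\n");
extern void maybe_flush(void);

int
main(void)
{
	uint8_t *p = (uint8_t *)maybe_flush;
	long pgsz = sysconf(_SC_PAGESIZE);
	void *page = (void *)((uintptr_t)p & ~((uintptr_t)pgsz - 1));

	maybe_flush();			/* no-op: first byte is ret */

	/* The kernel writes its own text; userland needs mprotect. */
	if (mprotect(page, pgsz, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
		perror("mprotect");
		return (1);
	}
	*p = NOP_INSTR;
	maybe_flush();			/* now falls through to verw */
	(void) printf("patched %p from ret to nop\n", (void *)p);
	return (0);
}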
*** 2253,2287 **** spec_uarch_flush = spec_uarch_flush_noop; } membar_producer(); } static void cpuid_scan_security(cpu_t *cpu, uchar_t *featureset) { struct cpuid_info *cpi = cpu->cpu_m.mcpu_cpi; if (cpi->cpi_vendor == X86_VENDOR_AMD && cpi->cpi_xmaxeax >= CPUID_LEAF_EXT_8) { if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBPB) add_x86_feature(featureset, X86FSET_IBPB); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBRS) add_x86_feature(featureset, X86FSET_IBRS); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_STIBP) add_x86_feature(featureset, X86FSET_STIBP); - if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBRS_ALL) - add_x86_feature(featureset, X86FSET_IBRS_ALL); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_STIBP_ALL) add_x86_feature(featureset, X86FSET_STIBP_ALL); - if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_PREFER_IBRS) - add_x86_feature(featureset, X86FSET_RSBA); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_SSBD) add_x86_feature(featureset, X86FSET_SSBD); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_VIRT_SSBD) add_x86_feature(featureset, X86FSET_SSBD_VIRT); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_SSB_NO) add_x86_feature(featureset, X86FSET_SSB_NO); } else if (cpi->cpi_vendor == X86_VENDOR_Intel && cpi->cpi_maxeax >= 7) { struct cpuid_regs *ecp; ecp = &cpi->cpi_std[7]; --- 2540,2707 ---- spec_uarch_flush = spec_uarch_flush_noop; } membar_producer(); } + /* + * We default to enabling RSB mitigations. + */ static void + cpuid_patch_rsb(x86_spectrev2_mitigation_t mit) + { + const uint8_t ret = RET_INSTR; + uint8_t *stuff = (uint8_t *)x86_rsb_stuff; + + switch (mit) { + case X86_SPECTREV2_ENHANCED_IBRS: + case X86_SPECTREV2_DISABLED: + *stuff = ret; + break; + default: + break; + } + } + + static void + cpuid_patch_retpolines(x86_spectrev2_mitigation_t mit) + { + const char *thunks[] = { "_rax", "_rbx", "_rcx", "_rdx", "_rdi", + "_rsi", "_rbp", "_r8", "_r9", "_r10", "_r11", "_r12", "_r13", + "_r14", "_r15" }; + const uint_t nthunks = ARRAY_SIZE(thunks); + const char *type; + uint_t i; + + if (mit == x86_spectrev2_mitigation) + return; + + switch (mit) { + case X86_SPECTREV2_RETPOLINE: + type = "gen"; + break; + case X86_SPECTREV2_RETPOLINE_AMD: + type = "amd"; + break; + case X86_SPECTREV2_ENHANCED_IBRS: + case X86_SPECTREV2_DISABLED: + type = "jmp"; + break; + default: + panic("asked to updated retpoline state with unknown state!"); + } + + for (i = 0; i < nthunks; i++) { + uintptr_t source, dest; + int ssize, dsize; + char sourcebuf[64], destbuf[64]; + size_t len; + + (void) snprintf(destbuf, sizeof (destbuf), + "__x86_indirect_thunk%s", thunks[i]); + (void) snprintf(sourcebuf, sizeof (sourcebuf), + "__x86_indirect_thunk_%s%s", type, thunks[i]); + + source = kobj_getelfsym(sourcebuf, NULL, &ssize); + dest = kobj_getelfsym(destbuf, NULL, &dsize); + VERIFY3U(source, !=, 0); + VERIFY3U(dest, !=, 0); + VERIFY3S(dsize, >=, ssize); + bcopy((void *)source, (void *)dest, ssize); + } + } + + static void + cpuid_enable_enhanced_ibrs(void) + { + uint64_t val; + + val = rdmsr(MSR_IA32_SPEC_CTRL); + val |= IA32_SPEC_CTRL_IBRS; + wrmsr(MSR_IA32_SPEC_CTRL, val); + } + + #ifndef __xpv + /* + * Determine whether or not we can use the AMD optimized retpoline + * functionality. We use this when we know we're on an AMD system and we can + * successfully verify that lfence is dispatch serializing. 
+ */ + static boolean_t + cpuid_use_amd_retpoline(struct cpuid_info *cpi) + { + uint64_t val; + on_trap_data_t otd; + + if (cpi->cpi_vendor != X86_VENDOR_AMD) + return (B_FALSE); + + /* + * We need to determine whether or not lfence is serializing. It always + * is on families 0xf and 0x11. On others, it's controlled by + * MSR_AMD_DECODE_CONFIG (MSRC001_1029). If some hypervisor gives us a + * crazy old family, don't try and do anything. + */ + if (cpi->cpi_family < 0xf) + return (B_FALSE); + if (cpi->cpi_family == 0xf || cpi->cpi_family == 0x11) + return (B_TRUE); + + /* + * While it may be tempting to use get_hwenv(), there are no promises + * that a hypervisor will actually declare themselves to be so in a + * friendly way. As such, try to read and set the MSR. If we can then + * read back the value we set (it wasn't just set to zero), then we go + * for it. + */ + if (!on_trap(&otd, OT_DATA_ACCESS)) { + val = rdmsr(MSR_AMD_DECODE_CONFIG); + val |= AMD_DECODE_CONFIG_LFENCE_DISPATCH; + wrmsr(MSR_AMD_DECODE_CONFIG, val); + val = rdmsr(MSR_AMD_DECODE_CONFIG); + } else { + val = 0; + } + no_trap(); + + if ((val & AMD_DECODE_CONFIG_LFENCE_DISPATCH) != 0) + return (B_TRUE); + return (B_FALSE); + } + #endif /* !__xpv */ + + static void cpuid_scan_security(cpu_t *cpu, uchar_t *featureset) { struct cpuid_info *cpi = cpu->cpu_m.mcpu_cpi; + x86_spectrev2_mitigation_t v2mit; if (cpi->cpi_vendor == X86_VENDOR_AMD && cpi->cpi_xmaxeax >= CPUID_LEAF_EXT_8) { if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBPB) add_x86_feature(featureset, X86FSET_IBPB); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBRS) add_x86_feature(featureset, X86FSET_IBRS); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_STIBP) add_x86_feature(featureset, X86FSET_STIBP); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_STIBP_ALL) add_x86_feature(featureset, X86FSET_STIBP_ALL); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_SSBD) add_x86_feature(featureset, X86FSET_SSBD); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_VIRT_SSBD) add_x86_feature(featureset, X86FSET_SSBD_VIRT); if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_SSB_NO) add_x86_feature(featureset, X86FSET_SSB_NO); + /* + * Don't enable enhanced IBRS unless we're told that we should + * prefer it and it has the same semantics as Intel. This is + * split into two bits rather than a single one. + */ + if ((cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_PREFER_IBRS) && + (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBRS_ALL)) { + add_x86_feature(featureset, X86FSET_IBRS_ALL); + } + } else if (cpi->cpi_vendor == X86_VENDOR_Intel && cpi->cpi_maxeax >= 7) { struct cpuid_regs *ecp; ecp = &cpi->cpi_std[7];
*** 2347,2360 **** if (ecp->cp_edx & CPUID_INTC_EDX_7_0_FLUSH_CMD) add_x86_feature(featureset, X86FSET_FLUSH_CMD); } ! if (cpu->cpu_id != 0) return; /* * We need to determine what changes are required for mitigating L1TF * and MDS. If the CPU suffers from either of them, then SMT exclusion * is required. * * If any of these are present, then we need to flush u-arch state at --- 2767,2812 ---- if (ecp->cp_edx & CPUID_INTC_EDX_7_0_FLUSH_CMD) add_x86_feature(featureset, X86FSET_FLUSH_CMD); } ! if (cpu->cpu_id != 0) { ! if (x86_spectrev2_mitigation == X86_SPECTREV2_ENHANCED_IBRS) { ! cpuid_enable_enhanced_ibrs(); ! } return; + } /* + * Go through and initialize various security mechanisms that we should + * only do on a single CPU. This includes Spectre V2, L1TF, and MDS. + */ + + /* + * By default we've come in with retpolines enabled. Check whether we + * should disable them or enable enhanced IBRS. RSB stuffing is enabled + * by default, but disabled if we are using enhanced IBRS. + */ + if (x86_disable_spectrev2 != 0) { + v2mit = X86_SPECTREV2_DISABLED; + } else if (is_x86_feature(featureset, X86FSET_IBRS_ALL)) { + cpuid_enable_enhanced_ibrs(); + v2mit = X86_SPECTREV2_ENHANCED_IBRS; + #ifndef __xpv + } else if (cpuid_use_amd_retpoline(cpi)) { + v2mit = X86_SPECTREV2_RETPOLINE_AMD; + #endif /* !__xpv */ + } else { + v2mit = X86_SPECTREV2_RETPOLINE; + } + + cpuid_patch_retpolines(v2mit); + cpuid_patch_rsb(v2mit); + x86_spectrev2_mitigation = v2mit; + membar_producer(); + + /* * We need to determine what changes are required for mitigating L1TF * and MDS. If the CPU suffers from either of them, then SMT exclusion * is required. * * If any of these are present, then we need to flush u-arch state at
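cpuid_enable_enhanced_ibrs() is a read-modify-write of MSR_IA32_SPEC_CTRL that every CPU must perform, which is why the cpu_id != 0 early-return path above now calls it as well. A userland mock follows; rdmsr()/wrmsr() are stubbed out here, and the MSR index (0x48) and the IBRS bit value come from the Intel SDM rather than from this change:

#include <stdio.h>
#include <stdint.h>

#define	MSR_IA32_SPEC_CTRL	0x48
#define	IA32_SPEC_CTRL_IBRS	0x1ULL

/* Stubbed MSR accessors; the kernel uses the real rdmsr()/wrmsr(). */
static uint64_t fake_spec_ctrl = 0;

static uint64_t
rdmsr(unsigned int msr)
{
	(void) msr;
	return (fake_spec_ctrl);
}

static void
wrmsr(unsigned int msr, uint64_t val)
{
	(void) msr;
	fake_spec_ctrl = val;
}

/*
 * Enhanced IBRS is a set-and-forget control: OR the IBRS bit into
 * IA32_SPEC_CTRL and leave it there. Every CPU performs this write.
 */
static void
enable_enhanced_ibrs(void)
{
	uint64_t val = rdmsr(MSR_IA32_SPEC_CTRL);

	val |= IA32_SPEC_CTRL_IBRS;
	wrmsr(MSR_IA32_SPEC_CTRL, val);
}

int
main(void)
{
	enable_enhanced_ibrs();
	(void) printf("IA32_SPEC_CTRL (mock) = %#llx\n",
	    (unsigned long long)fake_spec_ctrl);
	return (0);
}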
*** 6772,6783 **** --- 7224,7240 ---- /* ARGSUSED */ static int cpuid_post_ucodeadm_xc(xc_arg_t arg0, xc_arg_t arg1, xc_arg_t arg2) { uchar_t *fset; + boolean_t first_pass = (boolean_t)arg1; fset = (uchar_t *)(arg0 + sizeof (x86_featureset) * CPU->cpu_id); + if (first_pass && CPU->cpu_id != 0) + return (0); + if (!first_pass && CPU->cpu_id == 0) + return (0); cpuid_pass_ucode(CPU, fset); return (0); }
*** 6816,6828 **** i, cpu->cpu_m.mcpu_ucode_info->cui_rev, rev); } CPUSET_ADD(cpuset, i); } kpreempt_disable(); ! xc_sync((xc_arg_t)argdata, 0, 0, CPUSET2BV(cpuset), cpuid_post_ucodeadm_xc); kpreempt_enable(); /* * OK, now look at each CPU and see if their feature sets are equal. */ --- 7273,7294 ---- i, cpu->cpu_m.mcpu_ucode_info->cui_rev, rev); } CPUSET_ADD(cpuset, i); } + /* + * We do the cross calls in two passes. The first pass is only for the + * boot CPU. The second pass is for all of the other CPUs. This allows + * the boot CPU to go through and change behavior related to patching or + * whether or not Enhanced IBRS needs to be enabled and then allow all + * other CPUs to follow suit. + */ kpreempt_disable(); ! xc_sync((xc_arg_t)argdata, B_TRUE, 0, CPUSET2BV(cpuset), cpuid_post_ucodeadm_xc); + xc_sync((xc_arg_t)argdata, B_FALSE, 0, CPUSET2BV(cpuset), + cpuid_post_ucodeadm_xc); kpreempt_enable(); /* * OK, now look at each CPU and see if their feature sets are equal. */
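The two-pass cross call guarantees ordering: the boot CPU re-evaluates its security features, and possibly repatches the kernel, before any other CPU does. A minimal userland mock of the handler's filtering is below, with a plain loop standing in for the synchronous xc_sync() rounds and post_ucodeadm_handler() as an invented stand-in for cpuid_post_ucodeadm_xc():

#include <stdio.h>

#define	NCPUS	4

/*
 * Pass 1 runs the rescan only on the boot CPU (which may repatch
 * retpolines or enable enhanced IBRS); pass 2 runs it on everyone else,
 * so the other CPUs observe the boot CPU's decision.
 */
static void
post_ucodeadm_handler(int cpu_id, int first_pass)
{
	if (first_pass && cpu_id != 0)
		return;
	if (!first_pass && cpu_id == 0)
		return;
	(void) printf("cpu %d: re-evaluating security features (pass %d)\n",
	    cpu_id, first_pass ? 1 : 2);
}

int
main(void)
{
	int i;

	/* Stand-in for the two synchronous xc_sync() rounds. */
	for (i = 0; i < NCPUS; i++)
		post_ucodeadm_handler(i, 1);
	for (i = 0; i < NCPUS; i++)
		post_ucodeadm_handler(i, 0);
	return (0);
}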