11787 Kernel needs to be built with retpolines
11788 Kernel needs to generally use RSB stuffing
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: John Levon <john.levon@joyent.com>
*** 895,904 ****
--- 895,1179 ----
* topology information, etc. Some of these subsystems include processor groups
* (uts/common/os/pg.c.), CPU Module Interface (uts/i86pc/os/cmi.c), ACPI,
* microcode, and performance monitoring. These functions all ASSERT that the
* CPU they're being called on has reached a certain cpuid pass. If the passes
* are rearranged, then this needs to be adjusted.
+ *
+ * -----------------------------------------------
+ * Speculative Execution CPU Side Channel Security
+ * -----------------------------------------------
+ *
+ * With the advent of the Spectre and Meltdown attacks which exploit speculative
+ * execution in the CPU to create side channels there have been a number of
+ * different attacks and corresponding issues that the operating system needs to
+ * mitigate against. The following is a common, but not exhaustive, set of
+ * issues that we know about and for which we have done some work, or still
+ * need to do more work, in the system to mitigate:
+ *
+ * - Spectre v1
+ * - Spectre v2
+ * - Meltdown (Spectre v3)
+ * - Rogue Register Read (Spectre v3a)
+ * - Speculative Store Bypass (Spectre v4)
+ * - ret2spec, SpectreRSB
+ * - L1 Terminal Fault (L1TF)
+ * - Microarchitectural Data Sampling (MDS)
+ *
+ * Each of these requires different sets of mitigations and has different attack
+ * surfaces. For the most part, this discussion is about protecting the kernel
+ * from non-kernel executing environments such as user processes and hardware
+ * virtual machines. Unfortunately, there are a number of user vs. user
+ * scenarios that exist with these. The rest of this section will describe the
+ * overall approach that the system has taken to address these as well as their
+ * shortcomings. Unfortunately, not all of the above have been handled today.
+ *
+ * SPECTRE FAMILY (Spectre v2, ret2spec, SpectreRSB)
+ *
+ * The second variant of the spectre attack focuses on performing branch target
+ * injection. This generally impacts indirect call instructions in the system.
+ * There are three different ways to mitigate this issue that are commonly
+ * described today:
+ *
+ * 1. Using Indirect Branch Restricted Speculation (IBRS).
+ * 2. Using Retpolines and RSB Stuffing
+ * 3. Using Enhanced Indirect Branch Restricted Speculation (EIBRS)
+ *
+ * IBRS uses a feature added to microcode to restrict speculation, among other
+ * things. This form of mitigation has not been used as it has been generally
+ * seen as too expensive and requires reactivation upon various transitions in
+ * the system.
+ *
+ * As a less impactful alternative to IBRS, retpolines were developed by
+ * Google. These basically require one to replace indirect calls with a specific
+ * trampoline that will cause speculation to fail and break the attack.
+ * Retpolines require compiler support. We always build with retpolines in the
+ * external thunk mode. This means that a traditional indirect call is replaced
+ * with a call to one of the __x86_indirect_thunk_<reg> functions. A side effect
+ * of this is that all indirect function calls are performed through a register.
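+ *
+ * As a rough illustration (and not a literal transcription of the kernel's
+ * thunks), an indirect call such as 'call *%r11' instead becomes
+ * 'call __x86_indirect_thunk_r11', where the generic form of such a thunk
+ * looks approximately like:
+ *
+ *	__x86_indirect_thunk_r11:
+ *		call	2f
+ *	1:
+ *		pause
+ *		lfence
+ *		jmp	1b
+ *	2:
+ *		movq	%r11, (%rsp)
+ *		ret
+ *
+ * The initial call pushes a return address that points at the capture loop,
+ * so speculation of the final ret spins harmlessly, while the architectural
+ * path overwrites that return address with the real target and returns
+ * through it.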
+ *
+ * We have to use a common external location for the thunk, rather than
+ * inlining it into the callsite, so that we have a single place to patch these
+ * functions.
+ * As it turns out, we actually have three different forms of retpolines that
+ * exist in the system:
+ *
+ * 1. A full retpoline
+ * 2. An AMD-specific optimized retpoline
+ * 3. A no-op version
+ *
+ * The first one is used in the general case. The second one is used if we can
+ * determine that we're on an AMD system and we can successfully toggle the
+ * lfence serializing MSR that exists on the platform. Basically with this
+ * present, an lfence is sufficient and we don't need to do anywhere near as
+ * complicated a dance to successfully use retpolines.
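+ *
+ * As a sketch, once lfence is known to be dispatch serializing, the AMD form
+ * of the thunk can be as simple as:
+ *
+ *	__x86_indirect_thunk_amd_r11:
+ *		lfence
+ *		jmp	*%r11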
+ *
+ * The third form described above is the most curious. It turns out that the way
+ * that retpolines are implemented is that they rely on how speculation is
+ * performed on a 'ret' instruction. Intel has continued to optimize this
+ * process (which is partly why we need to have return stack buffer stuffing,
+ * but more on that in a bit) and in processors starting with Cascade Lake
+ * on the server side, it's dangerous to rely on retpolines. Instead, a new
+ * mechanism has been introduced called Enhanced IBRS (EIBRS).
+ *
+ * Unlike IBRS, EIBRS is designed to be enabled once at boot and left on each
+ * physical core. However, if this is the case, we don't want to use retpolines
+ * any more. Therefore if EIBRS is present, we end up turning each retpoline
+ * function (called a thunk) into a jmp instruction. This means that we're still
+ * paying the cost of an extra jump to the external thunk, but it gives us
+ * flexibility and the ability to have a single kernel image that works across a
+ * wide variety of systems and hardware features.
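+ *
+ * In other words, with EIBRS enabled (and also when the mitigation is
+ * explicitly disabled), each thunk is reduced to something like:
+ *
+ *	__x86_indirect_thunk_r11:
+ *		jmp	*%r11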
+ *
+ * Unfortunately, this alone is insufficient. First, Skylake systems have
+ * additional speculation for the Return Stack Buffer (RSB), which is used to
+ * predict the targets of ret instructions and which retpolines take advantage
+ * of. However,
+ * this problem is not just limited to Skylake and is actually more pernicious.
+ * The SpectreRSB paper describes several more problems that can arise when
+ * dealing with this. The RSB can be poisoned just like the indirect branch
+ * predictor. This means that one needs to clear the RSB when transitioning
+ * between two different privilege domains. Some examples include:
+ *
+ * - Switching between two different user processes
+ * - Going between user land and the kernel
+ * - Returning to the kernel from a hardware virtual machine
+ *
+ * Mitigating this involves combining a couple of different things. The first is
+ * SMEP (supervisor mode execution protection) which was introduced in Ivy
+ * Bridge. When an RSB entry refers to a user address and we're executing in the
+ * kernel, speculation through it will be stopped when SMEP is enabled. This
+ * protects against a number of the different cases that we would normally be
+ * worried about such as when we enter the kernel from user land.
+ *
+ * To protect against additional manipulation of the RSB from other contexts,
+ * such as a non-root VMX context attacking the kernel, we first look to enhanced
+ * IBRS. When EIBRS is present and enabled, then there is nothing else that we
+ * need to do to protect the kernel at this time.
+ *
+ * On CPUs without EIBRS we need to manually overwrite the contents of the
+ * return stack buffer. We do this through the x86_rsb_stuff() function.
+ * Currently this is employed on context switch. The x86_rsb_stuff() function is
+ * disabled when enhanced IBRS is present because Intel claims on such systems
+ * it will be ineffective. Stuffing the RSB in context switch helps prevent user
+ * to user attacks via the RSB.
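+ *
+ * Conceptually, the stuffing sequence fills each RSB entry with a benign
+ * return address whose speculative target is a harmless trap, and then
+ * discards the architectural return addresses, along the lines of (the entry
+ * count here is purely illustrative):
+ *
+ *	call	1f
+ * 0:	pause
+ *	lfence
+ *	jmp	0b
+ * 1:	call	2f
+ *	...	(one such call per RSB entry)
+ * 2:	addq	$NENTRIES * 8, %rsp
+ *	ret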
+ *
+ * If SMEP is not present, then we would have to stuff the RSB every time we
+ * transitioned from user mode to the kernel, which isn't very practical right
+ * now.
+ *
+ * To fully protect user to user and vmx to vmx attacks from these classes of
+ * issues, we would also need to allow them to opt into performing an Indirect
+ * Branch Prediction Barrier (IBPB) on switch. This is not currently wired up.
+ *
+ * By default, the system will enable RSB stuffing and the required variant of
+ * retpolines and store that information in the x86_spectrev2_mitigation value.
+ * This will be evaluated after a microcode update as well, though it is
+ * expected that microcode updates will not take away features. This may mean
+ * that the system may not end up in the optimal configuration after a late
+ * microcode load (though this should be rare).
+ *
+ * Currently we do not build kmdb with retpolines or perform any additional side
+ * channel security mitigations for it. One complication with kmdb is that it
+ * requires its own retpoline thunks and it would need to adjust itself based on
+ * what the kernel does. The threat model of kmdb is more limited and therefore
+ * it may make more sense to investigate using prediction barriers as the whole
+ * system is only executing a single instruction at a time while in kmdb.
+ *
+ * SPECTRE FAMILY (v1, v4)
+ *
+ * The v1 and v4 variants of spectre are not currently mitigated in the
+ * system and require other classes of changes to occur in the code.
+ *
+ * MELTDOWN
+ *
+ * Meltdown, or Spectre v3, allowed a user process to read any data mapped in
+ * its address space regardless of whether or not the page tables in question
+ * allowed the user to read it. The solution to Meltdown
+ * is kernel page table isolation. In this world, there are two page tables that
+ * are used for a process, one in user land and one in the kernel. To implement
+ * this we use per-CPU page tables and switch between the user and kernel
+ * variants when entering and exiting the kernel. For more information about
+ * this process and how the trampolines work, please see the big theory
+ * statements and additional comments in:
+ *
+ * - uts/i86pc/ml/kpti_trampolines.s
+ * - uts/i86pc/vm/hat_i86.c
+ *
+ * While Meltdown only impacted Intel systems, and there are Intel systems that
+ * have Meltdown fixed (Intel refers to the issue as Rogue Data Cache Load), we
+ * always have
+ * kernel page table isolation enabled. While this may at first seem weird, an
+ * important thing to remember is that you can't speculatively read an address
+ * if it's never in your page table at all. Having user processes without kernel
+ * pages present provides us with an important layer of defense in the kernel
+ * against any other side channel attacks that exist and have yet to be
+ * discovered. As such, kernel page table isolation (KPTI) is always enabled by
+ * default, no matter the x86 system.
+ *
+ * L1 TERMINAL FAULT
+ *
+ * L1 Terminal Fault (L1TF) takes advantage of an issue in how speculative
+ * execution uses page table entries. Effectively, it is two different problems.
+ * The first is that the CPU ignores the not-present bit in page table entries
+ * when performing speculative execution. This means that software can
+ * speculatively read the physical address listed in such an entry if the data
+ * is present in the L1
+ * cache under certain conditions (see Intel's documentation for the full set of
+ * conditions). Secondly, this can be used to bypass hardware virtualization
+ * extended page tables (EPT) that are part of Intel's hardware virtual machine
+ * instructions.
+ *
+ * For the non-hardware virtualized case, this is relatively easy to deal with.
+ * We must make sure that every non-present page table entry has a physical
+ * address of zero. This means that an attacker could at most speculatively read
+ * the first 4k of physical memory; however, we never use that first page in the
+ * operating system and always skip putting it in our memory map, even if
+ * firmware tells us we can use it. While
+ * other systems try to put extra metadata in the address and reserved bits,
+ * which led to this being problematic in those cases, we do not.
+ *
+ * For hardware virtual machines things are more complicated. Because they can
+ * construct their own page tables, it isn't hard for them to perform this
+ * attack against any physical address. The one wrinkle is that this physical
+ * address must be in the L1 data cache. Thus Intel added an MSR that we can use
+ * to flush the L1 data cache. We wrap this up in the function
+ * spec_uarch_flush(). This function is also used in the mitigation of
+ * microarchitectural data sampling (MDS) discussed later on. Kernel based
+ * hypervisors such as KVM or bhyve are responsible for performing this before
+ * entering the guest.
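+ *
+ * On hardware that enumerates the L1D flush command (see X86FSET_FLUSH_CMD),
+ * the flush itself is roughly a single MSR write, e.g.:
+ *
+ *	wrmsr(MSR_IA32_FLUSH_CMD, IA32_FLUSH_CMD_L1D);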
+ *
+ * Because this attack takes place in the L1 cache, there's another wrinkle
+ * here. The L1 cache is shared between all logical CPUs in a core in most Intel
+ * designs. This means that when a thread enters a hardware virtualized context
+ * and flushes the L1 data cache, the other thread on the processor may then go
+ * ahead and put new data in it that can be potentially attacked. While one
+ * solution is to disable SMT on the system, another option that is available is
+ * to use a feature for hardware virtualization called 'SMT exclusion'. This
+ * ensures that if an HVM is being scheduled on one thread, then whatever runs
+ * on the sibling thread belongs to the same hardware virtual machine.
+ * If an interrupt comes in or the guest exits to the broader system, then the
+ * other SMT thread will be kicked out.
+ *
+ * L1TF can be fully mitigated by hardware. If the RDCL_NO feature is set in the
+ * architecture capabilities MSR (MSR_IA32_ARCH_CAPABILITIES), then we will not
+ * perform L1TF related mitigations.
+ *
+ * MICROARCHITECTURAL DATA SAMPLING
+ *
+ * Microarchitectural data sampling (MDS) is a combination of four discrete,
+ * but related, vulnerabilities affecting various parts of the CPU's
+ * microarchitectural implementation around load, store, and fill buffers.
+ * Specifically it is made up of the following subcomponents:
+ *
+ * 1. Microarchitectural Store Buffer Data Sampling (MSBDS)
+ * 2. Microarchitectural Fill Buffer Data Sampling (MFBDS)
+ * 3. Microarchitectural Load Port Data Sampling (MLPDS)
+ * 4. Microarchitectural Data Sampling Uncacheable Memory (MDSUM)
+ *
+ * To begin addressing these, Intel has introduced another feature in microcode
+ * called MD_CLEAR. This overloads the verw instruction so that executing it
+ * also flushes the state of the affected buffers. The L1TF L1D flush mechanism
+ * is likewise updated when this microcode is present to flush this state.
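+ *
+ * A minimal sketch of the verw-based flush, where the operand only needs to
+ * reference a writable data segment selector in memory:
+ *
+ *	subq	$8, %rsp
+ *	movw	%ds, (%rsp)
+ *	verw	(%rsp)
+ *	addq	$8, %rsp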
+ *
+ * Primarily we need to flush this state whenever we transition from the kernel
+ * to a less privileged context such as user mode or an HVM guest. MSBDS is a
+ * little bit different. Here the structures are statically sized when a logical
+ * CPU is in use and resized when it goes to sleep. Therefore, we also need to
+ * flush the microarchitectural state before the CPU goes idle by calling hlt,
+ * mwait, or another ACPI method. To perform these flushes, we call
+ * x86_md_clear() at all of these transition points.
+ *
+ * If hardware enumerates RDCL_NO, indicating that it is not vulnerable to L1TF,
+ * then we change the spec_uarch_flush() function to point to x86_md_clear(). If
+ * MDS_NO has been set, then this is fully mitigated and x86_md_clear() becomes
+ * a no-op.
+ *
+ * Unfortunately, with this issue hyperthreading rears its ugly head. In
+ * particular, everything we've discussed above is only valid for a single
+ * thread executing on a core. In the case where you have hyper-threading
+ * present, this attack can be performed between threads. The theoretical fix
+ * for this is to ensure that both threads are always in the same security
+ * domain. This means that they are executing in the same ring and mutually
+ * trust each other. Practically speaking, this would mean that a system call
+ * would have to issue an inter-processor interrupt (IPI) to the other thread.
+ * Rather than implement this, we recommend disabling hyper-threading through
+ * the use of psradm -aS.
+ *
+ * SUMMARY
+ *
+ * The following table attempts to summarize the mitigations for various issues
+ * and what's done in various places:
+ *
+ * - Spectre v1: Not currently mitigated
+ * - Spectre v2: Retpolines/RSB Stuffing or EIBRS if HW support
+ * - Meltdown: Kernel Page Table Isolation
+ * - Spectre v3a: Updated CPU microcode
+ * - Spectre v4: Not currently mitigated
+ * - SpectreRSB: SMEP and RSB Stuffing
+ * - L1TF: spec_uarch_flush, smt exclusion, requires microcode
+ * - MDS: x86_md_clear, requires microcode, disabling hyper threading
+ *
+ * The following table indicates the x86 feature set bits that indicate that a
+ * given problem has been solved or a notable feature is present:
+ *
+ * - RDCL_NO: Meltdown, L1TF, MSBDS subset of MDS
+ * - MDS_NO: All forms of MDS
*/
#include <sys/types.h>
#include <sys/archsystm.h>
#include <sys/x86_archext.h>
*** 919,928 ****
--- 1194,1205 ----
#include <sys/pci_cfgspace.h>
#include <sys/comm_page.h>
#include <sys/mach_mmu.h>
#include <sys/ucode.h>
#include <sys/tsc.h>
+ #include <sys/kobj.h>
+ #include <sys/asm_misc.h>
#ifdef __xpv
#include <sys/hypervisor.h>
#else
#include <sys/ontrap.h>
*** 938,947 ****
--- 1215,1235 ----
#else
int x86_use_pcid = -1;
int x86_use_invpcid = -1;
#endif
+ typedef enum {
+ X86_SPECTREV2_RETPOLINE,
+ X86_SPECTREV2_RETPOLINE_AMD,
+ X86_SPECTREV2_ENHANCED_IBRS,
+ X86_SPECTREV2_DISABLED
+ } x86_spectrev2_mitigation_t;
+
+ uint_t x86_disable_spectrev2 = 0;
+ static x86_spectrev2_mitigation_t x86_spectrev2_mitigation =
+ X86_SPECTREV2_RETPOLINE;
+
uint_t pentiumpro_bug4046376;
uchar_t x86_featureset[BT_SIZEOFMAP(NUM_X86_FEATURES)];
static char *x86_feature_names[NUM_X86_FEATURES] = {
*** 2168,2179 ****
* have a processor that is vulnerable to MDS, but is not vulnerable to L1TF
* (RDCL_NO is set).
*/
void (*spec_uarch_flush)(void) = spec_uarch_flush_noop;
- void (*x86_md_clear)(void) = x86_md_clear_noop;
-
static void
cpuid_update_md_clear(cpu_t *cpu, uchar_t *featureset)
{
struct cpuid_info *cpi = cpu->cpu_m.mcpu_cpi;
--- 2456,2465 ----
*** 2183,2199 ****
* MDS. Therefore we can only rely on MDS_NO to determine that we don't
* need to mitigate this.
*/
if (cpi->cpi_vendor != X86_VENDOR_Intel ||
is_x86_feature(featureset, X86FSET_MDS_NO)) {
- x86_md_clear = x86_md_clear_noop;
- membar_producer();
return;
}
if (is_x86_feature(featureset, X86FSET_MD_CLEAR)) {
! x86_md_clear = x86_md_clear_verw;
}
membar_producer();
}
--- 2469,2486 ----
* MDS. Therefore we can only rely on MDS_NO to determine that we don't
* need to mitigate this.
*/
if (cpi->cpi_vendor != X86_VENDOR_Intel ||
is_x86_feature(featureset, X86FSET_MDS_NO)) {
return;
}
if (is_x86_feature(featureset, X86FSET_MD_CLEAR)) {
! const uint8_t nop = NOP_INSTR;
! uint8_t *md = (uint8_t *)x86_md_clear;
!
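! /*
! * x86_md_clear() nominally begins with a ret, making it a no-op by
! * default; patching a nop over that first byte exposes the verw-based
! * flush that follows it.
! */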
! *md = nop;
}
membar_producer();
}
*** 2253,2287 ****
spec_uarch_flush = spec_uarch_flush_noop;
}
membar_producer();
}
static void
cpuid_scan_security(cpu_t *cpu, uchar_t *featureset)
{
struct cpuid_info *cpi = cpu->cpu_m.mcpu_cpi;
if (cpi->cpi_vendor == X86_VENDOR_AMD &&
cpi->cpi_xmaxeax >= CPUID_LEAF_EXT_8) {
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBPB)
add_x86_feature(featureset, X86FSET_IBPB);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBRS)
add_x86_feature(featureset, X86FSET_IBRS);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_STIBP)
add_x86_feature(featureset, X86FSET_STIBP);
- if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBRS_ALL)
- add_x86_feature(featureset, X86FSET_IBRS_ALL);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_STIBP_ALL)
add_x86_feature(featureset, X86FSET_STIBP_ALL);
- if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_PREFER_IBRS)
- add_x86_feature(featureset, X86FSET_RSBA);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_SSBD)
add_x86_feature(featureset, X86FSET_SSBD);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_VIRT_SSBD)
add_x86_feature(featureset, X86FSET_SSBD_VIRT);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_SSB_NO)
add_x86_feature(featureset, X86FSET_SSB_NO);
} else if (cpi->cpi_vendor == X86_VENDOR_Intel &&
cpi->cpi_maxeax >= 7) {
struct cpuid_regs *ecp;
ecp = &cpi->cpi_std[7];
--- 2540,2707 ----
spec_uarch_flush = spec_uarch_flush_noop;
}
membar_producer();
}
+ /*
+ * We default to enabling RSB mitigations.
+ */
static void
+ cpuid_patch_rsb(x86_spectrev2_mitigation_t mit)
+ {
+ const uint8_t ret = RET_INSTR;
+ uint8_t *stuff = (uint8_t *)x86_rsb_stuff;
+
+ switch (mit) {
+ case X86_SPECTREV2_ENHANCED_IBRS:
+ case X86_SPECTREV2_DISABLED:
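+ /*
+ * Disable RSB stuffing by patching a ret over the first instruction of
+ * x86_rsb_stuff(), effectively turning it into a no-op.
+ */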
+ *stuff = ret;
+ break;
+ default:
+ break;
+ }
+ }
+
+ static void
+ cpuid_patch_retpolines(x86_spectrev2_mitigation_t mit)
+ {
+ const char *thunks[] = { "_rax", "_rbx", "_rcx", "_rdx", "_rdi",
+ "_rsi", "_rbp", "_r8", "_r9", "_r10", "_r11", "_r12", "_r13",
+ "_r14", "_r15" };
+ const uint_t nthunks = ARRAY_SIZE(thunks);
+ const char *type;
+ uint_t i;
+
+ if (mit == x86_spectrev2_mitigation)
+ return;
+
+ switch (mit) {
+ case X86_SPECTREV2_RETPOLINE:
+ type = "gen";
+ break;
+ case X86_SPECTREV2_RETPOLINE_AMD:
+ type = "amd";
+ break;
+ case X86_SPECTREV2_ENHANCED_IBRS:
+ case X86_SPECTREV2_DISABLED:
+ type = "jmp";
+ break;
+ default:
+ panic("asked to update retpoline state with unknown state!");
+ }
+
+ for (i = 0; i < nthunks; i++) {
+ uintptr_t source, dest;
+ int ssize, dsize;
+ char sourcebuf[64], destbuf[64];
+ size_t len;
+
+ (void) snprintf(destbuf, sizeof (destbuf),
+ "__x86_indirect_thunk%s", thunks[i]);
+ (void) snprintf(sourcebuf, sizeof (sourcebuf),
+ "__x86_indirect_thunk_%s%s", type, thunks[i]);
+
+ source = kobj_getelfsym(sourcebuf, NULL, &ssize);
+ dest = kobj_getelfsym(destbuf, NULL, &dsize);
+ VERIFY3U(source, !=, 0);
+ VERIFY3U(dest, !=, 0);
+ VERIFY3S(dsize, >=, ssize);
+ bcopy((void *)source, (void *)dest, ssize);
+ }
+ }
+
+ static void
+ cpuid_enable_enhanced_ibrs(void)
+ {
+ uint64_t val;
+
+ val = rdmsr(MSR_IA32_SPEC_CTRL);
+ val |= IA32_SPEC_CTRL_IBRS;
+ wrmsr(MSR_IA32_SPEC_CTRL, val);
+ }
+
+ #ifndef __xpv
+ /*
+ * Determine whether or not we can use the AMD optimized retpoline
+ * functionality. We use this when we know we're on an AMD system and we can
+ * successfully verify that lfence is dispatch serializing.
+ */
+ static boolean_t
+ cpuid_use_amd_retpoline(struct cpuid_info *cpi)
+ {
+ uint64_t val;
+ on_trap_data_t otd;
+
+ if (cpi->cpi_vendor != X86_VENDOR_AMD)
+ return (B_FALSE);
+
+ /*
+ * We need to determine whether or not lfence is serializing. It always
+ * is on families 0xf and 0x11. On others, it's controlled by
+ * MSR_AMD_DECODE_CONFIG (MSRC001_1029). If some hypervisor gives us a
+ * crazy old family, don't try and do anything.
+ */
+ if (cpi->cpi_family < 0xf)
+ return (B_FALSE);
+ if (cpi->cpi_family == 0xf || cpi->cpi_family == 0x11)
+ return (B_TRUE);
+
+ /*
+ * While it may be tempting to use get_hwenv(), there are no promises
+ * that a hypervisor will actually declare themselves to be so in a
+ * friendly way. As such, try to read and set the MSR. If we can then
+ * read back the value we set (it wasn't just set to zero), then we go
+ * for it.
+ */
+ if (!on_trap(&otd, OT_DATA_ACCESS)) {
+ val = rdmsr(MSR_AMD_DECODE_CONFIG);
+ val |= AMD_DECODE_CONFIG_LFENCE_DISPATCH;
+ wrmsr(MSR_AMD_DECODE_CONFIG, val);
+ val = rdmsr(MSR_AMD_DECODE_CONFIG);
+ } else {
+ val = 0;
+ }
+ no_trap();
+
+ if ((val & AMD_DECODE_CONFIG_LFENCE_DISPATCH) != 0)
+ return (B_TRUE);
+ return (B_FALSE);
+ }
+ #endif /* !__xpv */
+
+ static void
cpuid_scan_security(cpu_t *cpu, uchar_t *featureset)
{
struct cpuid_info *cpi = cpu->cpu_m.mcpu_cpi;
+ x86_spectrev2_mitigation_t v2mit;
if (cpi->cpi_vendor == X86_VENDOR_AMD &&
cpi->cpi_xmaxeax >= CPUID_LEAF_EXT_8) {
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBPB)
add_x86_feature(featureset, X86FSET_IBPB);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBRS)
add_x86_feature(featureset, X86FSET_IBRS);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_STIBP)
add_x86_feature(featureset, X86FSET_STIBP);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_STIBP_ALL)
add_x86_feature(featureset, X86FSET_STIBP_ALL);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_SSBD)
add_x86_feature(featureset, X86FSET_SSBD);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_VIRT_SSBD)
add_x86_feature(featureset, X86FSET_SSBD_VIRT);
if (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_SSB_NO)
add_x86_feature(featureset, X86FSET_SSB_NO);
+ /*
+ * Don't enable enhanced IBRS unless we're told that we should
+ * prefer it and it has the same semantics as Intel. This is
+ * split into two bits rather than a single one.
+ */
+ if ((cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_PREFER_IBRS) &&
+ (cpi->cpi_extd[8].cp_ebx & CPUID_AMD_EBX_IBRS_ALL)) {
+ add_x86_feature(featureset, X86FSET_IBRS_ALL);
+ }
+
} else if (cpi->cpi_vendor == X86_VENDOR_Intel &&
cpi->cpi_maxeax >= 7) {
struct cpuid_regs *ecp;
ecp = &cpi->cpi_std[7];
*** 2347,2360 ****
if (ecp->cp_edx & CPUID_INTC_EDX_7_0_FLUSH_CMD)
add_x86_feature(featureset, X86FSET_FLUSH_CMD);
}
! if (cpu->cpu_id != 0)
return;
/*
* We need to determine what changes are required for mitigating L1TF
* and MDS. If the CPU suffers from either of them, then SMT exclusion
* is required.
*
* If any of these are present, then we need to flush u-arch state at
--- 2767,2812 ----
if (ecp->cp_edx & CPUID_INTC_EDX_7_0_FLUSH_CMD)
add_x86_feature(featureset, X86FSET_FLUSH_CMD);
}
! if (cpu->cpu_id != 0) {
! if (x86_spectrev2_mitigation == X86_SPECTREV2_ENHANCED_IBRS) {
! cpuid_enable_enhanced_ibrs();
! }
return;
+ }
/*
+ * Go through and initialize various security mitigations that we should
+ * only set up once, on the boot CPU. This includes Spectre v2, L1TF, and MDS.
+ */
+
+ /*
+ * By default we've come in with retpolines enabled. Check whether we
+ * should disable them or enable enhanced IBRS. RSB stuffing is enabled
+ * by default, but disabled if we are using enhanced IBRS.
+ */
+ if (x86_disable_spectrev2 != 0) {
+ v2mit = X86_SPECTREV2_DISABLED;
+ } else if (is_x86_feature(featureset, X86FSET_IBRS_ALL)) {
+ cpuid_enable_enhanced_ibrs();
+ v2mit = X86_SPECTREV2_ENHANCED_IBRS;
+ #ifndef __xpv
+ } else if (cpuid_use_amd_retpoline(cpi)) {
+ v2mit = X86_SPECTREV2_RETPOLINE_AMD;
+ #endif /* !__xpv */
+ } else {
+ v2mit = X86_SPECTREV2_RETPOLINE;
+ }
+
+ cpuid_patch_retpolines(v2mit);
+ cpuid_patch_rsb(v2mit);
+ x86_spectrev2_mitigation = v2mit;
+ membar_producer();
+
+ /*
* We need to determine what changes are required for mitigating L1TF
* and MDS. If the CPU suffers from either of them, then SMT exclusion
* is required.
*
* If any of these are present, then we need to flush u-arch state at
*** 6772,6783 ****
--- 7224,7240 ----
/* ARGSUSED */
static int
cpuid_post_ucodeadm_xc(xc_arg_t arg0, xc_arg_t arg1, xc_arg_t arg2)
{
uchar_t *fset;
+ boolean_t first_pass = (boolean_t)arg1;
fset = (uchar_t *)(arg0 + sizeof (x86_featureset) * CPU->cpu_id);
+ if (first_pass && CPU->cpu_id != 0)
+ return (0);
+ if (!first_pass && CPU->cpu_id == 0)
+ return (0);
cpuid_pass_ucode(CPU, fset);
return (0);
}
*** 6816,6828 ****
i, cpu->cpu_m.mcpu_ucode_info->cui_rev, rev);
}
CPUSET_ADD(cpuset, i);
}
kpreempt_disable();
! xc_sync((xc_arg_t)argdata, 0, 0, CPUSET2BV(cpuset),
cpuid_post_ucodeadm_xc);
kpreempt_enable();
/*
* OK, now look at each CPU and see if their feature sets are equal.
*/
--- 7273,7294 ----
i, cpu->cpu_m.mcpu_ucode_info->cui_rev, rev);
}
CPUSET_ADD(cpuset, i);
}
+ /*
+ * We do the cross calls in two passes. The first pass is only for the
+ * boot CPU. The second pass is for all of the other CPUs. This allows
+ * the boot CPU to go through and change behavior related to patching or
+ * whether or not Enhanced IBRS needs to be enabled and then allow all
+ * other CPUs to follow suit.
+ */
kpreempt_disable();
! xc_sync((xc_arg_t)argdata, B_TRUE, 0, CPUSET2BV(cpuset),
cpuid_post_ucodeadm_xc);
+ xc_sync((xc_arg_t)argdata, B_FALSE, 0, CPUSET2BV(cpuset),
+ cpuid_post_ucodeadm_xc);
kpreempt_enable();
/*
* OK, now look at each CPU and see if their feature sets are equal.
*/