Merge branch 'v5.15/standard/base' into v5.15/standard/cn-sdkv5.15/octeon

Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>

# Conflicts:
#	drivers/pci/quirks.c
Bruce Ashfield 2024-04-15 16:28:56 -04:00
commit 5badfe362e
490 changed files with 12219 additions and 6998 deletions

View File

@ -517,6 +517,7 @@ What: /sys/devices/system/cpu/vulnerabilities
/sys/devices/system/cpu/vulnerabilities/mds
/sys/devices/system/cpu/vulnerabilities/meltdown
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
/sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
/sys/devices/system/cpu/vulnerabilities/retbleed
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
/sys/devices/system/cpu/vulnerabilities/spectre_v1

View File

@ -0,0 +1,74 @@
.. SPDX-License-Identifier: GPL-2.0
====================================
File system Monitoring with fanotify
====================================
File system Error Reporting
===========================
Fanotify supports the FAN_FS_ERROR event type for file system-wide error
reporting. It is meant to be used by file system health monitoring
daemons, which listen for these events and take actions (notify
sysadmin, start recovery) when a file system problem is detected.
By design, a FAN_FS_ERROR notification exposes sufficient information
for a monitoring tool to know a problem in the file system has happened.
It doesn't necessarily provide a user space application with semantics
to verify an IO operation was successfully executed. That is out of
scope for this feature. Instead, it is only meant as a framework for
early file system problem detection and reporting to recovery tools.
When a file system operation fails, it is common for dozens of kernel
errors to cascade after the initial failure, hiding the original failure
log, which is usually the most useful debug data to troubleshoot the
problem. For this reason, FAN_FS_ERROR tries to report only the first
error that occurred for a file system since the last notification, and
it simply counts additional errors. This ensures that the most
important pieces of information are never lost.
FAN_FS_ERROR requires the fanotify group to be set up with the
FAN_REPORT_FID flag.
At the time of this writing, the only file system that emits FAN_FS_ERROR
notifications is Ext4.
A FAN_FS_ERROR Notification has the following format::
[ Notification Metadata (Mandatory) ]
[ Generic Error Record (Mandatory) ]
[ FID record (Mandatory) ]
The order of records is not guaranteed, and new records might be added
in the future. Therefore, applications must not rely on the order and
must be prepared to skip over unknown records. Please refer to
``samples/fanotify/fs-monitor.c`` for an example parser.
Generic error record
--------------------
The generic error record provides enough information for a file system
agnostic tool to learn about a problem in the file system, without
providing any additional details about the problem. This record is
identified by ``struct fanotify_event_info_header.info_type`` being set
to FAN_EVENT_INFO_TYPE_ERROR.
struct fanotify_event_info_error {
struct fanotify_event_info_header hdr;
__s32 error;
__u32 error_count;
};
The ``error`` field identifies the type of error using errno values.
``error_count`` tracks the number of errors that occurred since the last
notification and were suppressed to preserve the original error
information.
FID record
----------
The FID record can be used to uniquely identify the inode that triggered
the error through the combination of fsid and file handle. A file system
specific application can use that information to attempt a recovery
procedure. Errors that are not related to an inode are reported with an
empty file handle of type FILEID_INVALID.
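For illustration, a minimal user-space sketch of the record walk described
above (this is not the actual ``samples/fanotify/fs-monitor.c`` code; it
assumes uapi headers new enough to define ``struct fanotify_event_info_error``
and a group created with FAN_CLASS_NOTIF | FAN_REPORT_FID and marked with
FAN_MARK_FILESYSTEM)::
#include <stdio.h>
#include <sys/fanotify.h>
/* Walk all info records attached to a single FAN_FS_ERROR event. */
static void handle_fs_error(struct fanotify_event_metadata *event)
{
	struct fanotify_event_info_header *info;
	__u32 off;
	/* Info records follow the metadata; each carries its own length. */
	for (off = event->metadata_len; off < event->event_len;
	     off += info->len) {
		info = (struct fanotify_event_info_header *)((char *)event + off);
		switch (info->info_type) {
		case FAN_EVENT_INFO_TYPE_ERROR: {
			struct fanotify_event_info_error *err =
				(struct fanotify_event_info_error *)info;
			printf("fs error %d, %u error(s) since last event\n",
			       err->error, err->error_count);
			break;
		}
		case FAN_EVENT_INFO_TYPE_FID:
			/* fsid + file handle identify the affected inode. */
			break;
		default:
			/* Unknown record types are simply skipped. */
			break;
		}
	}
}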

View File

@ -21,3 +21,4 @@ are configurable at compile, boot or run time.
cross-thread-rsb.rst
gather_data_sampling.rst
srso
reg-file-data-sampling

View File

@ -0,0 +1,104 @@
==================================
Register File Data Sampling (RFDS)
==================================
Register File Data Sampling (RFDS) is a microarchitectural vulnerability that
only affects Intel Atom parts (also branded as E-cores). RFDS may allow
a malicious actor to infer data values previously used in floating point
registers, vector registers, or integer registers. RFDS does not provide the
ability to choose which data is inferred. CVE-2023-28746 is assigned to RFDS.
Affected Processors
===================
Below is the list of affected Intel processors [#f1]_:
=================== ============
Common name Family_Model
=================== ============
ATOM_GOLDMONT 06_5CH
ATOM_GOLDMONT_D 06_5FH
ATOM_GOLDMONT_PLUS 06_7AH
ATOM_TREMONT_D 06_86H
ATOM_TREMONT 06_96H
ALDERLAKE 06_97H
ALDERLAKE_L 06_9AH
ATOM_TREMONT_L 06_9CH
RAPTORLAKE 06_B7H
RAPTORLAKE_P 06_BAH
ALDERLAKE_N 06_BEH
RAPTORLAKE_S 06_BFH
=================== ============
As an exception to this table, Intel Xeon E family parts ALDERLAKE(06_97H) and
RAPTORLAKE(06_B7H) codenamed Catlow are not affected. They are reported as
vulnerable in Linux because they share the same family/model with an affected
part. Unlike their affected counterparts, they do not enumerate RFDS_CLEAR or
CPUID.HYBRID. This information could be used to distinguish between the
affected and unaffected parts, but it is deemed not worth adding complexity as
the reporting is fixed automatically when these parts enumerate RFDS_NO.
Mitigation
==========
Intel released a microcode update that enables software to clear sensitive
information using the VERW instruction. Like MDS, RFDS deploys the same
mitigation strategy to force the CPU to clear the affected buffers before an
attacker can extract the secrets. This is achieved by using the otherwise
unused and obsolete VERW instruction in combination with a microcode update.
The microcode clears the affected CPU buffers when the VERW instruction is
executed.
Mitigation points
-----------------
VERW is executed by the kernel before returning to user space, and by KVM
before VMentry. None of the affected cores support SMT, so VERW is not required
at C-state transitions.
New bits in IA32_ARCH_CAPABILITIES
----------------------------------
Newer processors, and microcode updates on existing affected processors, add new
bits to the IA32_ARCH_CAPABILITIES MSR. These bits can be used to enumerate the
vulnerability and the mitigation capability:
- Bit 27 - RFDS_NO - When set, processor is not affected by RFDS.
- Bit 28 - RFDS_CLEAR - When set, processor is affected by RFDS, and has the
microcode that clears the affected buffers on VERW execution.
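A minimal in-kernel sketch of how these two bits map to the states above
(illustrative only; this is not the actual enumeration code in
arch/x86/kernel/cpu/bugs.c, and it relies on the ARCH_CAP_* definitions added
later in this series)::
#include <linux/types.h>
#include <asm/cpufeature.h>
#include <asm/msr.h>
static bool rfds_affected_and_mitigable(void)
{
	u64 ia32_cap = 0;
	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
	if (ia32_cap & ARCH_CAP_RFDS_NO)
		return false;	/* Bit 27: processor is not affected */
	/* Bit 28: affected, and microcode clears the buffers on VERW */
	return ia32_cap & ARCH_CAP_RFDS_CLEAR;
}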
Mitigation control on the kernel command line
---------------------------------------------
The kernel command line allows the RFDS mitigation to be controlled at boot time
with the parameter "reg_file_data_sampling=". The valid arguments are:
========== =================================================================
on If the CPU is vulnerable, enable mitigation; CPU buffer clearing
on exit to userspace and before entering a VM.
off Disables mitigation.
========== =================================================================
Mitigation default is selected by CONFIG_MITIGATION_RFDS.
Mitigation status information
-----------------------------
The Linux kernel provides a sysfs interface to enumerate the current
vulnerability status of the system: whether the system is vulnerable, and
which mitigations are active. The relevant sysfs file is:
/sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
The possible values in this file are:
.. list-table::
* - 'Not affected'
- The processor is not vulnerable
* - 'Vulnerable'
- The processor is vulnerable, but no mitigation enabled
* - 'Vulnerable: No microcode'
- The processor is vulnerable but microcode is not updated.
* - 'Mitigation: Clear Register File'
- The processor is vulnerable and the CPU buffer clearing mitigation is
enabled.
References
----------
.. [#f1] Affected Processors
https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html

View File

@ -138,11 +138,10 @@ associated with the source address of the indirect branch. Specifically,
the BHB might be shared across privilege levels even in the presence of
Enhanced IBRS.
Currently the only known real-world BHB attack vector is via
unprivileged eBPF. Therefore, it's highly recommended to not enable
unprivileged eBPF, especially when eIBRS is used (without retpolines).
For a full mitigation against BHB attacks, it's recommended to use
retpolines (or eIBRS combined with retpolines).
Previously the only known real-world BHB attack vector was via unprivileged
eBPF. Further research has found attacks that don't require unprivileged eBPF.
For a full mitigation against BHB attacks it is recommended to set BHI_DIS_S or
use the BHB clearing sequence.
Attack scenarios
----------------
@ -430,6 +429,23 @@ The possible values in this file are:
'PBRSB-eIBRS: Not affected' CPU is not affected by PBRSB
=========================== =======================================================
- Branch History Injection (BHI) protection status:
.. list-table::
* - BHI: Not affected
- System is not affected
* - BHI: Retpoline
- System is protected by retpoline
* - BHI: BHI_DIS_S
- System is protected by BHI_DIS_S
* - BHI: SW loop; KVM SW loop
- System is protected by software clearing sequence
* - BHI: Syscall hardening
- Syscalls are hardened against BHI
* - BHI: Syscall hardening; KVM: SW loop
- System is protected from userspace attacks by syscall hardening; KVM is protected by software clearing sequence
Full mitigation might require a microcode update from the CPU
vendor. When the necessary microcode is not available, the kernel will
report vulnerability.
@ -484,11 +500,18 @@ Spectre variant 2
Systems which support enhanced IBRS (eIBRS) enable IBRS protection once at
boot, by setting the IBRS bit, and they're automatically protected against
Spectre v2 variant attacks, including cross-thread branch target injections
on SMT systems (STIBP). In other words, eIBRS enables STIBP too.
some Spectre v2 variant attacks. The BHB can still influence the choice of
indirect branch predictor entry, and although branch predictor entries are
isolated between modes when eIBRS is enabled, the BHB itself is not isolated
between modes. Systems which support BHI_DIS_S will set it to protect against
BHI attacks.
Legacy IBRS systems clear the IBRS bit on exit to userspace and
therefore explicitly enable STIBP for that
On Intel's enhanced IBRS systems, this includes cross-thread branch target
injections on SMT systems (STIBP). In other words, Intel eIBRS enables
STIBP, too.
AMD Automatic IBRS does not protect userspace, and Legacy IBRS systems clear
the IBRS bit on exit to userspace, therefore both explicitly enable STIBP.
The retpoline mitigation is turned on by default on vulnerable
CPUs. It can be forced on or off by the administrator
@ -622,9 +645,10 @@ kernel command line.
retpoline,generic Retpolines
retpoline,lfence LFENCE; indirect branch
retpoline,amd alias for retpoline,lfence
eibrs enhanced IBRS
eibrs,retpoline enhanced IBRS + Retpolines
eibrs,lfence enhanced IBRS + LFENCE
eibrs Enhanced/Auto IBRS
eibrs,retpoline Enhanced/Auto IBRS + Retpolines
eibrs,lfence Enhanced/Auto IBRS + LFENCE
ibrs use IBRS to protect kernel
Not specifying this option is equivalent to
spectre_v2=auto.
@ -684,6 +708,24 @@ For user space mitigation:
spectre_v2=off. Spectre variant 1 mitigations
cannot be disabled.
spectre_bhi=
[X86] Control mitigation of Branch History Injection
(BHI) vulnerability. Syscalls are hardened against BHI
regardless of this setting. This setting affects the deployment
of the HW BHI control and the SW BHB clearing sequence.
on
unconditionally enable.
off
unconditionally disable.
auto
enable if hardware mitigation
control (BHI_DIS_S) is available, otherwise
enable alternate mitigation in KVM.
For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt
Mitigation selection guide
--------------------------

View File

@ -82,6 +82,7 @@ configure specific aspects of kernel behavior to your liking.
edid
efi-stub
ext4
filesystem-monitoring
nfs/index
gpio/index
highuid

View File

@ -1037,6 +1037,26 @@
The filter can be disabled or changed to another
driver later using sysfs.
reg_file_data_sampling=
[X86] Controls mitigation for Register File Data
Sampling (RFDS) vulnerability. RFDS is a CPU
vulnerability which may allow userspace to infer
kernel data values previously stored in floating point
registers, vector registers, or integer registers.
RFDS only affects Intel Atom processors.
on: Turns ON the mitigation.
off: Turns OFF the mitigation.
This parameter overrides the compile time default set
by CONFIG_MITIGATION_RFDS. Mitigation cannot be
disabled when other VERW based mitigations (like MDS)
are enabled. In order to disable RFDS mitigation all
VERW based mitigations need to be disabled.
For details see:
Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
driver_async_probe= [KNL]
List of driver names to be probed asynchronously.
Format: <driver_name1>,<driver_name2>...
@ -3070,6 +3090,7 @@
nopti [X86,PPC]
nospectre_v1 [X86,PPC]
nospectre_v2 [X86,PPC,S390,ARM64]
reg_file_data_sampling=off [X86]
retbleed=off [X86]
spec_store_bypass_disable=off [X86,PPC]
spectre_v2_user=off [X86]
@ -5383,6 +5404,18 @@
sonypi.*= [HW] Sony Programmable I/O Control Device driver
See Documentation/admin-guide/laptops/sonypi.rst
spectre_bhi= [X86] Control mitigation of Branch History Injection
(BHI) vulnerability. Syscalls are hardened against BHI
regardless of this setting. This setting affects the
deployment of the HW BHI control and the SW BHB
clearing sequence.
on - unconditionally enable.
off - unconditionally disable.
auto - (default) enable hardware mitigation
(BHI_DIS_S) if available, otherwise enable
alternate mitigation in KVM.
spectre_v2= [X86] Control mitigation of Spectre variant 2
(indirect branch speculation) vulnerability.
The default operation protects the kernel from
@ -5413,9 +5446,9 @@
retpoline,generic - Retpolines
retpoline,lfence - LFENCE; indirect branch
retpoline,amd - alias for retpoline,lfence
eibrs - enhanced IBRS
eibrs,retpoline - enhanced IBRS + Retpolines
eibrs,lfence - enhanced IBRS + LFENCE
eibrs - Enhanced/Auto IBRS
eibrs,retpoline - Enhanced/Auto IBRS + Retpolines
eibrs,lfence - Enhanced/Auto IBRS + LFENCE
ibrs - use IBRS to protect kernel
Not specifying this option is equivalent to

View File

@ -204,6 +204,20 @@ Returns the maximum size of a mapping for the device. The size parameter
of the mapping functions like dma_map_single(), dma_map_page() and
others should not be larger than the returned value.
::
size_t
dma_opt_mapping_size(struct device *dev);
Returns the maximum optimal size of a mapping for the device.
Mapping larger buffers may take much longer in certain scenarios. In
addition, for high-rate short-lived streaming mappings, the upfront time
spent on the mapping may account for an appreciable part of the total
request lifetime. As such, if splitting larger requests incurs no
significant performance penalty, then device drivers are advised to
limit total DMA streaming mappings length to the returned value.
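A minimal sketch of the intended use (the helper and its limit are
hypothetical; only dma_opt_mapping_size() itself is part of the DMA API)::
#include <linux/dma-mapping.h>
#include <linux/minmax.h>
/*
 * Cap a driver-chosen per-request limit so that no single streaming
 * mapping exceeds the optimal mapping size for this device.
 */
static size_t example_max_request_bytes(struct device *dev,
					size_t driver_max_bytes)
{
	return min_t(size_t, driver_max_bytes, dma_opt_mapping_size(dev));
}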
::
bool

View File

@ -442,17 +442,21 @@ prototypes::
void (*lm_break)(struct file_lock *); /* break_lease callback */
int (*lm_change)(struct file_lock **, int);
bool (*lm_breaker_owns_lease)(struct file_lock *);
bool (*lm_lock_expirable)(struct file_lock *);
void (*lm_expire_lock)(void);
locking rules:
====================== ============= ================= =========
ops inode->i_lock blocked_lock_lock may block
ops flc_lock blocked_lock_lock may block
====================== ============= ================= =========
lm_notify: yes yes no
lm_notify: no yes no
lm_grant: no no no
lm_break: yes no no
lm_change yes no no
lm_breaker_owns_lease: no no no
lm_breaker_owns_lease: yes no no
lm_lock_expirable yes no no
lm_expire_lock no no yes
====================== ============= ================= =========
buffer_head

View File

@ -215,3 +215,36 @@ following flags are defined:
This flag causes nfsd to close any open files for this inode _before_
calling into the vfs to do an unlink or a rename that would replace
an existing file.
EXPORT_OP_REMOTE_FS - Backing storage for this filesystem is remote
PF_LOCAL_THROTTLE exists for loopback NFSD, where a thread needs to
write to one bdi (the final bdi) in order to free up writes queued
to another bdi (the client bdi). Such threads get a private balance
of dirty pages so that dirty pages for the client bdi do not impact
the daemon writing to the final bdi. For filesystems whose durable
storage is not local (such as exported NFS filesystems), this
constraint has negative consequences. EXPORT_OP_REMOTE_FS enables
an export to disable writeback throttling.
EXPORT_OP_NOATOMIC_ATTR - Filesystem does not update attributes atomically
EXPORT_OP_NOATOMIC_ATTR indicates that the exported filesystem
cannot provide the semantics required by the "atomic" boolean in
NFSv4's change_info4. This boolean indicates to a client whether the
returned before and after change attributes were obtained atomically
with respect to the requested metadata operation (UNLINK,
OPEN/CREATE, MKDIR, etc).
EXPORT_OP_FLUSH_ON_CLOSE - Filesystem flushes file data on close(2)
On most filesystems, inodes can remain under writeback after the
file is closed. NFSD relies on client activity or local flusher
threads to handle writeback. Certain filesystems, such as NFS, flush
all of an inode's dirty data on last close. Exports that behave this
way should set EXPORT_OP_FLUSH_ON_CLOSE so that NFSD knows to skip
waiting for writeback when closing such files.
EXPORT_OP_ASYNC_LOCK - Indicates the filesystem can handle async lock
requests from lockd. Only set EXPORT_OP_ASYNC_LOCK if the filesystem has
its own ->lock() functionality, as the core posix_lock_file() implementation
has no async lock request handling yet. For more information about how to
indicate an async lock request from a ->lock() file_operations struct, see
fs/locks.c and comment for the function vfs_lock_file().
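As an illustration only (``examplefs`` is hypothetical), a filesystem matching
the descriptions above would advertise these flags in its export_operations::
#include <linux/exportfs.h>
static const struct export_operations examplefs_export_ops = {
	/* ... fh_to_dentry/fh_to_parent handlers omitted ... */
	/*
	 * Durable storage is remote and dirty data is flushed on the last
	 * close, so opt out of writeback throttling and let NFSD skip
	 * waiting for writeback when closing files.
	 */
	.flags = EXPORT_OP_REMOTE_FS | EXPORT_OP_FLUSH_ON_CLOSE,
};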

View File

@ -95,6 +95,9 @@ The kernel provides a function to invoke the buffer clearing:
mds_clear_cpu_buffers()
Also macro CLEAR_CPU_BUFFERS can be used in ASM late in exit-to-user path.
Other than CFLAGS.ZF, this macro doesn't clobber any registers.
The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.
@ -138,17 +141,30 @@ Mitigation points
When transitioning from kernel to user space the CPU buffers are flushed
on affected CPUs when the mitigation is not disabled on the kernel
command line. The mitigation is enabled through the static key
mds_user_clear.
command line. The mitigation is enabled through the feature flag
X86_FEATURE_CLEAR_CPU_BUF.
The mitigation is invoked in prepare_exit_to_usermode() which covers
all but one of the kernel to user space transitions. The exception
is when we return from a Non Maskable Interrupt (NMI), which is
handled directly in do_nmi().
The mitigation is invoked just before transitioning to userspace after
user registers are restored. This is done to minimize the window in
which kernel data could be accessed after VERW e.g. via an NMI after
VERW.
(The reason that NMI is special is that prepare_exit_to_usermode() can
enable IRQs. In NMI context, NMIs are blocked, and we don't want to
enable IRQs with NMIs blocked.)
**Corner case not handled**
Interrupts returning to kernel don't clear CPU buffers since the
exit-to-user path is expected to do that anyway. But there could be
a case when an NMI is generated in the kernel after the exit-to-user path
has cleared the buffers. This case is not handled, and NMIs returning to
the kernel don't clear CPU buffers because:
1. It is rare to get an NMI after VERW, but before returning to userspace.
2. For an unprivileged user, there is no known way to make that NMI
less rare or target it.
3. It would take a large number of these precisely-timed NMIs to mount
an actual attack. There's presumably not enough bandwidth.
4. The NMI in question occurs after a VERW, i.e. when user state is
restored and most interesting data is already scrubbed. What's left
is only the data that the NMI touches, and that may or may not be of
any interest.
2. C-State transition

View File

@ -9213,6 +9213,7 @@ F: drivers/infiniband/
F: include/rdma/
F: include/trace/events/ib_mad.h
F: include/trace/events/ib_umad.h
F: include/trace/misc/rdma.h
F: include/uapi/linux/if_infiniband.h
F: include/uapi/rdma/
F: samples/bpf/ibumad_kern.c
@ -10194,6 +10195,12 @@ F: fs/nfs_common/
F: fs/nfsd/
F: include/linux/lockd/
F: include/linux/sunrpc/
F: include/trace/events/rpcgss.h
F: include/trace/events/rpcrdma.h
F: include/trace/events/sunrpc.h
F: include/trace/misc/fs.h
F: include/trace/misc/nfs.h
F: include/trace/misc/sunrpc.h
F: include/uapi/linux/nfsd/
F: include/uapi/linux/sunrpc/
F: net/sunrpc/

View File

@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
VERSION = 5
PATCHLEVEL = 15
SUBLEVEL = 153
SUBLEVEL = 155
EXTRAVERSION =
NAME = Trick or Treat
@ -1000,8 +1000,8 @@ KBUILD_CFLAGS += $(CC_FLAGS_CFI)
export CC_FLAGS_CFI
endif
ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B
KBUILD_CFLAGS += -falign-functions=64
ifneq ($(CONFIG_FUNCTION_ALIGNMENT),0)
KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT)
endif
# arch Makefile may override CC so keep this after arch Makefile is included

View File

@ -1311,4 +1311,28 @@ source "kernel/gcov/Kconfig"
source "scripts/gcc-plugins/Kconfig"
config FUNCTION_ALIGNMENT_4B
bool
config FUNCTION_ALIGNMENT_8B
bool
config FUNCTION_ALIGNMENT_16B
bool
config FUNCTION_ALIGNMENT_32B
bool
config FUNCTION_ALIGNMENT_64B
bool
config FUNCTION_ALIGNMENT
int
default 64 if FUNCTION_ALIGNMENT_64B
default 32 if FUNCTION_ALIGNMENT_32B
default 16 if FUNCTION_ALIGNMENT_16B
default 8 if FUNCTION_ALIGNMENT_8B
default 4 if FUNCTION_ALIGNMENT_4B
default 0
endmenu

View File

@ -28,7 +28,7 @@
&twsi1 {
status = "okay";
pmic: max8925@3c {
compatible = "maxium,max8925";
compatible = "maxim,max8925";
reg = <0x3c>;
interrupts = <1>;
interrupt-parent = <&intcmux4>;

View File

@ -911,6 +911,8 @@ ap_spi_fp: &spi10 {
vddrf-supply = <&pp1300_l2c>;
vddch0-supply = <&pp3300_l10c>;
max-speed = <3200000>;
qcom,local-bd-address-broken;
};
};

View File

@ -728,11 +728,20 @@
status = "disabled";
ports {
hdmi_in: port {
#address-cells = <1>;
#size-cells = <0>;
hdmi_in: port@0 {
reg = <0>;
hdmi_in_vop: endpoint {
remote-endpoint = <&vop_out_hdmi>;
};
};
hdmi_out: port@1 {
reg = <1>;
};
};
};

View File

@ -1801,6 +1801,7 @@
hdmi: hdmi@ff940000 {
compatible = "rockchip,rk3399-dw-hdmi";
reg = <0x0 0xff940000 0x0 0x20000>;
reg-io-width = <4>;
interrupts = <GIC_SPI 23 IRQ_TYPE_LEVEL_HIGH 0>;
clocks = <&cru PCLK_HDMI_CTRL>,
<&cru SCLK_HDMI_SFR>,
@ -1809,13 +1810,16 @@
<&cru PLL_VPLL>;
clock-names = "iahb", "isfr", "cec", "grf", "vpll";
power-domains = <&power RK3399_PD_HDCP>;
reg-io-width = <4>;
rockchip,grf = <&grf>;
#sound-dai-cells = <0>;
status = "disabled";
ports {
hdmi_in: port {
#address-cells = <1>;
#size-cells = <0>;
hdmi_in: port@0 {
reg = <0>;
#address-cells = <1>;
#size-cells = <0>;
@ -1828,6 +1832,10 @@
remote-endpoint = <&vopl_out_hdmi>;
};
};
hdmi_out: port@1 {
reg = <1>;
};
};
};

View File

@ -13,6 +13,18 @@
#define KVM_PGTABLE_MAX_LEVELS 4U
/*
* The largest supported block sizes for KVM (no 52-bit PA support):
* - 4K (level 1): 1GB
* - 16K (level 2): 32MB
* - 64K (level 2): 512MB
*/
#ifdef CONFIG_ARM64_4K_PAGES
#define KVM_PGTABLE_MIN_BLOCK_LEVEL 1U
#else
#define KVM_PGTABLE_MIN_BLOCK_LEVEL 2U
#endif
static inline u64 kvm_get_parange(u64 mmfr0)
{
u64 parange = cpuid_feature_extract_unsigned_field(mmfr0,
@ -58,11 +70,7 @@ static inline u64 kvm_granule_size(u32 level)
static inline bool kvm_level_supports_block_mapping(u32 level)
{
/*
* Reject invalid block mappings and don't bother with 4TB mappings for
* 52-bit PAs.
*/
return !(level == 0 || (PAGE_SIZE != SZ_4K && level == 1));
return level >= KVM_PGTABLE_MIN_BLOCK_LEVEL;
}
/**

View File

@ -10,13 +10,6 @@
#include <linux/pgtable.h>
/*
* PGDIR_SHIFT determines the size a top-level page table entry can map
* and depends on the number of levels in the page table. Compute the
* PGDIR_SHIFT for a given number of levels.
*/
#define pt_levels_pgdir_shift(lvls) ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - (lvls))
/*
* The hardware supports concatenation of up to 16 tables at stage2 entry
* level and we use the feature whenever possible, which means we resolve 4
@ -30,11 +23,6 @@
#define stage2_pgtable_levels(ipa) ARM64_HW_PGTABLE_LEVELS((ipa) - 4)
#define kvm_stage2_levels(kvm) VTCR_EL2_LVLS(kvm->arch.vtcr)
/* stage2_pgdir_shift() is the size mapped by top-level stage2 entry for the VM */
#define stage2_pgdir_shift(kvm) pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
#define stage2_pgdir_size(kvm) (1ULL << stage2_pgdir_shift(kvm))
#define stage2_pgdir_mask(kvm) ~(stage2_pgdir_size(kvm) - 1)
/*
* kvm_mmmu_cache_min_pages() is the number of pages required to install
* a stage-2 translation. We pre-allocate the entry level page table at
@ -42,12 +30,4 @@
*/
#define kvm_mmu_cache_min_pages(kvm) (kvm_stage2_levels(kvm) - 1)
static inline phys_addr_t
stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
{
phys_addr_t boundary = (addr + stage2_pgdir_size(kvm)) & stage2_pgdir_mask(kvm);
return (boundary - 1 < end - 1) ? boundary : end;
}
#endif /* __ARM64_S2_PGTABLE_H_ */

View File

@ -31,6 +31,13 @@ static phys_addr_t hyp_idmap_vector;
static unsigned long io_map_base;
static phys_addr_t stage2_range_addr_end(phys_addr_t addr, phys_addr_t end)
{
phys_addr_t size = kvm_granule_size(KVM_PGTABLE_MIN_BLOCK_LEVEL);
phys_addr_t boundary = ALIGN_DOWN(addr + size, size);
return (boundary - 1 < end - 1) ? boundary : end;
}
/*
* Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
@ -52,7 +59,7 @@ static int stage2_apply_range(struct kvm *kvm, phys_addr_t addr,
if (!pgt)
return -EINVAL;
next = stage2_pgd_addr_end(kvm, addr, end);
next = stage2_range_addr_end(addr, end);
ret = fn(pgt, addr, next - addr);
if (ret)
break;

View File

@ -64,6 +64,7 @@ SECTIONS
STABS_DEBUG
DWARF_DEBUG
ELF_DETAILS
.hexagon.attributes 0 : { *(.hexagon.attributes) }
DISCARDS
}

View File

@ -63,6 +63,7 @@ config IA64
select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
select SET_FS
select ZONE_DMA32
select FUNCTION_ALIGNMENT_32B
default y
help
The Itanium Processor Family is Intel's 64-bit successor to

View File

@ -23,7 +23,7 @@ KBUILD_AFLAGS_KERNEL := -mconstant-gp
EXTRA :=
cflags-y := -pipe $(EXTRA) -ffixed-r13 -mfixed-range=f12-f15,f32-f127 \
-falign-functions=32 -frename-registers -fno-optimize-sibling-calls
-frename-registers -fno-optimize-sibling-calls
KBUILD_CFLAGS_KERNEL := -mconstant-gp
GAS_STATUS = $(shell $(srctree)/arch/ia64/scripts/check-gas "$(CC)" "$(OBJDUMP)")

View File

@ -74,10 +74,10 @@ void *arch_dma_set_uncached(void *cpu_addr, size_t size)
* We need to iterate through the pages, clearing the dcache for
* them and setting the cache-inhibit bit.
*/
mmap_read_lock(&init_mm);
error = walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops,
NULL);
mmap_read_unlock(&init_mm);
mmap_write_lock(&init_mm);
error = walk_page_range_novma(&init_mm, va, va + size,
&set_nocache_walk_ops, NULL, NULL);
mmap_write_unlock(&init_mm);
if (error)
return ERR_PTR(error);
@ -88,11 +88,11 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size)
{
unsigned long va = (unsigned long)cpu_addr;
mmap_read_lock(&init_mm);
mmap_write_lock(&init_mm);
/* walk_page_range shouldn't be able to fail here */
WARN_ON(walk_page_range(&init_mm, va, va + size,
&clear_nocache_walk_ops, NULL));
mmap_read_unlock(&init_mm);
WARN_ON(walk_page_range_novma(&init_mm, va, va + size,
&clear_nocache_walk_ops, NULL, NULL));
mmap_write_unlock(&init_mm);
}
void arch_sync_dma_for_device(phys_addr_t addr, size_t size,

View File

@ -79,26 +79,28 @@
* version takes two arguments: a src and destination register.
* However, the source and destination registers can not be
* the same register.
*
* We use add,l to avoid clobbering the C/B bits in the PSW.
*/
.macro tophys grvirt, grphys
ldil L%(__PAGE_OFFSET), \grphys
sub \grvirt, \grphys, \grphys
ldil L%(-__PAGE_OFFSET), \grphys
addl \grvirt, \grphys, \grphys
.endm
.macro tovirt grphys, grvirt
ldil L%(__PAGE_OFFSET), \grvirt
add \grphys, \grvirt, \grvirt
addl \grphys, \grvirt, \grvirt
.endm
.macro tophys_r1 gr
ldil L%(__PAGE_OFFSET), %r1
sub \gr, %r1, \gr
ldil L%(-__PAGE_OFFSET), %r1
addl \gr, %r1, \gr
.endm
.macro tovirt_r1 gr
ldil L%(__PAGE_OFFSET), %r1
add \gr, %r1, \gr
addl \gr, %r1, \gr
.endm
.macro delay value

View File

@ -40,7 +40,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
" addc %0, %5, %0\n"
" addc %0, %3, %0\n"
"1: ldws,ma 4(%1), %3\n"
" addib,< 0, %2, 1b\n"
" addib,> -1, %2, 1b\n"
" addc %0, %3, %0\n"
"\n"
" extru %0, 31, 16, %4\n"
@ -126,6 +126,7 @@ static __inline__ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
** Try to keep 4 registers with "live" values ahead of the ALU.
*/
" depdi 0, 31, 32, %0\n"/* clear upper half of incoming checksum */
" ldd,ma 8(%1), %4\n" /* get 1st saddr word */
" ldd,ma 8(%2), %5\n" /* get 1st daddr word */
" add %4, %0, %0\n"
@ -137,8 +138,8 @@ static __inline__ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
" add,dc %3, %0, %0\n" /* fold in proto+len | carry bit */
" extrd,u %0, 31, 32, %4\n"/* copy upper half down */
" depdi 0, 31, 32, %0\n"/* clear upper half */
" add %4, %0, %0\n" /* fold into 32-bits */
" addc 0, %0, %0\n" /* add carry */
" add,dc %4, %0, %0\n" /* fold into 32-bits, plus carry */
" addc 0, %0, %0\n" /* add final carry */
#else
@ -163,7 +164,8 @@ static __inline__ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
" ldw,ma 4(%2), %7\n" /* 4th daddr */
" addc %6, %0, %0\n"
" addc %7, %0, %0\n"
" addc %3, %0, %0\n" /* fold in proto+len, catch carry */
" addc %3, %0, %0\n" /* fold in proto+len */
" addc 0, %0, %0\n" /* add carry */
#endif
: "=r" (sum), "=r" (saddr), "=r" (daddr), "=r" (len),

View File

@ -12,9 +12,16 @@
#ifndef __ASSEMBLY__
/* Performance Monitor Registers */
#define mfpmr(rn) ({unsigned int rval; \
asm volatile("mfpmr %0," __stringify(rn) \
asm volatile(".machine push; " \
".machine e300; " \
"mfpmr %0," __stringify(rn) ";" \
".machine pop; " \
: "=r" (rval)); rval;})
#define mtpmr(rn, v) asm volatile("mtpmr " __stringify(rn) ",%0" : : "r" (v))
#define mtpmr(rn, v) asm volatile(".machine push; " \
".machine e300; " \
"mtpmr " __stringify(rn) ",%0; " \
".machine pop; " \
: : "r" (v))
#endif /* __ASSEMBLY__ */
/* Freescale Book E Performance Monitor APU Registers */

View File

@ -67,6 +67,6 @@ obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
obj-$(CONFIG_ALTIVEC) += xor_vmx.o xor_vmx_glue.o
CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
CFLAGS_xor_vmx.o += -mhard-float -maltivec $(call cc-option,-mabi=altivec)
obj-$(CONFIG_PPC64) += $(obj64-y)

View File

@ -467,7 +467,7 @@ unsigned long __must_check clear_user(void __user *to, unsigned long n)
#define __get_kernel_nofault(dst, src, type, err_label) \
do { \
long __kr_err; \
long __kr_err = 0; \
\
__get_user_nocheck(*((type *)(dst)), (type *)(src), __kr_err); \
if (unlikely(__kr_err)) \
@ -476,7 +476,7 @@ do { \
#define __put_kernel_nofault(dst, src, type, err_label) \
do { \
long __kr_err; \
long __kr_err = 0; \
\
__put_user_nocheck(*((type *)(src)), (type *)(dst), __kr_err); \
if (unlikely(__kr_err)) \

View File

@ -24,8 +24,6 @@
#include <asm/switch_to.h>
#include <asm/thread_info.h>
register unsigned long gp_in_global __asm__("gp");
#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
#include <linux/stackprotector.h>
unsigned long __stack_chk_guard __read_mostly;
@ -130,7 +128,6 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
/* Kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
childregs->gp = gp_in_global;
/* Supervisor/Machine, irqs on: */
childregs->status = SR_PP | SR_PIE;

View File

@ -685,6 +685,7 @@ ENDPROC(stack_overflow)
.Lthis_cpu: .short 0
.Lstosm_tmp: .byte 0
.section .rodata, "a"
.balign 8
#define SYSCALL(esame,emu) .quad __s390x_ ## esame
.globl sys_call_table
sys_call_table:

View File

@ -274,7 +274,7 @@ static int __init setup_nmi_watchdog(char *str)
if (!strncmp(str, "panic", 5))
panic_on_timeout = 1;
return 0;
return 1;
}
__setup("nmi_watchdog=", setup_nmi_watchdog);

View File

@ -449,9 +449,8 @@ static __init int vdso_setup(char *s)
unsigned long val;
err = kstrtoul(s, 10, &val);
if (err)
return err;
if (!err)
vdso_enabled = val;
return 0;
return 1;
}
__setup("vdso=", vdso_setup);

View File

@ -267,6 +267,8 @@ config X86
select HAVE_ARCH_KCSAN if X86_64
select X86_FEATURE_NAMES if PROC_FS
select PROC_PID_ARCH_STATUS if PROC_FS
select FUNCTION_ALIGNMENT_16B if X86_64 || X86_ALIGNMENT_16
select FUNCTION_ALIGNMENT_4B
imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
config INSTRUCTION_DECODER
@ -2499,6 +2501,42 @@ config GDS_FORCE_MITIGATION
If in doubt, say N.
config MITIGATION_RFDS
bool "RFDS Mitigation"
depends on CPU_SUP_INTEL
default y
help
Enable mitigation for Register File Data Sampling (RFDS) by default.
RFDS is a hardware vulnerability which affects Intel Atom CPUs. It
allows unprivileged speculative access to stale data previously
stored in floating point, vector and integer registers.
See also <file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst>
choice
prompt "Clear branch history"
depends on CPU_SUP_INTEL
default SPECTRE_BHI_ON
help
Enable BHI mitigations. BHI attacks are a form of Spectre V2 attacks
where the branch history buffer is poisoned to speculatively steer
indirect branches.
See <file:Documentation/admin-guide/hw-vuln/spectre.rst>
config SPECTRE_BHI_ON
bool "on"
help
Equivalent to setting spectre_bhi=on command line parameter.
config SPECTRE_BHI_OFF
bool "off"
help
Equivalent to setting spectre_bhi=off command line parameter.
config SPECTRE_BHI_AUTO
bool "auto"
help
Equivalent to setting spectre_bhi=auto command line parameter.
endchoice
endif
config ARCH_HAS_ADD_PAGES

View File

@ -37,6 +37,14 @@
#include <asm/trapnr.h>
#include "pgtable.h"
/*
* Fix alignment at 16 bytes. Following CONFIG_FUNCTION_ALIGNMENT will result
* in assembly errors due to trying to move .org backward due to the excessive
* alignment.
*/
#undef __ALIGN
#define __ALIGN .balign 16, 0x90
/*
* Locally defined symbols should be marked hidden:
*/

View File

@ -47,7 +47,7 @@ static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
if (likely(unr < NR_syscalls)) {
unr = array_index_nospec(unr, NR_syscalls);
regs->ax = sys_call_table[unr](regs);
regs->ax = x64_sys_call(regs, unr);
return true;
}
return false;
@ -64,7 +64,7 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
if (IS_ENABLED(CONFIG_X86_X32_ABI) && likely(xnr < X32_NR_syscalls)) {
xnr = array_index_nospec(xnr, X32_NR_syscalls);
regs->ax = x32_sys_call_table[xnr](regs);
regs->ax = x32_sys_call(regs, xnr);
return true;
}
return false;
@ -109,7 +109,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs, int nr)
if (likely(unr < IA32_NR_syscalls)) {
unr = array_index_nospec(unr, IA32_NR_syscalls);
regs->ax = ia32_sys_call_table[unr](regs);
regs->ax = ia32_sys_call(regs, unr);
} else if (nr != -1) {
regs->ax = __ia32_sys_ni_syscall(regs);
}

View File

@ -6,6 +6,9 @@
#include <linux/linkage.h>
#include <asm/export.h>
#include <asm/msr-index.h>
#include <asm/unwind_hints.h>
#include <asm/segment.h>
#include <asm/cache.h>
.pushsection .noinstr.text, "ax"
@ -20,3 +23,23 @@ SYM_FUNC_END(entry_ibpb)
EXPORT_SYMBOL_GPL(entry_ibpb);
.popsection
/*
* Define the VERW operand that is disguised as entry code so that
* it can be referenced with KPTI enabled. This ensures VERW can be
* used late in exit-to-user path after page tables are switched.
*/
.pushsection .entry.text, "ax"
.align L1_CACHE_BYTES, 0xcc
SYM_CODE_START_NOALIGN(mds_verw_sel)
UNWIND_HINT_EMPTY
ANNOTATE_NOENDBR
.word __KERNEL_DS
.align L1_CACHE_BYTES, 0xcc
SYM_CODE_END(mds_verw_sel);
/* For KVM */
EXPORT_SYMBOL_GPL(mds_verw_sel);
.popsection

View File

@ -912,6 +912,7 @@ SYM_FUNC_START(entry_SYSENTER_32)
BUG_IF_WRONG_CR3 no_user_check=1
popfl
popl %eax
CLEAR_CPU_BUFFERS
/*
* Return back to the vDSO, which will pop ecx and edx.
@ -981,6 +982,7 @@ restore_all_switch_stack:
/* Restore user state */
RESTORE_REGS pop=4 # skip orig_eax/error_code
CLEAR_CPU_BUFFERS
.Lirq_return:
/*
* ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
@ -1173,6 +1175,7 @@ SYM_CODE_START(asm_exc_nmi)
/* Not on SYSENTER stack. */
call exc_nmi
CLEAR_CPU_BUFFERS
jmp .Lnmi_return
.Lnmi_from_sysenter_stack:

View File

@ -114,6 +114,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
/* clobbers %rax, make sure it is after saving the syscall nr */
IBRS_ENTER
UNTRAIN_RET
CLEAR_BRANCH_HISTORY
call do_syscall_64 /* returns with IRQs disabled */
@ -219,6 +220,7 @@ syscall_return_via_sysret:
popq %rdi
popq %rsp
swapgs
CLEAR_CPU_BUFFERS
sysretq
SYM_CODE_END(entry_SYSCALL_64)
@ -637,6 +639,7 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
/* Restore RDI. */
popq %rdi
SWAPGS
CLEAR_CPU_BUFFERS
INTERRUPT_RETURN
@ -743,6 +746,8 @@ native_irq_return_ldt:
*/
popq %rax /* Restore user RAX */
CLEAR_CPU_BUFFERS
/*
* RSP now points to an ordinary IRET frame, except that the page
* is read-only and RSP[31:16] are preloaded with the userspace
@ -1465,6 +1470,12 @@ nmi_restore:
std
movq $0, 5*8(%rsp) /* clear "NMI executing" */
/*
* Skip CLEAR_CPU_BUFFERS here, since it only helps in rare cases like
* NMI in kernel after user state is restored. For an unprivileged user
* these conditions are hard to meet.
*/
/*
* iretq reads the "iret" frame and exits the NMI stack in a
* single instruction. We are returning to kernel mode, so this
@ -1482,6 +1493,7 @@ SYM_CODE_END(asm_exc_nmi)
SYM_CODE_START(ignore_sysret)
UNWIND_HINT_EMPTY
mov $-ENOSYS, %eax
CLEAR_CPU_BUFFERS
sysretl
SYM_CODE_END(ignore_sysret)
#endif
@ -1499,3 +1511,63 @@ SYM_CODE_START(rewind_stack_and_make_dead)
call make_task_dead
SYM_CODE_END(rewind_stack_and_make_dead)
.popsection
/*
* This sequence executes branches in order to remove user branch information
* from the branch history tracker in the Branch Predictor, therefore removing
* user influence on subsequent BTB lookups.
*
* It should be used on parts prior to Alder Lake. Newer parts should use the
* BHI_DIS_S hardware control instead. If a pre-Alder Lake part is being
* virtualized on newer hardware the VMM should protect against BHI attacks by
* setting BHI_DIS_S for the guests.
*
* CALLs/RETs are necessary to prevent Loop Stream Detector (LSD) from engaging
* and not clearing the branch history. The call tree looks like:
*
* call 1
* call 2
* call 2
* call 2
* call 2
* call 2
* ret
* ret
* ret
* ret
* ret
* ret
*
* This means that the stack is non-constant and ORC can't unwind it with %rsp
* alone. Therefore we unconditionally set up the frame pointer, which allows
* ORC to unwind properly.
*
* The alignment is for performance and not for safety, and may be safely
* refactored in the future if needed.
*/
SYM_FUNC_START(clear_bhb_loop)
push %rbp
mov %rsp, %rbp
movl $5, %ecx
ANNOTATE_INTRA_FUNCTION_CALL
call 1f
jmp 5f
.align 64, 0xcc
ANNOTATE_INTRA_FUNCTION_CALL
1: call 2f
RET
.align 64, 0xcc
2: movl $5, %eax
3: jmp 4f
nop
4: sub $1, %eax
jnz 3b
sub $1, %ecx
jnz 1b
RET
5: lfence
pop %rbp
RET
SYM_FUNC_END(clear_bhb_loop)
EXPORT_SYMBOL_GPL(clear_bhb_loop)
STACK_FRAME_NON_STANDARD(clear_bhb_loop)

View File

@ -116,6 +116,7 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
IBRS_ENTER
UNTRAIN_RET
CLEAR_BRANCH_HISTORY
/*
* SYSENTER doesn't filter flags, so we need to clear NT and AC
@ -259,6 +260,7 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
IBRS_ENTER
UNTRAIN_RET
CLEAR_BRANCH_HISTORY
movq %rsp, %rdi
call do_fast_syscall_32
@ -319,6 +321,7 @@ sysret32_from_system_call:
xorl %r9d, %r9d
xorl %r10d, %r10d
swapgs
CLEAR_CPU_BUFFERS
sysretl
SYM_CODE_END(entry_SYSCALL_compat)
@ -421,6 +424,7 @@ SYM_CODE_START(entry_INT80_compat)
IBRS_ENTER
UNTRAIN_RET
CLEAR_BRANCH_HISTORY
movq %rsp, %rdi
call do_int80_syscall_32

View File

@ -18,8 +18,25 @@
#include <asm/syscalls_32.h>
#undef __SYSCALL
/*
* The sys_call_table[] is no longer used for system calls, but
* kernel/trace/trace_syscalls.c still wants to know the system
* call address.
*/
#ifdef CONFIG_X86_32
#define __SYSCALL(nr, sym) __ia32_##sym,
__visible const sys_call_ptr_t ia32_sys_call_table[] = {
const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_32.h>
};
#undef __SYSCALL
#endif
#define __SYSCALL(nr, sym) case nr: return __ia32_##sym(regs);
long ia32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
#include <asm/syscalls_32.h>
default: return __ia32_sys_ni_syscall(regs);
}
};

View File

@ -11,8 +11,23 @@
#include <asm/syscalls_64.h>
#undef __SYSCALL
/*
* The sys_call_table[] is no longer used for system calls, but
* kernel/trace/trace_syscalls.c still wants to know the system
* call address.
*/
#define __SYSCALL(nr, sym) __x64_##sym,
asmlinkage const sys_call_ptr_t sys_call_table[] = {
const sys_call_ptr_t sys_call_table[] = {
#include <asm/syscalls_64.h>
};
#undef __SYSCALL
#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
long x64_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
#include <asm/syscalls_64.h>
default: return __x64_sys_ni_syscall(regs);
}
};

View File

@ -11,8 +11,12 @@
#include <asm/syscalls_x32.h>
#undef __SYSCALL
#define __SYSCALL(nr, sym) __x64_##sym,
#define __SYSCALL(nr, sym) case nr: return __x64_##sym(regs);
asmlinkage const sys_call_ptr_t x32_sys_call_table[] = {
#include <asm/syscalls_x32.h>
long x32_sys_call(const struct pt_regs *regs, unsigned int nr)
{
switch (nr) {
#include <asm/syscalls_x32.h>
default: return __x64_sys_ni_syscall(regs);
}
};

View File

@ -12,6 +12,7 @@
#include <asm/special_insns.h>
#include <asm/preempt.h>
#include <asm/asm.h>
#include <asm/nospec-branch.h>
#ifndef CONFIG_X86_CMPXCHG64
extern void cmpxchg8b_emu(void);

View File

@ -6,11 +6,13 @@
# define __ASM_FORM(x, ...) x,## __VA_ARGS__
# define __ASM_FORM_RAW(x, ...) x,## __VA_ARGS__
# define __ASM_FORM_COMMA(x, ...) x,## __VA_ARGS__,
# define __ASM_REGPFX %
#else
#include <linux/stringify.h>
# define __ASM_FORM(x, ...) " " __stringify(x,##__VA_ARGS__) " "
# define __ASM_FORM_RAW(x, ...) __stringify(x,##__VA_ARGS__)
# define __ASM_FORM_COMMA(x, ...) " " __stringify(x,##__VA_ARGS__) ","
# define __ASM_REGPFX %%
#endif
#define _ASM_BYTES(x, ...) __ASM_FORM(.byte x,##__VA_ARGS__ ;)
@ -49,6 +51,9 @@
#define _ASM_SI __ASM_REG(si)
#define _ASM_DI __ASM_REG(di)
/* Adds a (%rip) suffix on 64 bits only; for immediate memory references */
#define _ASM_RIP(x) __ASM_SEL_RAW(x, x (__ASM_REGPFX rip))
#ifndef __x86_64__
/* 32 bit */

View File

@ -33,6 +33,8 @@ enum cpuid_leafs
CPUID_7_EDX,
CPUID_8000_001F_EAX,
CPUID_8000_0021_EAX,
CPUID_LNX_5,
NR_CPUID_WORDS,
};
#ifdef CONFIG_X86_FEATURE_NAMES
@ -93,8 +95,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) || \
CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 19, feature_bit) || \
CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 20, feature_bit) || \
CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 21, feature_bit) || \
REQUIRED_MASK_CHECK || \
BUILD_BUG_ON_ZERO(NCAPINTS != 21))
BUILD_BUG_ON_ZERO(NCAPINTS != 22))
#define DISABLED_MASK_BIT_SET(feature_bit) \
( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 0, feature_bit) || \
@ -118,8 +121,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) || \
CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 19, feature_bit) || \
CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 20, feature_bit) || \
CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 21, feature_bit) || \
DISABLED_MASK_CHECK || \
BUILD_BUG_ON_ZERO(NCAPINTS != 21))
BUILD_BUG_ON_ZERO(NCAPINTS != 22))
#define cpu_has(c, bit) \
(__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \

View File

@ -13,7 +13,7 @@
/*
* Defines x86 CPU feature bits
*/
#define NCAPINTS 21 /* N 32-bit words worth of info */
#define NCAPINTS 22 /* N 32-bit words worth of info */
#define NBUGINTS 2 /* N 32-bit bug flags */
/*
@ -302,7 +302,7 @@
#define X86_FEATURE_UNRET (11*32+15) /* "" AMD BTB untrain return */
#define X86_FEATURE_USE_IBPB_FW (11*32+16) /* "" Use IBPB during runtime firmware calls */
#define X86_FEATURE_RSB_VMEXIT_LITE (11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
#define X86_FEATURE_CLEAR_CPU_BUF (11*32+18) /* "" Clear CPU buffers using VERW */
#define X86_FEATURE_MSR_TSX_CTRL (11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
@ -416,10 +416,22 @@
#define X86_FEATURE_SEV_ES (19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
#define X86_FEATURE_SME_COHERENT (19*32+10) /* "" AMD hardware-enforced cache coherency */
#define X86_FEATURE_AUTOIBRS (20*32+ 8) /* "" Automatic IBRS */
#define X86_FEATURE_SBPB (20*32+27) /* "" Selective Branch Prediction Barrier */
#define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* "" MSR_PRED_CMD[IBPB] flushes all branch type predictions */
#define X86_FEATURE_SRSO_NO (20*32+29) /* "" CPU is not affected by SRSO */
/*
* Extended auxiliary flags: Linux defined - for features scattered in various
* CPUID levels like 0x80000022, etc and Linux defined features.
*
* Reuse free bits when adding new feature flags!
*/
#define X86_FEATURE_CLEAR_BHB_LOOP (21*32+ 1) /* "" Clear branch history at syscall entry using SW loop */
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
/*
* BUG word(s)
*/
@ -466,4 +478,6 @@
/* BUG word 2 */
#define X86_BUG_SRSO X86_BUG(1*32 + 0) /* AMD SRSO bug */
#define X86_BUG_DIV0 X86_BUG(1*32 + 1) /* AMD DIV0 speculation bug */
#define X86_BUG_RFDS X86_BUG(1*32 + 2) /* CPU is vulnerable to Register File Data Sampling */
#define X86_BUG_BHI X86_BUG(1*32 + 3) /* CPU is affected by Branch History Injection */
#endif /* _ASM_X86_CPUFEATURES_H */

View File

@ -109,6 +109,7 @@
#define DISABLED_MASK18 0
#define DISABLED_MASK19 0
#define DISABLED_MASK20 0
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)
#define DISABLED_MASK21 0
#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 22)
#endif /* _ASM_X86_DISABLED_FEATURES_H */

View File

@ -91,7 +91,6 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
static __always_inline void arch_exit_to_user_mode(void)
{
mds_user_clear_cpu_buffers();
amd_clear_divider();
}
#define arch_exit_to_user_mode arch_exit_to_user_mode

View File

@ -11,12 +11,14 @@
#define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
#endif /* CONFIG_X86_32 */
#ifdef __ASSEMBLY__
#if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16)
#define __ALIGN .p2align 4, 0x90
#define __ALIGN .balign CONFIG_FUNCTION_ALIGNMENT, 0x90;
#define __ALIGN_STR __stringify(__ALIGN)
#endif
#define ASM_FUNC_ALIGN __ALIGN_STR
#define __FUNC_ALIGN __ALIGN
#define SYM_F_ALIGN __FUNC_ALIGN
#ifdef __ASSEMBLY__
#if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO)
#define RET jmp __x86_return_thunk

View File

@ -30,6 +30,7 @@
#define _EFER_SVME 12 /* Enable virtualization */
#define _EFER_LMSLE 13 /* Long Mode Segment Limit Enable */
#define _EFER_FFXSR 14 /* Enable Fast FXSAVE/FXRSTOR */
#define _EFER_AUTOIBRS 21 /* Enable Automatic IBRS */
#define EFER_SCE (1<<_EFER_SCE)
#define EFER_LME (1<<_EFER_LME)
@ -38,6 +39,7 @@
#define EFER_SVME (1<<_EFER_SVME)
#define EFER_LMSLE (1<<_EFER_LMSLE)
#define EFER_FFXSR (1<<_EFER_FFXSR)
#define EFER_AUTOIBRS (1<<_EFER_AUTOIBRS)
/* Intel MSRs. Some also available on other CPUs */
@ -53,10 +55,13 @@
#define SPEC_CTRL_SSBD BIT(SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */
#define SPEC_CTRL_RRSBA_DIS_S_SHIFT 6 /* Disable RRSBA behavior */
#define SPEC_CTRL_RRSBA_DIS_S BIT(SPEC_CTRL_RRSBA_DIS_S_SHIFT)
#define SPEC_CTRL_BHI_DIS_S_SHIFT 10 /* Disable Branch History Injection behavior */
#define SPEC_CTRL_BHI_DIS_S BIT(SPEC_CTRL_BHI_DIS_S_SHIFT)
/* A mask for bits which the kernel toggles when controlling mitigations */
#define SPEC_CTRL_MITIGATIONS_MASK (SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD \
| SPEC_CTRL_RRSBA_DIS_S)
| SPEC_CTRL_RRSBA_DIS_S \
| SPEC_CTRL_BHI_DIS_S)
#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */
#define PRED_CMD_IBPB BIT(0) /* Indirect Branch Prediction Barrier */
@ -153,6 +158,10 @@
* are restricted to targets in
* kernel.
*/
#define ARCH_CAP_BHI_NO BIT(20) /*
* CPU is not affected by Branch
* History Injection.
*/
#define ARCH_CAP_PBRSB_NO BIT(24) /*
* Not susceptible to Post-Barrier
* Return Stack Buffer Predictions.
@ -166,6 +175,14 @@
* CPU is not vulnerable to Gather
* Data Sampling (GDS).
*/
#define ARCH_CAP_RFDS_NO BIT(27) /*
* Not susceptible to Register
* File Data Sampling.
*/
#define ARCH_CAP_RFDS_CLEAR BIT(28) /*
* VERW clears CPU Register
* File.
*/
#define MSR_IA32_FLUSH_CMD 0x0000010b
#define L1D_FLUSH BIT(0) /*

View File

@ -155,11 +155,20 @@
.Lskip_rsb_\@:
.endm
/*
* The CALL to srso_alias_untrain_ret() must be patched in directly at
* the spot where untraining must be done, ie., srso_alias_untrain_ret()
* must be the target of a CALL instruction instead of indirectly
* jumping to a wrapper which then calls it. Therefore, this macro is
* called outside of __UNTRAIN_RET below, for the time being, before the
* kernel can support nested alternatives with arbitrary nesting.
*/
.macro CALL_UNTRAIN_RET
#ifdef CONFIG_CPU_UNRET_ENTRY
#define CALL_UNTRAIN_RET "call entry_untrain_ret"
#else
#define CALL_UNTRAIN_RET ""
ALTERNATIVE_2 "", "call entry_untrain_ret", X86_FEATURE_UNRET, \
"call srso_alias_untrain_ret", X86_FEATURE_SRSO_ALIAS
#endif
.endm
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. Requires KERNEL CR3 because the
@ -176,12 +185,37 @@
#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \
defined(CONFIG_CPU_SRSO)
ANNOTATE_UNRET_END
ALTERNATIVE_2 "", \
CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \
"call entry_ibpb", X86_FEATURE_ENTRY_IBPB
CALL_UNTRAIN_RET
ALTERNATIVE "", "call entry_ibpb", X86_FEATURE_ENTRY_IBPB
#endif
.endm
/*
* Macro to execute the VERW instruction that mitigates transient data sampling
* attacks such as MDS. On affected systems a microcode update overloaded VERW
* instruction to also clear the CPU buffers. VERW clobbers CFLAGS.ZF.
*
* Note: Only the memory operand variant of VERW clears the CPU buffers.
*/
.macro CLEAR_CPU_BUFFERS
ALTERNATIVE "jmp .Lskip_verw_\@", "", X86_FEATURE_CLEAR_CPU_BUF
verw _ASM_RIP(mds_verw_sel)
.Lskip_verw_\@:
.endm
#ifdef CONFIG_X86_64
.macro CLEAR_BRANCH_HISTORY
ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_LOOP
.endm
.macro CLEAR_BRANCH_HISTORY_VMEXIT
ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT
.endm
#else
#define CLEAR_BRANCH_HISTORY
#define CLEAR_BRANCH_HISTORY_VMEXIT
#endif
#else /* __ASSEMBLY__ */
#define ANNOTATE_RETPOLINE_SAFE \
@ -207,6 +241,10 @@ extern void srso_alias_untrain_ret(void);
extern void entry_untrain_ret(void);
extern void entry_ibpb(void);
#ifdef CONFIG_X86_64
extern void clear_bhb_loop(void);
#endif
extern void (*x86_return_thunk)(void);
#ifdef CONFIG_RETPOLINE
@ -357,13 +395,14 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp);
DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
DECLARE_STATIC_KEY_FALSE(mds_user_clear);
DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
DECLARE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
DECLARE_STATIC_KEY_FALSE(mmio_stale_data_clear);
extern u16 mds_verw_sel;
#include <asm/segment.h>
/**
@ -389,17 +428,6 @@ static __always_inline void mds_clear_cpu_buffers(void)
asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
}
/**
* mds_user_clear_cpu_buffers - Mitigation for MDS and TAA vulnerability
*
* Clear CPU buffers if the corresponding static key is enabled
*/
static __always_inline void mds_user_clear_cpu_buffers(void)
{
if (static_branch_likely(&mds_user_clear))
mds_clear_cpu_buffers();
}
/**
* mds_idle_clear_cpu_buffers - Mitigation for MDS vulnerability
*

View File

@ -103,6 +103,7 @@
#define REQUIRED_MASK18 0
#define REQUIRED_MASK19 0
#define REQUIRED_MASK20 0
#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 21)
#define REQUIRED_MASK21 0
#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 22)
#endif /* _ASM_X86_REQUIRED_FEATURES_H */

View File

@ -12,11 +12,6 @@
/* image of the saved processor state */
struct saved_context {
/*
* On x86_32, all segment registers except gs are saved at kernel
* entry in pt_regs.
*/
u16 gs;
unsigned long cr0, cr2, cr3, cr4;
u64 misc_enable;
struct saved_msrs saved_msrs;
@ -27,6 +22,11 @@ struct saved_context {
unsigned long tr;
unsigned long safety;
unsigned long return_address;
/*
* On x86_32, all segment registers except gs are saved at kernel
* entry in pt_regs.
*/
u16 gs;
bool misc_enable_saved;
} __attribute__((packed));

View File

@ -16,19 +16,17 @@
#include <asm/thread_info.h> /* for TS_COMPAT */
#include <asm/unistd.h>
/* This is used purely for kernel/trace/trace_syscalls.c */
typedef long (*sys_call_ptr_t)(const struct pt_regs *);
extern const sys_call_ptr_t sys_call_table[];
#if defined(CONFIG_X86_32)
#define ia32_sys_call_table sys_call_table
#else
/*
* These may not exist, but still put the prototypes in so we
* can use IS_ENABLED().
*/
extern const sys_call_ptr_t ia32_sys_call_table[];
extern const sys_call_ptr_t x32_sys_call_table[];
#endif
extern long ia32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x32_sys_call(const struct pt_regs *, unsigned int nr);
extern long x64_sys_call(const struct pt_regs *, unsigned int nr);
/*
* Only the low 32 bits of orig_ax are meaningful, so we return int.

View File

@ -181,6 +181,37 @@ void int3_emulate_ret(struct pt_regs *regs)
unsigned long ip = int3_emulate_pop(regs);
int3_emulate_jmp(regs, ip);
}
static __always_inline
void int3_emulate_jcc(struct pt_regs *regs, u8 cc, unsigned long ip, unsigned long disp)
{
static const unsigned long jcc_mask[6] = {
[0] = X86_EFLAGS_OF,
[1] = X86_EFLAGS_CF,
[2] = X86_EFLAGS_ZF,
[3] = X86_EFLAGS_CF | X86_EFLAGS_ZF,
[4] = X86_EFLAGS_SF,
[5] = X86_EFLAGS_PF,
};
bool invert = cc & 1;
bool match;
if (cc < 0xc) {
match = regs->flags & jcc_mask[cc >> 1];
} else {
match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
if (cc >= 0xe)
match = match || (regs->flags & X86_EFLAGS_ZF);
}
if ((match && !invert) || (!match && invert))
ip += disp;
int3_emulate_jmp(regs, ip);
}
#endif /* !CONFIG_UML_X86 */
#endif /* _ASM_X86_TEXT_PATCHING_H */

View File

@ -351,6 +351,12 @@ next:
kasan_enable_current();
}
static inline bool is_jcc32(struct insn *insn)
{
/* Jcc.d32 second opcode byte is in the range: 0x80-0x8f */
return insn->opcode.bytes[0] == 0x0f && (insn->opcode.bytes[1] & 0xf0) == 0x80;
}
#if defined(CONFIG_RETPOLINE) && defined(CONFIG_STACK_VALIDATION)
/*
@ -1201,6 +1207,11 @@ void text_poke_sync(void)
on_each_cpu(do_sync_core, NULL, 1);
}
/*
* NOTE: crazy scheme to allow patching Jcc.d32 but not increase the size of
* this thing. When len == 6 everything is prefixed with 0x0f and we map
* opcode to Jcc.d8, using len to distinguish.
*/
struct text_poke_loc {
/* addr := _stext + rel_addr */
s32 rel_addr;
@ -1322,6 +1333,10 @@ noinstr int poke_int3_handler(struct pt_regs *regs)
int3_emulate_jmp(regs, (long)ip + tp->disp);
break;
case 0x70 ... 0x7f: /* Jcc */
int3_emulate_jcc(regs, tp->opcode & 0xf, (long)ip, tp->disp);
break;
default:
BUG();
}
@ -1395,16 +1410,26 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
* Second step: update all but the first byte of the patched range.
*/
for (do_sync = 0, i = 0; i < nr_entries; i++) {
u8 old[POKE_MAX_OPCODE_SIZE] = { tp[i].old, };
u8 old[POKE_MAX_OPCODE_SIZE+1] = { tp[i].old, };
u8 _new[POKE_MAX_OPCODE_SIZE+1];
const u8 *new = tp[i].text;
int len = tp[i].len;
if (len - INT3_INSN_SIZE > 0) {
memcpy(old + INT3_INSN_SIZE,
text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
len - INT3_INSN_SIZE);
if (len == 6) {
_new[0] = 0x0f;
memcpy(_new + 1, new, 5);
new = _new;
}
text_poke(text_poke_addr(&tp[i]) + INT3_INSN_SIZE,
(const char *)tp[i].text + INT3_INSN_SIZE,
new + INT3_INSN_SIZE,
len - INT3_INSN_SIZE);
do_sync++;
}
@ -1432,8 +1457,7 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
* The old instruction is recorded so that the event can be
* processed forwards or backwards.
*/
perf_event_text_poke(text_poke_addr(&tp[i]), old, len,
tp[i].text, len);
perf_event_text_poke(text_poke_addr(&tp[i]), old, len, new, len);
}
if (do_sync) {
@ -1450,10 +1474,15 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
* replacing opcode.
*/
for (do_sync = 0, i = 0; i < nr_entries; i++) {
if (tp[i].text[0] == INT3_INSN_OPCODE)
u8 byte = tp[i].text[0];
if (tp[i].len == 6)
byte = 0x0f;
if (byte == INT3_INSN_OPCODE)
continue;
text_poke(text_poke_addr(&tp[i]), tp[i].text, INT3_INSN_SIZE);
text_poke(text_poke_addr(&tp[i]), &byte, INT3_INSN_SIZE);
do_sync++;
}
@ -1471,9 +1500,11 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
const void *opcode, size_t len, const void *emulate)
{
struct insn insn;
int ret, i;
int ret, i = 0;
memcpy((void *)tp->text, opcode, len);
if (len == 6)
i = 1;
memcpy((void *)tp->text, opcode+i, len-i);
if (!emulate)
emulate = opcode;
@ -1484,6 +1515,13 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
tp->len = len;
tp->opcode = insn.opcode.bytes[0];
if (is_jcc32(&insn)) {
/*
* Map Jcc.d32 onto Jcc.d8 and use len to distinguish.
*/
tp->opcode = insn.opcode.bytes[1] - 0x10;
}
switch (tp->opcode) {
case RET_INSN_OPCODE:
case JMP32_INSN_OPCODE:
@ -1500,7 +1538,6 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
BUG_ON(len != insn.length);
};
switch (tp->opcode) {
case INT3_INSN_OPCODE:
case RET_INSN_OPCODE:
@ -1509,6 +1546,7 @@ static void text_poke_loc_init(struct text_poke_loc *tp, void *addr,
case CALL_INSN_OPCODE:
case JMP32_INSN_OPCODE:
case JMP8_INSN_OPCODE:
case 0x70 ... 0x7f: /* Jcc */
tp->disp = insn.immediate.value;
break;
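To make the len == 6 convention concrete: a 6-byte Jcc.d32 (0f 8x disp32) is stored as the matching Jcc.d8 opcode (0x8x - 0x10 = 0x7x), and len == 6 records that the 0x0f prefix must be re-emitted. The helpers below are a hypothetical stand-alone sketch of that encode/decode, not the kernel's text_poke_loc handling.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct fold {
        uint8_t opcode;  /* Jcc.d8 opcode, e.g. 0x74 for JE */
        uint8_t len;     /* 2 for Jcc.d8, 6 for Jcc.d32 */
        int32_t disp;
};

/* Record a Jcc.d32 (0f 8x disp32) as its Jcc.d8 twin plus len == 6. */
static void fold_jcc32(const uint8_t *insn, struct fold *f)
{
        f->opcode = insn[1] - 0x10;     /* 0x8x -> 0x7x */
        f->len = 6;
        memcpy(&f->disp, insn + 2, sizeof(f->disp));
}

/* Rebuild the original bytes: prefix 0x0f and restore the 0x8x opcode. */
static void unfold_jcc32(const struct fold *f, uint8_t *out)
{
        out[0] = 0x0f;
        out[1] = f->opcode + 0x10;
        memcpy(out + 2, &f->disp, sizeof(f->disp));
}

int main(void)
{
        const uint8_t je_d32[6] = { 0x0f, 0x84, 0x10, 0x00, 0x00, 0x00 };
        uint8_t rebuilt[6];
        struct fold f;

        fold_jcc32(je_d32, &f);
        unfold_jcc32(&f, rebuilt);
        printf("opcode 0x%02x len %u round-trip %s\n", (unsigned)f.opcode,
               (unsigned)f.len, memcmp(je_d32, rebuilt, 6) ? "broken" : "ok");
        return 0;
}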

View File

@ -1021,11 +1021,11 @@ static bool cpu_has_zenbleed_microcode(void)
u32 good_rev = 0;
switch (boot_cpu_data.x86_model) {
case 0x30 ... 0x3f: good_rev = 0x0830107a; break;
case 0x60 ... 0x67: good_rev = 0x0860010b; break;
case 0x68 ... 0x6f: good_rev = 0x08608105; break;
case 0x70 ... 0x7f: good_rev = 0x08701032; break;
case 0xa0 ... 0xaf: good_rev = 0x08a00008; break;
case 0x30 ... 0x3f: good_rev = 0x0830107b; break;
case 0x60 ... 0x67: good_rev = 0x0860010c; break;
case 0x68 ... 0x6f: good_rev = 0x08608107; break;
case 0x70 ... 0x7f: good_rev = 0x08701033; break;
case 0xa0 ... 0xaf: good_rev = 0x08a00009; break;
default:
return false;

View File

@ -110,9 +110,6 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
/* Control unconditional IBPB in switch_mm() */
DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
/* Control MDS CPU buffer clear before returning to user space */
DEFINE_STATIC_KEY_FALSE(mds_user_clear);
EXPORT_SYMBOL_GPL(mds_user_clear);
/* Control MDS CPU buffer clear before idling (halt, mwait) */
DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
EXPORT_SYMBOL_GPL(mds_idle_clear);
@ -258,7 +255,7 @@ static void __init mds_select_mitigation(void)
if (!boot_cpu_has(X86_FEATURE_MD_CLEAR))
mds_mitigation = MDS_MITIGATION_VMWERV;
static_branch_enable(&mds_user_clear);
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
if (!boot_cpu_has(X86_BUG_MSBDS_ONLY) &&
(mds_nosmt || cpu_mitigations_auto_nosmt()))
@ -362,7 +359,7 @@ static void __init taa_select_mitigation(void)
* For guests that can't determine whether the correct microcode is
* present on host, enable the mitigation for UCODE_NEEDED as well.
*/
static_branch_enable(&mds_user_clear);
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
if (taa_nosmt || cpu_mitigations_auto_nosmt())
cpu_smt_disable(false);
@ -430,7 +427,14 @@ static void __init mmio_select_mitigation(void)
*/
if (boot_cpu_has_bug(X86_BUG_MDS) || (boot_cpu_has_bug(X86_BUG_TAA) &&
boot_cpu_has(X86_FEATURE_RTM)))
static_branch_enable(&mds_user_clear);
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
/*
* X86_FEATURE_CLEAR_CPU_BUF could be enabled by other VERW based
* mitigations, disable KVM-only mitigation in that case.
*/
if (boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF))
static_branch_disable(&mmio_stale_data_clear);
else
static_branch_enable(&mmio_stale_data_clear);
@ -482,6 +486,57 @@ static int __init mmio_stale_data_parse_cmdline(char *str)
}
early_param("mmio_stale_data", mmio_stale_data_parse_cmdline);
#undef pr_fmt
#define pr_fmt(fmt) "Register File Data Sampling: " fmt
enum rfds_mitigations {
RFDS_MITIGATION_OFF,
RFDS_MITIGATION_VERW,
RFDS_MITIGATION_UCODE_NEEDED,
};
/* Default mitigation for Register File Data Sampling */
static enum rfds_mitigations rfds_mitigation __ro_after_init =
IS_ENABLED(CONFIG_MITIGATION_RFDS) ? RFDS_MITIGATION_VERW : RFDS_MITIGATION_OFF;
static const char * const rfds_strings[] = {
[RFDS_MITIGATION_OFF] = "Vulnerable",
[RFDS_MITIGATION_VERW] = "Mitigation: Clear Register File",
[RFDS_MITIGATION_UCODE_NEEDED] = "Vulnerable: No microcode",
};
static void __init rfds_select_mitigation(void)
{
if (!boot_cpu_has_bug(X86_BUG_RFDS) || cpu_mitigations_off()) {
rfds_mitigation = RFDS_MITIGATION_OFF;
return;
}
if (rfds_mitigation == RFDS_MITIGATION_OFF)
return;
if (x86_read_arch_cap_msr() & ARCH_CAP_RFDS_CLEAR)
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
else
rfds_mitigation = RFDS_MITIGATION_UCODE_NEEDED;
}
static __init int rfds_parse_cmdline(char *str)
{
if (!str)
return -EINVAL;
if (!boot_cpu_has_bug(X86_BUG_RFDS))
return 0;
if (!strcmp(str, "off"))
rfds_mitigation = RFDS_MITIGATION_OFF;
else if (!strcmp(str, "on"))
rfds_mitigation = RFDS_MITIGATION_VERW;
return 0;
}
early_param("reg_file_data_sampling", rfds_parse_cmdline);
#undef pr_fmt
#define pr_fmt(fmt) "" fmt
@ -490,12 +545,12 @@ static void __init md_clear_update_mitigation(void)
if (cpu_mitigations_off())
return;
if (!static_key_enabled(&mds_user_clear))
if (!boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF))
goto out;
/*
* mds_user_clear is now enabled. Update MDS, TAA and MMIO Stale Data
* mitigation, if necessary.
* X86_FEATURE_CLEAR_CPU_BUF is now enabled. Update MDS, TAA and MMIO
* Stale Data mitigation, if necessary.
*/
if (mds_mitigation == MDS_MITIGATION_OFF &&
boot_cpu_has_bug(X86_BUG_MDS)) {
@ -507,11 +562,19 @@ static void __init md_clear_update_mitigation(void)
taa_mitigation = TAA_MITIGATION_VERW;
taa_select_mitigation();
}
if (mmio_mitigation == MMIO_MITIGATION_OFF &&
boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA)) {
/*
* MMIO_MITIGATION_OFF is not checked here so that mmio_stale_data_clear
* gets updated correctly as per X86_FEATURE_CLEAR_CPU_BUF state.
*/
if (boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA)) {
mmio_mitigation = MMIO_MITIGATION_VERW;
mmio_select_mitigation();
}
if (rfds_mitigation == RFDS_MITIGATION_OFF &&
boot_cpu_has_bug(X86_BUG_RFDS)) {
rfds_mitigation = RFDS_MITIGATION_VERW;
rfds_select_mitigation();
}
out:
if (boot_cpu_has_bug(X86_BUG_MDS))
pr_info("MDS: %s\n", mds_strings[mds_mitigation]);
@ -521,6 +584,8 @@ out:
pr_info("MMIO Stale Data: %s\n", mmio_strings[mmio_mitigation]);
else if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN))
pr_info("MMIO Stale Data: Unknown: No mitigations\n");
if (boot_cpu_has_bug(X86_BUG_RFDS))
pr_info("Register File Data Sampling: %s\n", rfds_strings[rfds_mitigation]);
}
static void __init md_clear_select_mitigation(void)
@ -528,11 +593,12 @@ static void __init md_clear_select_mitigation(void)
mds_select_mitigation();
taa_select_mitigation();
mmio_select_mitigation();
rfds_select_mitigation();
/*
* As MDS, TAA and MMIO Stale Data mitigations are inter-related, update
* and print their mitigation after MDS, TAA and MMIO Stale Data
* mitigation selection is done.
* As these mitigations are inter-related and rely on VERW instruction
* to clear the microarchitectural buffers, update and print their status
* after mitigation selection is done for each of these vulnerabilities.
*/
md_clear_update_mitigation();
}
@ -1288,19 +1354,21 @@ spectre_v2_user_select_mitigation(void)
}
/*
* If no STIBP, enhanced IBRS is enabled, or SMT impossible, STIBP
* If no STIBP, Intel enhanced IBRS is enabled, or SMT impossible, STIBP
* is not required.
*
* Enhanced IBRS also protects against cross-thread branch target
* Intel's Enhanced IBRS also protects against cross-thread branch target
* injection in user-mode as the IBRS bit remains always set which
* implicitly enables cross-thread protections. However, in legacy IBRS
* mode, the IBRS bit is set only on kernel entry and cleared on return
* to userspace. This disables the implicit cross-thread protection,
* so allow for STIBP to be selected in that case.
* to userspace. AMD Automatic IBRS also does not protect userspace.
* These modes therefore disable the implicit cross-thread protection,
* so allow for STIBP to be selected in those cases.
*/
if (!boot_cpu_has(X86_FEATURE_STIBP) ||
!smt_possible ||
spectre_v2_in_eibrs_mode(spectre_v2_enabled))
(spectre_v2_in_eibrs_mode(spectre_v2_enabled) &&
!boot_cpu_has(X86_FEATURE_AUTOIBRS)))
return;
/*
@ -1330,9 +1398,9 @@ static const char * const spectre_v2_strings[] = {
[SPECTRE_V2_NONE] = "Vulnerable",
[SPECTRE_V2_RETPOLINE] = "Mitigation: Retpolines",
[SPECTRE_V2_LFENCE] = "Mitigation: LFENCE",
[SPECTRE_V2_EIBRS] = "Mitigation: Enhanced IBRS",
[SPECTRE_V2_EIBRS_LFENCE] = "Mitigation: Enhanced IBRS + LFENCE",
[SPECTRE_V2_EIBRS_RETPOLINE] = "Mitigation: Enhanced IBRS + Retpolines",
[SPECTRE_V2_EIBRS] = "Mitigation: Enhanced / Automatic IBRS",
[SPECTRE_V2_EIBRS_LFENCE] = "Mitigation: Enhanced / Automatic IBRS + LFENCE",
[SPECTRE_V2_EIBRS_RETPOLINE] = "Mitigation: Enhanced / Automatic IBRS + Retpolines",
[SPECTRE_V2_IBRS] = "Mitigation: IBRS",
};
@ -1401,7 +1469,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
cmd == SPECTRE_V2_CMD_EIBRS_LFENCE ||
cmd == SPECTRE_V2_CMD_EIBRS_RETPOLINE) &&
!boot_cpu_has(X86_FEATURE_IBRS_ENHANCED)) {
pr_err("%s selected but CPU doesn't have eIBRS. Switching to AUTO select\n",
pr_err("%s selected but CPU doesn't have Enhanced or Automatic IBRS. Switching to AUTO select\n",
mitigation_options[i].option);
return SPECTRE_V2_CMD_AUTO;
}
@ -1516,6 +1584,79 @@ static void __init spectre_v2_determine_rsb_fill_type_at_vmexit(enum spectre_v2_
dump_stack();
}
/*
* Set BHI_DIS_S to prevent indirect branches in kernel to be influenced by
* branch history in userspace. Not needed if BHI_NO is set.
*/
static bool __init spec_ctrl_bhi_dis(void)
{
if (!boot_cpu_has(X86_FEATURE_BHI_CTRL))
return false;
x86_spec_ctrl_base |= SPEC_CTRL_BHI_DIS_S;
update_spec_ctrl(x86_spec_ctrl_base);
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_HW);
return true;
}
enum bhi_mitigations {
BHI_MITIGATION_OFF,
BHI_MITIGATION_ON,
BHI_MITIGATION_AUTO,
};
static enum bhi_mitigations bhi_mitigation __ro_after_init =
IS_ENABLED(CONFIG_SPECTRE_BHI_ON) ? BHI_MITIGATION_ON :
IS_ENABLED(CONFIG_SPECTRE_BHI_OFF) ? BHI_MITIGATION_OFF :
BHI_MITIGATION_AUTO;
static int __init spectre_bhi_parse_cmdline(char *str)
{
if (!str)
return -EINVAL;
if (!strcmp(str, "off"))
bhi_mitigation = BHI_MITIGATION_OFF;
else if (!strcmp(str, "on"))
bhi_mitigation = BHI_MITIGATION_ON;
else if (!strcmp(str, "auto"))
bhi_mitigation = BHI_MITIGATION_AUTO;
else
pr_err("Ignoring unknown spectre_bhi option (%s)", str);
return 0;
}
early_param("spectre_bhi", spectre_bhi_parse_cmdline);
static void __init bhi_select_mitigation(void)
{
if (bhi_mitigation == BHI_MITIGATION_OFF)
return;
/* Retpoline mitigates against BHI unless the CPU has RRSBA behavior */
if (cpu_feature_enabled(X86_FEATURE_RETPOLINE) &&
!(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA))
return;
if (spec_ctrl_bhi_dis())
return;
if (!IS_ENABLED(CONFIG_X86_64))
return;
/* Mitigate KVM by default */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT);
pr_info("Spectre BHI mitigation: SW BHB clearing on vm exit\n");
if (bhi_mitigation == BHI_MITIGATION_AUTO)
return;
/* Mitigate syscalls when the mitigation is forced =on */
setup_force_cpu_cap(X86_FEATURE_CLEAR_BHB_LOOP);
pr_info("Spectre BHI mitigation: SW BHB clearing on syscall\n");
}
static void __init spectre_v2_select_mitigation(void)
{
enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@ -1586,9 +1727,13 @@ static void __init spectre_v2_select_mitigation(void)
pr_err(SPECTRE_V2_EIBRS_EBPF_MSG);
if (spectre_v2_in_ibrs_mode(mode)) {
if (boot_cpu_has(X86_FEATURE_AUTOIBRS)) {
msr_set_bit(MSR_EFER, _EFER_AUTOIBRS);
} else {
x86_spec_ctrl_base |= SPEC_CTRL_IBRS;
update_spec_ctrl(x86_spec_ctrl_base);
}
}
switch (mode) {
case SPECTRE_V2_NONE:
@ -1622,6 +1767,9 @@ static void __init spectre_v2_select_mitigation(void)
mode == SPECTRE_V2_RETPOLINE)
spec_ctrl_disable_kernel_rrsba();
if (boot_cpu_has(X86_BUG_BHI))
bhi_select_mitigation();
spectre_v2_enabled = mode;
pr_info("%s\n", spectre_v2_strings[mode]);
@ -1671,8 +1819,8 @@ static void __init spectre_v2_select_mitigation(void)
/*
* Retpoline protects the kernel, but doesn't protect firmware. IBRS
* and Enhanced IBRS protect firmware too, so enable IBRS around
* firmware calls only when IBRS / Enhanced IBRS aren't otherwise
* enabled.
* firmware calls only when IBRS / Enhanced / Automatic IBRS aren't
* otherwise enabled.
*
* Use "mode" to check Enhanced IBRS instead of boot_cpu_has(), because
* the user might select retpoline on the kernel command line and if
@ -2502,16 +2650,16 @@ static const char * const l1tf_vmx_states[] = {
static ssize_t l1tf_show_state(char *buf)
{
if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_AUTO)
return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG);
return sysfs_emit(buf, "%s\n", L1TF_DEFAULT_MSG);
if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_EPT_DISABLED ||
(l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_NEVER &&
sched_smt_active())) {
return sprintf(buf, "%s; VMX: %s\n", L1TF_DEFAULT_MSG,
return sysfs_emit(buf, "%s; VMX: %s\n", L1TF_DEFAULT_MSG,
l1tf_vmx_states[l1tf_vmx_mitigation]);
}
return sprintf(buf, "%s; VMX: %s, SMT %s\n", L1TF_DEFAULT_MSG,
return sysfs_emit(buf, "%s; VMX: %s, SMT %s\n", L1TF_DEFAULT_MSG,
l1tf_vmx_states[l1tf_vmx_mitigation],
sched_smt_active() ? "vulnerable" : "disabled");
}
@ -2520,40 +2668,40 @@ static ssize_t itlb_multihit_show_state(char *buf)
{
if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
!boot_cpu_has(X86_FEATURE_VMX))
return sprintf(buf, "KVM: Mitigation: VMX unsupported\n");
return sysfs_emit(buf, "KVM: Mitigation: VMX unsupported\n");
else if (!(cr4_read_shadow() & X86_CR4_VMXE))
return sprintf(buf, "KVM: Mitigation: VMX disabled\n");
return sysfs_emit(buf, "KVM: Mitigation: VMX disabled\n");
else if (itlb_multihit_kvm_mitigation)
return sprintf(buf, "KVM: Mitigation: Split huge pages\n");
return sysfs_emit(buf, "KVM: Mitigation: Split huge pages\n");
else
return sprintf(buf, "KVM: Vulnerable\n");
return sysfs_emit(buf, "KVM: Vulnerable\n");
}
#else
static ssize_t l1tf_show_state(char *buf)
{
return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG);
return sysfs_emit(buf, "%s\n", L1TF_DEFAULT_MSG);
}
static ssize_t itlb_multihit_show_state(char *buf)
{
return sprintf(buf, "Processor vulnerable\n");
return sysfs_emit(buf, "Processor vulnerable\n");
}
#endif
static ssize_t mds_show_state(char *buf)
{
if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
return sprintf(buf, "%s; SMT Host state unknown\n",
return sysfs_emit(buf, "%s; SMT Host state unknown\n",
mds_strings[mds_mitigation]);
}
if (boot_cpu_has(X86_BUG_MSBDS_ONLY)) {
return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation],
return sysfs_emit(buf, "%s; SMT %s\n", mds_strings[mds_mitigation],
(mds_mitigation == MDS_MITIGATION_OFF ? "vulnerable" :
sched_smt_active() ? "mitigated" : "disabled"));
}
return sprintf(buf, "%s; SMT %s\n", mds_strings[mds_mitigation],
return sysfs_emit(buf, "%s; SMT %s\n", mds_strings[mds_mitigation],
sched_smt_active() ? "vulnerable" : "disabled");
}
@ -2561,14 +2709,14 @@ static ssize_t tsx_async_abort_show_state(char *buf)
{
if ((taa_mitigation == TAA_MITIGATION_TSX_DISABLED) ||
(taa_mitigation == TAA_MITIGATION_OFF))
return sprintf(buf, "%s\n", taa_strings[taa_mitigation]);
return sysfs_emit(buf, "%s\n", taa_strings[taa_mitigation]);
if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
return sprintf(buf, "%s; SMT Host state unknown\n",
return sysfs_emit(buf, "%s; SMT Host state unknown\n",
taa_strings[taa_mitigation]);
}
return sprintf(buf, "%s; SMT %s\n", taa_strings[taa_mitigation],
return sysfs_emit(buf, "%s; SMT %s\n", taa_strings[taa_mitigation],
sched_smt_active() ? "vulnerable" : "disabled");
}
@ -2589,22 +2737,28 @@ static ssize_t mmio_stale_data_show_state(char *buf)
sched_smt_active() ? "vulnerable" : "disabled");
}
static ssize_t rfds_show_state(char *buf)
{
return sysfs_emit(buf, "%s\n", rfds_strings[rfds_mitigation]);
}
static char *stibp_state(void)
{
if (spectre_v2_in_eibrs_mode(spectre_v2_enabled))
if (spectre_v2_in_eibrs_mode(spectre_v2_enabled) &&
!boot_cpu_has(X86_FEATURE_AUTOIBRS))
return "";
switch (spectre_v2_user_stibp) {
case SPECTRE_V2_USER_NONE:
return ", STIBP: disabled";
return "; STIBP: disabled";
case SPECTRE_V2_USER_STRICT:
return ", STIBP: forced";
return "; STIBP: forced";
case SPECTRE_V2_USER_STRICT_PREFERRED:
return ", STIBP: always-on";
return "; STIBP: always-on";
case SPECTRE_V2_USER_PRCTL:
case SPECTRE_V2_USER_SECCOMP:
if (static_key_enabled(&switch_to_cond_stibp))
return ", STIBP: conditional";
return "; STIBP: conditional";
}
return "";
}
@ -2613,10 +2767,10 @@ static char *ibpb_state(void)
{
if (boot_cpu_has(X86_FEATURE_IBPB)) {
if (static_key_enabled(&switch_mm_always_ibpb))
return ", IBPB: always-on";
return "; IBPB: always-on";
if (static_key_enabled(&switch_mm_cond_ibpb))
return ", IBPB: conditional";
return ", IBPB: disabled";
return "; IBPB: conditional";
return "; IBPB: disabled";
}
return "";
}
@ -2626,39 +2780,58 @@ static char *pbrsb_eibrs_state(void)
if (boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB)) {
if (boot_cpu_has(X86_FEATURE_RSB_VMEXIT_LITE) ||
boot_cpu_has(X86_FEATURE_RSB_VMEXIT))
return ", PBRSB-eIBRS: SW sequence";
return "; PBRSB-eIBRS: SW sequence";
else
return ", PBRSB-eIBRS: Vulnerable";
return "; PBRSB-eIBRS: Vulnerable";
} else {
return ", PBRSB-eIBRS: Not affected";
return "; PBRSB-eIBRS: Not affected";
}
}
static const char * const spectre_bhi_state(void)
{
if (!boot_cpu_has_bug(X86_BUG_BHI))
return "; BHI: Not affected";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_HW))
return "; BHI: BHI_DIS_S";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP))
return "; BHI: SW loop, KVM: SW loop";
else if (boot_cpu_has(X86_FEATURE_RETPOLINE) &&
!(x86_read_arch_cap_msr() & ARCH_CAP_RRSBA))
return "; BHI: Retpoline";
else if (boot_cpu_has(X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT))
return "; BHI: Syscall hardening, KVM: SW loop";
return "; BHI: Vulnerable (Syscall hardening enabled)";
}
static ssize_t spectre_v2_show_state(char *buf)
{
if (spectre_v2_enabled == SPECTRE_V2_LFENCE)
return sprintf(buf, "Vulnerable: LFENCE\n");
return sysfs_emit(buf, "Vulnerable: LFENCE\n");
if (spectre_v2_enabled == SPECTRE_V2_EIBRS && unprivileged_ebpf_enabled())
return sprintf(buf, "Vulnerable: eIBRS with unprivileged eBPF\n");
return sysfs_emit(buf, "Vulnerable: eIBRS with unprivileged eBPF\n");
if (sched_smt_active() && unprivileged_ebpf_enabled() &&
spectre_v2_enabled == SPECTRE_V2_EIBRS_LFENCE)
return sprintf(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n");
return sysfs_emit(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n");
return sprintf(buf, "%s%s%s%s%s%s%s\n",
return sysfs_emit(buf, "%s%s%s%s%s%s%s%s\n",
spectre_v2_strings[spectre_v2_enabled],
ibpb_state(),
boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "",
boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? "; IBRS_FW" : "",
stibp_state(),
boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "",
boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? "; RSB filling" : "",
pbrsb_eibrs_state(),
spectre_bhi_state(),
/* this should always be at the end */
spectre_v2_module_string());
}
static ssize_t srbds_show_state(char *buf)
{
return sprintf(buf, "%s\n", srbds_strings[srbds_mitigation]);
return sysfs_emit(buf, "%s\n", srbds_strings[srbds_mitigation]);
}
static ssize_t retbleed_show_state(char *buf)
@ -2667,17 +2840,16 @@ static ssize_t retbleed_show_state(char *buf)
retbleed_mitigation == RETBLEED_MITIGATION_IBPB) {
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
return sprintf(buf, "Vulnerable: untrained return thunk / IBPB on non-AMD based uarch\n");
return sysfs_emit(buf, "Vulnerable: untrained return thunk / IBPB on non-AMD based uarch\n");
return sprintf(buf, "%s; SMT %s\n",
retbleed_strings[retbleed_mitigation],
return sysfs_emit(buf, "%s; SMT %s\n", retbleed_strings[retbleed_mitigation],
!sched_smt_active() ? "disabled" :
spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT ||
spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED ?
"enabled with STIBP protection" : "vulnerable");
}
return sprintf(buf, "%s\n", retbleed_strings[retbleed_mitigation]);
return sysfs_emit(buf, "%s\n", retbleed_strings[retbleed_mitigation]);
}
static ssize_t gds_show_state(char *buf)
@ -2699,26 +2871,26 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
char *buf, unsigned int bug)
{
if (!boot_cpu_has_bug(bug))
return sprintf(buf, "Not affected\n");
return sysfs_emit(buf, "Not affected\n");
switch (bug) {
case X86_BUG_CPU_MELTDOWN:
if (boot_cpu_has(X86_FEATURE_PTI))
return sprintf(buf, "Mitigation: PTI\n");
return sysfs_emit(buf, "Mitigation: PTI\n");
if (hypervisor_is_type(X86_HYPER_XEN_PV))
return sprintf(buf, "Unknown (XEN PV detected, hypervisor mitigation required)\n");
return sysfs_emit(buf, "Unknown (XEN PV detected, hypervisor mitigation required)\n");
break;
case X86_BUG_SPECTRE_V1:
return sprintf(buf, "%s\n", spectre_v1_strings[spectre_v1_mitigation]);
return sysfs_emit(buf, "%s\n", spectre_v1_strings[spectre_v1_mitigation]);
case X86_BUG_SPECTRE_V2:
return spectre_v2_show_state(buf);
case X86_BUG_SPEC_STORE_BYPASS:
return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
return sysfs_emit(buf, "%s\n", ssb_strings[ssb_mode]);
case X86_BUG_L1TF:
if (boot_cpu_has(X86_FEATURE_L1TF_PTEINV))
@ -2750,11 +2922,14 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
case X86_BUG_SRSO:
return srso_show_state(buf);
case X86_BUG_RFDS:
return rfds_show_state(buf);
default:
break;
}
return sprintf(buf, "Vulnerable\n");
return sysfs_emit(buf, "Vulnerable\n");
}
ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf)
@ -2824,4 +2999,9 @@ ssize_t cpu_show_spec_rstack_overflow(struct device *dev, struct device_attribut
{
return cpu_show_common(dev, attr, buf, X86_BUG_SRSO);
}
ssize_t cpu_show_reg_file_data_sampling(struct device *dev, struct device_attribute *attr, char *buf)
{
return cpu_show_common(dev, attr, buf, X86_BUG_RFDS);
}
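With the attribute wired up above, user space can read the new entry alongside the other vulnerability files. The minimal reader below is illustrative only; the sysfs path is the standard vulnerabilities directory.

#include <stdio.h>

int main(void)
{
        const char *path =
                "/sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling";
        char line[256];
        FILE *f = fopen(path, "r");

        if (!f) {
                perror(path);   /* older kernels simply lack the file */
                return 1;
        }
        if (fgets(line, sizeof(line), f))
                printf("RFDS status: %s", line);
        fclose(f);
        return 0;
}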
#endif

View File

@ -1036,6 +1036,7 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
#define NO_SPECTRE_V2 BIT(8)
#define NO_MMIO BIT(9)
#define NO_EIBRS_PBRSB BIT(10)
#define NO_BHI BIT(11)
#define VULNWL(vendor, family, model, whitelist) \
X86_MATCH_VENDOR_FAM_MODEL(vendor, family, model, whitelist)
@ -1096,18 +1097,18 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
VULNWL_INTEL(ATOM_TREMONT_D, NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
/* AMD Family 0xf - 0x12 */
VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_BHI),
VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_BHI),
VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_BHI),
VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_BHI),
/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */
VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
VULNWL_HYGON(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_EIBRS_PBRSB | NO_BHI),
VULNWL_HYGON(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_EIBRS_PBRSB | NO_BHI),
/* Zhaoxin Family 7 */
VULNWL(CENTAUR, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO),
VULNWL(ZHAOXIN, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO),
VULNWL(CENTAUR, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO | NO_BHI),
VULNWL(ZHAOXIN, 7, X86_MODEL_ANY, NO_SPECTRE_V2 | NO_SWAPGS | NO_MMIO | NO_BHI),
{}
};
@ -1138,6 +1139,8 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
#define SRSO BIT(5)
/* CPU is affected by GDS */
#define GDS BIT(6)
/* CPU is affected by Register File Data Sampling */
#define RFDS BIT(7)
static const struct x86_cpu_id cpu_vuln_blacklist[] __initconst = {
VULNBL_INTEL_STEPPINGS(IVYBRIDGE, X86_STEPPING_ANY, SRBDS),
@ -1165,9 +1168,18 @@ static const struct x86_cpu_id cpu_vuln_blacklist[] __initconst = {
VULNBL_INTEL_STEPPINGS(TIGERLAKE, X86_STEPPING_ANY, GDS),
VULNBL_INTEL_STEPPINGS(LAKEFIELD, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RETBLEED),
VULNBL_INTEL_STEPPINGS(ROCKETLAKE, X86_STEPPING_ANY, MMIO | RETBLEED | GDS),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT, X86_STEPPING_ANY, MMIO | MMIO_SBDS),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_D, X86_STEPPING_ANY, MMIO),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_L, X86_STEPPING_ANY, MMIO | MMIO_SBDS),
VULNBL_INTEL_STEPPINGS(ALDERLAKE, X86_STEPPING_ANY, RFDS),
VULNBL_INTEL_STEPPINGS(ALDERLAKE_L, X86_STEPPING_ANY, RFDS),
VULNBL_INTEL_STEPPINGS(RAPTORLAKE, X86_STEPPING_ANY, RFDS),
VULNBL_INTEL_STEPPINGS(RAPTORLAKE_P, X86_STEPPING_ANY, RFDS),
VULNBL_INTEL_STEPPINGS(RAPTORLAKE_S, X86_STEPPING_ANY, RFDS),
VULNBL_INTEL_STEPPINGS(ALDERLAKE_N, X86_STEPPING_ANY, RFDS),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RFDS),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_D, X86_STEPPING_ANY, MMIO | RFDS),
VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_L, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RFDS),
VULNBL_INTEL_STEPPINGS(ATOM_GOLDMONT, X86_STEPPING_ANY, RFDS),
VULNBL_INTEL_STEPPINGS(ATOM_GOLDMONT_D, X86_STEPPING_ANY, RFDS),
VULNBL_INTEL_STEPPINGS(ATOM_GOLDMONT_PLUS, X86_STEPPING_ANY, RFDS),
VULNBL_AMD(0x15, RETBLEED),
VULNBL_AMD(0x16, RETBLEED),
@ -1201,6 +1213,24 @@ static bool arch_cap_mmio_immune(u64 ia32_cap)
ia32_cap & ARCH_CAP_SBDR_SSDP_NO);
}
static bool __init vulnerable_to_rfds(u64 ia32_cap)
{
/* The "immunity" bit trumps everything else: */
if (ia32_cap & ARCH_CAP_RFDS_NO)
return false;
/*
* VMMs set ARCH_CAP_RFDS_CLEAR for processors not in the blacklist to
* indicate that mitigation is needed because the guest is running on
* vulnerable hardware or may migrate to such hardware:
*/
if (ia32_cap & ARCH_CAP_RFDS_CLEAR)
return true;
/* Only consult the blacklist when there is no enumeration: */
return cpu_matches(cpu_vuln_blacklist, RFDS);
}
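The precedence in vulnerable_to_rfds() (immunity bit first, then the VMM-supplied CLEAR hint, then the model blacklist) can be written as a small pure function. The bit values below are placeholders for illustration, not the real IA32_ARCH_CAPABILITIES layout.

#include <assert.h>
#include <stdbool.h>

/* Placeholder bit positions, for illustration only. */
#define CAP_RFDS_NO     (1ull << 0)
#define CAP_RFDS_CLEAR  (1ull << 1)

static bool rfds_vulnerable(unsigned long long arch_cap, bool blacklisted)
{
        if (arch_cap & CAP_RFDS_NO)      /* immunity bit trumps everything */
                return false;
        if (arch_cap & CAP_RFDS_CLEAR)   /* VMM says mitigation is needed */
                return true;
        return blacklisted;              /* otherwise consult the model list */
}

int main(void)
{
        assert(!rfds_vulnerable(CAP_RFDS_NO | CAP_RFDS_CLEAR, true));
        assert(rfds_vulnerable(CAP_RFDS_CLEAR, false));
        assert(rfds_vulnerable(0, true) && !rfds_vulnerable(0, false));
        return 0;
}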
static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
{
u64 ia32_cap = x86_read_arch_cap_msr();
@ -1223,8 +1253,16 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
!cpu_has(c, X86_FEATURE_AMD_SSB_NO))
setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);
if (ia32_cap & ARCH_CAP_IBRS_ALL)
/*
* AMD's AutoIBRS is equivalent to Intel's eIBRS - use the Intel feature
* flag and protect from vendor-specific bugs via the whitelist.
*/
if ((ia32_cap & ARCH_CAP_IBRS_ALL) || cpu_has(c, X86_FEATURE_AUTOIBRS)) {
setup_force_cpu_cap(X86_FEATURE_IBRS_ENHANCED);
if (!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
!(ia32_cap & ARCH_CAP_PBRSB_NO))
setup_force_cpu_bug(X86_BUG_EIBRS_PBRSB);
}
if (!cpu_matches(cpu_vuln_whitelist, NO_MDS) &&
!(ia32_cap & ARCH_CAP_MDS_NO)) {
@ -1286,11 +1324,6 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
setup_force_cpu_bug(X86_BUG_RETBLEED);
}
if (cpu_has(c, X86_FEATURE_IBRS_ENHANCED) &&
!cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
!(ia32_cap & ARCH_CAP_PBRSB_NO))
setup_force_cpu_bug(X86_BUG_EIBRS_PBRSB);
if (cpu_matches(cpu_vuln_blacklist, SMT_RSB))
setup_force_cpu_bug(X86_BUG_SMT_RSB);
@ -1309,6 +1342,16 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
setup_force_cpu_bug(X86_BUG_SRSO);
}
if (vulnerable_to_rfds(ia32_cap))
setup_force_cpu_bug(X86_BUG_RFDS);
/* When virtualized, eIBRS could be hidden, assume vulnerable */
if (!(ia32_cap & ARCH_CAP_BHI_NO) &&
!cpu_matches(cpu_vuln_whitelist, NO_BHI) &&
(boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) ||
boot_cpu_has(X86_FEATURE_HYPERVISOR)))
setup_force_cpu_bug(X86_BUG_BHI);
if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN))
return;

View File

@ -2382,12 +2382,14 @@ static ssize_t set_bank(struct device *s, struct device_attribute *attr,
return -EINVAL;
b = &per_cpu(mce_banks_array, s->id)[bank];
if (!b->init)
return -ENODEV;
b->ctl = new;
mutex_lock(&mce_sysfs_mutex);
mce_restart();
mutex_unlock(&mce_sysfs_mutex);
return size;
}

View File

@ -27,6 +27,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_APERFMPERF, CPUID_ECX, 0, 0x00000006, 0 },
{ X86_FEATURE_EPB, CPUID_ECX, 3, 0x00000006, 0 },
{ X86_FEATURE_RRSBA_CTRL, CPUID_EDX, 2, 0x00000007, 2 },
{ X86_FEATURE_BHI_CTRL, CPUID_EDX, 4, 0x00000007, 2 },
{ X86_FEATURE_CQM_LLC, CPUID_EDX, 1, 0x0000000f, 0 },
{ X86_FEATURE_CQM_OCCUP_LLC, CPUID_EDX, 0, 0x0000000f, 1 },
{ X86_FEATURE_CQM_MBM_TOTAL, CPUID_EDX, 1, 0x0000000f, 1 },

View File

@ -463,50 +463,26 @@ static void kprobe_emulate_call(struct kprobe *p, struct pt_regs *regs)
}
NOKPROBE_SYMBOL(kprobe_emulate_call);
static nokprobe_inline
void __kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs, bool cond)
static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
{
unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
if (cond)
ip += p->ainsn.rel32;
int3_emulate_jmp(regs, ip);
}
static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
{
__kprobe_emulate_jmp(p, regs, true);
}
NOKPROBE_SYMBOL(kprobe_emulate_jmp);
static const unsigned long jcc_mask[6] = {
[0] = X86_EFLAGS_OF,
[1] = X86_EFLAGS_CF,
[2] = X86_EFLAGS_ZF,
[3] = X86_EFLAGS_CF | X86_EFLAGS_ZF,
[4] = X86_EFLAGS_SF,
[5] = X86_EFLAGS_PF,
};
static void kprobe_emulate_jcc(struct kprobe *p, struct pt_regs *regs)
{
bool invert = p->ainsn.jcc.type & 1;
bool match;
unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
if (p->ainsn.jcc.type < 0xc) {
match = regs->flags & jcc_mask[p->ainsn.jcc.type >> 1];
} else {
match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
if (p->ainsn.jcc.type >= 0xe)
match = match || (regs->flags & X86_EFLAGS_ZF);
}
__kprobe_emulate_jmp(p, regs, (match && !invert) || (!match && invert));
int3_emulate_jcc(regs, p->ainsn.jcc.type, ip, p->ainsn.rel32);
}
NOKPROBE_SYMBOL(kprobe_emulate_jcc);
static void kprobe_emulate_loop(struct kprobe *p, struct pt_regs *regs)
{
unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
bool match;
if (p->ainsn.loop.type != 3) { /* LOOP* */
@ -534,7 +510,9 @@ static void kprobe_emulate_loop(struct kprobe *p, struct pt_regs *regs)
else if (p->ainsn.loop.type == 1) /* LOOPE */
match = match && (regs->flags & X86_EFLAGS_ZF);
__kprobe_emulate_jmp(p, regs, match);
if (match)
ip += p->ainsn.rel32;
int3_emulate_jmp(regs, ip);
}
NOKPROBE_SYMBOL(kprobe_emulate_loop);

View File

@ -519,9 +519,6 @@ nmi_restart:
write_cr2(this_cpu_read(nmi_cr2));
if (this_cpu_dec_return(nmi_state))
goto nmi_restart;
if (user_mode(regs))
mds_user_clear_cpu_buffers();
}
#if defined(CONFIG_X86_64) && IS_ENABLED(CONFIG_KVM_INTEL)

View File

@ -9,6 +9,7 @@ enum insn_type {
NOP = 1, /* site cond-call */
JMP = 2, /* tramp / site tail-call */
RET = 3, /* tramp / site cond-tail-call */
JCC = 4,
};
/*
@ -25,12 +26,40 @@ static const u8 xor5rax[] = { 0x2e, 0x2e, 0x2e, 0x31, 0xc0 };
static const u8 retinsn[] = { RET_INSN_OPCODE, 0xcc, 0xcc, 0xcc, 0xcc };
static u8 __is_Jcc(u8 *insn) /* Jcc.d32 */
{
u8 ret = 0;
if (insn[0] == 0x0f) {
u8 tmp = insn[1];
if ((tmp & 0xf0) == 0x80)
ret = tmp;
}
return ret;
}
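As a quick illustration of the check above (user-space code, names invented here), feeding it the first two bytes of a Jcc.d32 returns the 0x8x condition byte, while any other opcode returns 0:

#include <stdio.h>

typedef unsigned char u8;

static u8 is_jcc32(const u8 *insn)      /* same test as __is_Jcc() above */
{
        if (insn[0] == 0x0f && (insn[1] & 0xf0) == 0x80)
                return insn[1];
        return 0;
}

int main(void)
{
        const u8 je_d32[]  = { 0x0f, 0x84 };    /* JE rel32 */
        const u8 jmp_d32[] = { 0xe9, 0x00 };    /* JMP rel32, not a Jcc */

        printf("je:  0x%02x\n", is_jcc32(je_d32));      /* 0x84 */
        printf("jmp: 0x%02x\n", is_jcc32(jmp_d32));     /* 0x00 */
        return 0;
}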
extern void __static_call_return(void);
asm (".global __static_call_return\n\t"
".type __static_call_return, @function\n\t"
ASM_FUNC_ALIGN "\n\t"
"__static_call_return:\n\t"
ANNOTATE_NOENDBR
ANNOTATE_RETPOLINE_SAFE
"ret; int3\n\t"
".size __static_call_return, . - __static_call_return \n\t");
static void __ref __static_call_transform(void *insn, enum insn_type type,
void *func, bool modinit)
{
const void *emulate = NULL;
int size = CALL_INSN_SIZE;
const void *code;
u8 op, buf[6];
if ((type == JMP || type == RET) && (op = __is_Jcc(insn)))
type = JCC;
switch (type) {
case CALL:
@ -56,6 +85,20 @@ static void __ref __static_call_transform(void *insn, enum insn_type type,
else
code = &retinsn;
break;
case JCC:
if (!func) {
func = __static_call_return;
if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
func = x86_return_thunk;
}
buf[0] = 0x0f;
__text_gen_insn(buf+1, op, insn+1, func, 5);
code = buf;
size = 6;
break;
}
if (memcmp(insn, code, size) == 0)
@ -67,13 +110,14 @@ static void __ref __static_call_transform(void *insn, enum insn_type type,
text_poke_bp(insn, code, size, emulate);
}
static void __static_call_validate(void *insn, bool tail)
static void __static_call_validate(u8 *insn, bool tail)
{
u8 opcode = *(u8 *)insn;
u8 opcode = insn[0];
if (tail) {
if (opcode == JMP32_INSN_OPCODE ||
opcode == RET_INSN_OPCODE)
opcode == RET_INSN_OPCODE ||
__is_Jcc(insn))
return;
} else {
if (opcode == CALL_INSN_OPCODE ||

View File

@ -355,9 +355,9 @@ static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
}
static __always_inline
void kvm_cpu_cap_init_scattered(enum kvm_only_cpuid_leafs leaf, u32 mask)
void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
{
/* Use kvm_cpu_cap_mask for non-scattered leafs. */
/* Use kvm_cpu_cap_mask for leafs that aren't KVM-only. */
BUILD_BUG_ON(leaf < NCAPINTS);
kvm_cpu_caps[leaf] = mask;
@ -367,7 +367,7 @@ void kvm_cpu_cap_init_scattered(enum kvm_only_cpuid_leafs leaf, u32 mask)
static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
{
/* Use kvm_cpu_cap_init_scattered for scattered leafs. */
/* Use kvm_cpu_cap_init_kvm_defined for KVM-only leafs. */
BUILD_BUG_ON(leaf >= NCAPINTS);
kvm_cpu_caps[leaf] &= mask;
@ -469,11 +469,16 @@ void kvm_set_cpu_caps(void)
F(AVX_VNNI) | F(AVX512_BF16)
);
kvm_cpu_cap_init_kvm_defined(CPUID_7_2_EDX,
F(INTEL_PSFD) | F(IPRED_CTRL) | F(RRSBA_CTRL) | F(DDPD_U) |
F(BHI_CTRL) | F(MCDT_NO)
);
kvm_cpu_cap_mask(CPUID_D_1_EAX,
F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES)
);
kvm_cpu_cap_init_scattered(CPUID_12_EAX,
kvm_cpu_cap_init_kvm_defined(CPUID_12_EAX,
SF(SGX1) | SF(SGX2)
);
@ -710,13 +715,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
break;
/* function 7 has additional index. */
case 7:
entry->eax = min(entry->eax, 1u);
max_idx = entry->eax = min(entry->eax, 2u);
cpuid_entry_override(entry, CPUID_7_0_EBX);
cpuid_entry_override(entry, CPUID_7_ECX);
cpuid_entry_override(entry, CPUID_7_EDX);
/* KVM only supports 0x7.0 and 0x7.1, capped above via min(). */
if (entry->eax == 1) {
/* KVM only supports up to 0x7.2, capped above via min(). */
if (max_idx >= 1) {
entry = do_host_cpuid(array, function, 1);
if (!entry)
goto out;
@ -726,6 +731,16 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->ecx = 0;
entry->edx = 0;
}
if (max_idx >= 2) {
entry = do_host_cpuid(array, function, 2);
if (!entry)
goto out;
cpuid_entry_override(entry, CPUID_7_2_EDX);
entry->ecx = 0;
entry->ebx = 0;
entry->eax = 0;
}
break;
case 0xa: { /* Architectural Performance Monitoring */
struct x86_pmu_capability cap;

View File

@ -7,23 +7,44 @@
#include <asm/cpufeatures.h>
/*
* Hardware-defined CPUID leafs that are scattered in the kernel, but need to
* be directly used by KVM. Note, these word values conflict with the kernel's
* "bug" caps, but KVM doesn't use those.
* Hardware-defined CPUID leafs that are either scattered by the kernel or are
* unknown to the kernel, but need to be directly used by KVM. Note, these
* word values conflict with the kernel's "bug" caps, but KVM doesn't use those.
*/
enum kvm_only_cpuid_leafs {
CPUID_12_EAX = NCAPINTS,
CPUID_7_2_EDX,
NR_KVM_CPU_CAPS,
NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
};
/*
* Define a KVM-only feature flag.
*
* For features that are scattered by cpufeatures.h, __feature_translate() also
* needs to be updated to translate the kernel-defined feature into the
* KVM-defined feature.
*
* For features that are 100% KVM-only, i.e. not defined by cpufeatures.h,
* forego the intermediate KVM_X86_FEATURE and directly define X86_FEATURE_* so
* that X86_FEATURE_* can be used in KVM. No __feature_translate() handling is
* needed in this case.
*/
#define KVM_X86_FEATURE(w, f) ((w)*32 + (f))
/* Intel-defined SGX sub-features, CPUID level 0x12 (EAX). */
#define KVM_X86_FEATURE_SGX1 KVM_X86_FEATURE(CPUID_12_EAX, 0)
#define KVM_X86_FEATURE_SGX2 KVM_X86_FEATURE(CPUID_12_EAX, 1)
/* Intel-defined sub-features, CPUID level 0x00000007:2 (EDX) */
#define X86_FEATURE_INTEL_PSFD KVM_X86_FEATURE(CPUID_7_2_EDX, 0)
#define X86_FEATURE_IPRED_CTRL KVM_X86_FEATURE(CPUID_7_2_EDX, 1)
#define KVM_X86_FEATURE_RRSBA_CTRL KVM_X86_FEATURE(CPUID_7_2_EDX, 2)
#define X86_FEATURE_DDPD_U KVM_X86_FEATURE(CPUID_7_2_EDX, 3)
#define KVM_X86_FEATURE_BHI_CTRL KVM_X86_FEATURE(CPUID_7_2_EDX, 4)
#define X86_FEATURE_MCDT_NO KVM_X86_FEATURE(CPUID_7_2_EDX, 5)
struct cpuid_reg {
u32 function;
u32 index;
@ -49,6 +70,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_12_EAX] = {0x00000012, 0, CPUID_EAX},
[CPUID_8000_001F_EAX] = {0x8000001f, 0, CPUID_EAX},
[CPUID_8000_0021_EAX] = {0x80000021, 0, CPUID_EAX},
[CPUID_7_2_EDX] = { 7, 2, CPUID_EDX},
};
/*
@ -61,10 +83,12 @@ static const struct cpuid_reg reverse_cpuid[] = {
*/
static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
{
BUILD_BUG_ON(NR_CPUID_WORDS != NCAPINTS);
BUILD_BUG_ON(x86_leaf == CPUID_LNX_1);
BUILD_BUG_ON(x86_leaf == CPUID_LNX_2);
BUILD_BUG_ON(x86_leaf == CPUID_LNX_3);
BUILD_BUG_ON(x86_leaf == CPUID_LNX_4);
BUILD_BUG_ON(x86_leaf == CPUID_LNX_5);
BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid));
BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
}
@ -75,12 +99,17 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
*/
static __always_inline u32 __feature_translate(int x86_feature)
{
if (x86_feature == X86_FEATURE_SGX1)
return KVM_X86_FEATURE_SGX1;
else if (x86_feature == X86_FEATURE_SGX2)
return KVM_X86_FEATURE_SGX2;
#define KVM_X86_TRANSLATE_FEATURE(f) \
case X86_FEATURE_##f: return KVM_X86_FEATURE_##f
switch (x86_feature) {
KVM_X86_TRANSLATE_FEATURE(SGX1);
KVM_X86_TRANSLATE_FEATURE(SGX2);
KVM_X86_TRANSLATE_FEATURE(RRSBA_CTRL);
KVM_X86_TRANSLATE_FEATURE(BHI_CTRL);
default:
return x86_feature;
}
}
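The KVM_X86_FEATURE(w, f) encoding is the usual word*32+bit packing used for feature indices; a stand-alone round trip of that scheme (macro names invented here) looks like this:

#include <assert.h>

#define FEATURE(word, bit)      ((word) * 32 + (bit))
#define FEATURE_WORD(f)         ((f) / 32)
#define FEATURE_BIT(f)          ((f) % 32)

int main(void)
{
        /* e.g. a flag living in word 21, bit 4 of some capability array */
        unsigned int f = FEATURE(21, 4);

        assert(FEATURE_WORD(f) == 21);
        assert(FEATURE_BIT(f) == 4);
        return 0;
}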
static __always_inline u32 __feature_leaf(int x86_feature)

View File

@ -1662,20 +1662,22 @@ int svm_register_enc_region(struct kvm *kvm,
goto e_free;
}
/*
* The guest may change the memory encryption attribute from C=0 -> C=1
* or vice versa for this memory range. Let's make sure caches are
* flushed to ensure that guest data gets written into memory with
* correct C-bit. Note, this must be done before dropping kvm->lock,
* as region and its array of pages can be freed by a different task
* once kvm->lock is released.
*/
sev_clflush_pages(region->pages, region->npages);
region->uaddr = range->addr;
region->size = range->size;
list_add_tail(&region->list, &sev->regions_list);
mutex_unlock(&kvm->lock);
/*
* The guest may change the memory encryption attribute from C=0 -> C=1
* or vice versa for this memory range. Let's make sure caches are
* flushed to ensure that guest data gets written into memory with
* correct C-bit.
*/
sev_clflush_pages(region->pages, region->npages);
return ret;
e_free:

View File

@ -2,7 +2,10 @@
#ifndef __KVM_X86_VMX_RUN_FLAGS_H
#define __KVM_X86_VMX_RUN_FLAGS_H
#define VMX_RUN_VMRESUME (1 << 0)
#define VMX_RUN_SAVE_SPEC_CTRL (1 << 1)
#define VMX_RUN_VMRESUME_SHIFT 0
#define VMX_RUN_SAVE_SPEC_CTRL_SHIFT 1
#define VMX_RUN_VMRESUME BIT(VMX_RUN_VMRESUME_SHIFT)
#define VMX_RUN_SAVE_SPEC_CTRL BIT(VMX_RUN_SAVE_SPEC_CTRL_SHIFT)
#endif /* __KVM_X86_VMX_RUN_FLAGS_H */
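Defining the shift and the BIT() mask as a pair, as run_flags.h now does, lets C code test the flag with the mask while the assembly can use the shift with a bit-test instruction. A tiny self-contained illustration (local BIT() definition, invented names):

#include <assert.h>

#define BIT(n)                  (1u << (n))
#define RUN_VMRESUME_SHIFT      0
#define RUN_VMRESUME            BIT(RUN_VMRESUME_SHIFT)

int main(void)
{
        unsigned int flags = RUN_VMRESUME;

        /* C side: mask test. asm side: bt $RUN_VMRESUME_SHIFT, flags */
        assert(flags & RUN_VMRESUME);
        assert((flags >> RUN_VMRESUME_SHIFT) & 1);
        return 0;
}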

View File

@ -77,7 +77,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
mov (%_ASM_SP), %_ASM_AX
/* Check if vmlaunch or vmresume is needed */
testb $VMX_RUN_VMRESUME, %bl
bt $VMX_RUN_VMRESUME_SHIFT, %bx
/* Load guest registers. Don't clobber flags. */
mov VCPU_RCX(%_ASM_AX), %_ASM_CX
@ -99,8 +99,11 @@ SYM_FUNC_START(__vmx_vcpu_run)
/* Load guest RAX. This kills the @regs pointer! */
mov VCPU_RAX(%_ASM_AX), %_ASM_AX
/* Check EFLAGS.ZF from 'testb' above */
jz .Lvmlaunch
/* Clobbers EFLAGS.ZF */
CLEAR_CPU_BUFFERS
/* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
jnc .Lvmlaunch
/*
* After a successful VMRESUME/VMLAUNCH, control flow "magically"
@ -210,6 +213,8 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
call vmx_spec_ctrl_restore_host
CLEAR_BRANCH_HISTORY_VMEXIT
/* Put return value in AX */
mov %_ASM_BX, %_ASM_AX

View File

@ -398,7 +398,8 @@ static __always_inline void vmx_enable_fb_clear(struct vcpu_vmx *vmx)
static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
{
vmx->disable_fb_clear = vmx_fb_clear_ctrl_available;
vmx->disable_fb_clear = !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF) &&
vmx_fb_clear_ctrl_available;
/*
* If guest will not execute VERW, there is no need to set FB_CLEAR_DIS
@ -6747,11 +6748,14 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
{
kvm_guest_enter_irqoff();
/* L1D Flush includes CPU buffer clear to mitigate MDS */
/*
* L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
* mitigation for MDS is done late in VMentry and is still
* executed in spite of L1D Flush. This is because an extra VERW
* should not matter much after the big hammer L1D Flush.
*/
if (static_branch_unlikely(&vmx_l1d_should_flush))
vmx_l1d_flush(vcpu);
else if (static_branch_unlikely(&mds_user_clear))
mds_clear_cpu_buffers();
else if (static_branch_unlikely(&mmio_stale_data_clear) &&
kvm_arch_has_assigned_device(vcpu->kvm))
mds_clear_cpu_buffers();

View File

@ -1498,7 +1498,8 @@ static unsigned int num_msr_based_features;
ARCH_CAP_SKIP_VMENTRY_L1DFLUSH | ARCH_CAP_SSB_NO | ARCH_CAP_MDS_NO | \
ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO)
ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR | ARCH_CAP_BHI_NO)
static u64 kvm_get_arch_capabilities(void)
{
@ -1535,6 +1536,8 @@ static u64 kvm_get_arch_capabilities(void)
data |= ARCH_CAP_SSB_NO;
if (!boot_cpu_has_bug(X86_BUG_MDS))
data |= ARCH_CAP_MDS_NO;
if (!boot_cpu_has_bug(X86_BUG_RFDS))
data |= ARCH_CAP_RFDS_NO;
if (!boot_cpu_has(X86_FEATURE_RTM)) {
/*
@ -7105,7 +7108,17 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
}
if (r < 0)
goto emul_write;
return X86EMUL_UNHANDLEABLE;
/*
* Mark the page dirty _before_ checking whether or not the CMPXCHG was
* successful, as the old value is written back on failure. Note, for
* live migration, this is unnecessarily conservative as CMPXCHG writes
* back the original value and the access is atomic, but KVM's ABI is
* that all writes are dirty logged, regardless of the value written.
*/
kvm_vcpu_mark_page_dirty(vcpu, gpa_to_gfn(gpa));
if (r)
return X86EMUL_CMPXCHG_FAILED;

View File

@ -108,6 +108,7 @@ SYM_START(srso_alias_untrain_ret, SYM_L_GLOBAL, SYM_A_NONE)
ret
int3
SYM_FUNC_END(srso_alias_untrain_ret)
__EXPORT_THUNK(srso_alias_untrain_ret)
#endif
SYM_START(srso_alias_safe_ret, SYM_L_GLOBAL, SYM_A_NONE)
@ -249,9 +250,7 @@ SYM_CODE_START(srso_return_thunk)
SYM_CODE_END(srso_return_thunk)
SYM_FUNC_START(entry_untrain_ret)
ALTERNATIVE_2 "jmp retbleed_untrain_ret", \
"jmp srso_untrain_ret", X86_FEATURE_SRSO, \
"jmp srso_alias_untrain_ret", X86_FEATURE_SRSO_ALIAS
ALTERNATIVE "jmp retbleed_untrain_ret", "jmp srso_untrain_ret", X86_FEATURE_SRSO
SYM_FUNC_END(entry_untrain_ret)
__EXPORT_THUNK(entry_untrain_ret)

View File

@ -26,31 +26,18 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
for (; addr < end; addr = next) {
pud_t *pud = pud_page + pud_index(addr);
pmd_t *pmd;
bool use_gbpage;
next = (addr & PUD_MASK) + PUD_SIZE;
if (next > end)
next = end;
/* if this is already a gbpage, this portion is already mapped */
if (pud_large(*pud))
continue;
/* Is using a gbpage allowed? */
use_gbpage = info->direct_gbpages;
/* Don't use gbpage if it maps more than the requested region. */
/* at the beginning: */
use_gbpage &= ((addr & ~PUD_MASK) == 0);
/* ... or at the end: */
use_gbpage &= ((next & ~PUD_MASK) == 0);
/* Never overwrite existing mappings */
use_gbpage &= !pud_present(*pud);
if (use_gbpage) {
if (info->direct_gbpages) {
pud_t pudval;
if (pud_present(*pud))
continue;
addr &= PUD_MASK;
pudval = __pud((addr - info->offset) | info->page_flag);
set_pud(pud, pudval);
continue;

View File

@ -989,6 +989,38 @@ static void free_pfn_range(u64 paddr, unsigned long size)
memtype_free(paddr, paddr + size);
}
static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
pgprot_t *pgprot)
{
unsigned long prot;
VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT));
/*
* We need the starting PFN and cachemode used for track_pfn_remap()
* that covered the whole VMA. For most mappings, we can obtain that
* information from the page tables. For COW mappings, we might now
* suddenly have anon folios mapped and follow_phys() will fail.
*
* Fall back to using vma->vm_pgoff, see remap_pfn_range_notrack(), to
* detect the PFN. If we need the cachemode as well, we're out of luck
* for now and have to fail fork().
*/
if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) {
if (pgprot)
*pgprot = __pgprot(prot);
return 0;
}
if (is_cow_mapping(vma->vm_flags)) {
if (pgprot)
return -EINVAL;
*paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
return 0;
}
WARN_ON_ONCE(1);
return -EINVAL;
}
/*
* track_pfn_copy is called when vma that is covering the pfnmap gets
* copied through copy_page_range().
@ -999,20 +1031,13 @@ static void free_pfn_range(u64 paddr, unsigned long size)
int track_pfn_copy(struct vm_area_struct *vma)
{
resource_size_t paddr;
unsigned long prot;
unsigned long vma_size = vma->vm_end - vma->vm_start;
pgprot_t pgprot;
if (vma->vm_flags & VM_PAT) {
/*
* reserve the whole chunk covered by vma. We need the
* starting address and protection from pte.
*/
if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
WARN_ON_ONCE(1);
if (get_pat_info(vma, &paddr, &pgprot))
return -EINVAL;
}
pgprot = __pgprot(prot);
/* reserve the whole chunk covered by vma. */
return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}
@ -1087,7 +1112,6 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size)
{
resource_size_t paddr;
unsigned long prot;
if (vma && !(vma->vm_flags & VM_PAT))
return;
@ -1095,11 +1119,8 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
/* free the chunk starting from pfn or the whole chunk */
paddr = (resource_size_t)pfn << PAGE_SHIFT;
if (!paddr && !size) {
if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
WARN_ON_ONCE(1);
if (get_pat_info(vma, &paddr, NULL))
return;
}
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);

View File

@ -647,6 +647,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->zone_write_granularity = max(t->zone_write_granularity,
b->zone_write_granularity);
t->zoned = max(t->zoned, b->zoned);
if (!t->zoned) {
t->zone_write_granularity = 0;
t->max_zone_append_sectors = 0;
}
return ret;
}
EXPORT_SYMBOL(blk_stack_limits);

View File

@ -28,7 +28,7 @@ void blk_rq_stat_init(struct blk_rq_stat *stat)
/* src is a per-cpu stat, mean isn't initialized */
void blk_rq_stat_sum(struct blk_rq_stat *dst, struct blk_rq_stat *src)
{
if (!src->nr_samples)
if (dst->nr_samples + src->nr_samples <= dst->nr_samples)
return;
dst->min = min(dst->min, src->min);
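The new guard rejects both the old src->nr_samples == 0 case and an unsigned wrap of the sum. The stand-alone check below (plain unsigned counters, not struct blk_rq_stat) shows the two failure modes it catches:

#include <assert.h>
#include <limits.h>

/* Returns 1 when it is safe to merge src samples into dst. */
static int can_merge(unsigned int dst, unsigned int src)
{
        return dst + src > dst; /* false for src == 0 or on wrap-around */
}

int main(void)
{
        assert(!can_merge(10, 0));              /* nothing to merge */
        assert(can_merge(10, 5));               /* normal case */
        assert(!can_merge(UINT_MAX, 1));        /* sum would wrap */
        return 0;
}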

View File

@ -67,7 +67,7 @@ out:
complete_all(&param->larval->completion);
crypto_alg_put(&param->larval->alg);
kfree(param);
module_put_and_exit(0);
module_put_and_kthread_exit(0);
}
static int cryptomgr_schedule_probe(struct crypto_larval *larval)
@ -190,7 +190,7 @@ skiptest:
crypto_alg_tested(param->driver, err);
kfree(param);
module_put_and_exit(0);
module_put_and_kthread_exit(0);
}
static int cryptomgr_schedule_test(struct crypto_alg *alg)

View File

@ -208,8 +208,10 @@ void spk_do_flush(void)
wake_up_process(speakup_task);
}
void synth_write(const char *buf, size_t count)
void synth_write(const char *_buf, size_t count)
{
const unsigned char *buf = (const unsigned char *) _buf;
while (count--)
synth_buffer_add(*buf++);
synth_start();

View File

@ -550,8 +550,12 @@ acpi_db_walk_for_fields(acpi_handle obj_handle,
ACPI_FREE(buffer.pointer);
buffer.length = ACPI_ALLOCATE_LOCAL_BUFFER;
acpi_evaluate_object(obj_handle, NULL, NULL, &buffer);
status = acpi_evaluate_object(obj_handle, NULL, NULL, &buffer);
if (ACPI_FAILURE(status)) {
acpi_os_printf("Could Not evaluate object %p\n",
obj_handle);
return (AE_OK);
}
/*
* Since this is a field unit, surround the output in braces
*/

View File

@ -382,18 +382,6 @@ static const struct dmi_system_id acpisleep_dmi_table[] __initconst = {
DMI_MATCH(DMI_PRODUCT_NAME, "20GGA00L00"),
},
},
/*
* ASUS B1400CEAE hangs on resume from suspend (see
* https://bugzilla.kernel.org/show_bug.cgi?id=215742).
*/
{
.callback = init_default_s3,
.ident = "ASUS B1400CEAE",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
DMI_MATCH(DMI_PRODUCT_NAME, "ASUS EXPERTBOOK B1400CEAE"),
},
},
{},
};

View File

@ -666,11 +666,6 @@ MODULE_PARM_DESC(mobile_lpm_policy, "Default LPM policy for mobile chipsets");
static void ahci_pci_save_initial_config(struct pci_dev *pdev,
struct ahci_host_priv *hpriv)
{
if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA && pdev->device == 0x1166) {
dev_info(&pdev->dev, "ASM1166 has only six ports\n");
hpriv->saved_port_map = 0x3f;
}
if (pdev->vendor == PCI_VENDOR_ID_JMICRON && pdev->device == 0x2361) {
dev_info(&pdev->dev, "JMB361 has only one port\n");
hpriv->force_port_map = 1;

View File

@ -787,37 +787,6 @@ static const struct ata_port_info mv_port_info[] = {
},
};
static const struct pci_device_id mv_pci_tbl[] = {
{ PCI_VDEVICE(MARVELL, 0x5040), chip_504x },
{ PCI_VDEVICE(MARVELL, 0x5041), chip_504x },
{ PCI_VDEVICE(MARVELL, 0x5080), chip_5080 },
{ PCI_VDEVICE(MARVELL, 0x5081), chip_508x },
/* RocketRAID 1720/174x have different identifiers */
{ PCI_VDEVICE(TTI, 0x1720), chip_6042 },
{ PCI_VDEVICE(TTI, 0x1740), chip_6042 },
{ PCI_VDEVICE(TTI, 0x1742), chip_6042 },
{ PCI_VDEVICE(MARVELL, 0x6040), chip_604x },
{ PCI_VDEVICE(MARVELL, 0x6041), chip_604x },
{ PCI_VDEVICE(MARVELL, 0x6042), chip_6042 },
{ PCI_VDEVICE(MARVELL, 0x6080), chip_608x },
{ PCI_VDEVICE(MARVELL, 0x6081), chip_608x },
{ PCI_VDEVICE(ADAPTEC2, 0x0241), chip_604x },
/* Adaptec 1430SA */
{ PCI_VDEVICE(ADAPTEC2, 0x0243), chip_7042 },
/* Marvell 7042 support */
{ PCI_VDEVICE(MARVELL, 0x7042), chip_7042 },
/* Highpoint RocketRAID PCIe series */
{ PCI_VDEVICE(TTI, 0x2300), chip_7042 },
{ PCI_VDEVICE(TTI, 0x2310), chip_7042 },
{ } /* terminate list */
};
static const struct mv_hw_ops mv5xxx_ops = {
.phy_errata = mv5_phy_errata,
.enable_leds = mv5_enable_leds,
@ -4311,6 +4280,36 @@ static int mv_pci_init_one(struct pci_dev *pdev,
static int mv_pci_device_resume(struct pci_dev *pdev);
#endif
static const struct pci_device_id mv_pci_tbl[] = {
{ PCI_VDEVICE(MARVELL, 0x5040), chip_504x },
{ PCI_VDEVICE(MARVELL, 0x5041), chip_504x },
{ PCI_VDEVICE(MARVELL, 0x5080), chip_5080 },
{ PCI_VDEVICE(MARVELL, 0x5081), chip_508x },
/* RocketRAID 1720/174x have different identifiers */
{ PCI_VDEVICE(TTI, 0x1720), chip_6042 },
{ PCI_VDEVICE(TTI, 0x1740), chip_6042 },
{ PCI_VDEVICE(TTI, 0x1742), chip_6042 },
{ PCI_VDEVICE(MARVELL, 0x6040), chip_604x },
{ PCI_VDEVICE(MARVELL, 0x6041), chip_604x },
{ PCI_VDEVICE(MARVELL, 0x6042), chip_6042 },
{ PCI_VDEVICE(MARVELL, 0x6080), chip_608x },
{ PCI_VDEVICE(MARVELL, 0x6081), chip_608x },
{ PCI_VDEVICE(ADAPTEC2, 0x0241), chip_604x },
/* Adaptec 1430SA */
{ PCI_VDEVICE(ADAPTEC2, 0x0243), chip_7042 },
/* Marvell 7042 support */
{ PCI_VDEVICE(MARVELL, 0x7042), chip_7042 },
/* Highpoint RocketRAID PCIe series */
{ PCI_VDEVICE(TTI, 0x2300), chip_7042 },
{ PCI_VDEVICE(TTI, 0x2310), chip_7042 },
{ } /* terminate list */
};
static struct pci_driver mv_pci_driver = {
.name = DRV_NAME,
@ -4323,6 +4322,7 @@ static struct pci_driver mv_pci_driver = {
#endif
};
MODULE_DEVICE_TABLE(pci, mv_pci_tbl);
/**
* mv_print_info - Dump key info to kernel log for perusal.
@ -4495,7 +4495,6 @@ static void __exit mv_exit(void)
MODULE_AUTHOR("Brett Russ");
MODULE_DESCRIPTION("SCSI low-level driver for Marvell SATA controllers");
MODULE_LICENSE("GPL v2");
MODULE_DEVICE_TABLE(pci, mv_pci_tbl);
MODULE_VERSION(DRV_VERSION);
MODULE_ALIAS("platform:" DRV_NAME);

View File

@ -1004,8 +1004,7 @@ static void pdc20621_get_from_dimm(struct ata_host *host, void *psource,
offset -= (idx * window_size);
idx++;
dist = ((long) (window_size - (offset + size))) >= 0 ? size :
(long) (window_size - offset);
dist = min(size, window_size - offset);
memcpy_fromio(psource, dimm_mmio + offset / 4, dist);
psource += dist;
@ -1053,8 +1052,7 @@ static void pdc20621_put_to_dimm(struct ata_host *host, void *psource,
readl(mmio + PDC_DIMM_WINDOW_CTLR);
offset -= (idx * window_size);
idx++;
dist = ((long)(s32)(window_size - (offset + size))) >= 0 ? size :
(long) (window_size - offset);
dist = min(size, window_size - offset);
memcpy_toio(dimm_mmio + offset / 4, psource, dist);
writel(0x01, mmio + PDC_GENERAL_CTLR);
readl(mmio + PDC_GENERAL_CTLR);

View File

@ -53,6 +53,7 @@ static unsigned int defer_sync_state_count = 1;
static DEFINE_MUTEX(fwnode_link_lock);
static bool fw_devlink_is_permissive(void);
static bool fw_devlink_drv_reg_done;
static struct workqueue_struct *device_link_wq;
/**
* fwnode_link_add - Create a link between two fwnode_handles.
@ -501,12 +502,26 @@ static void devlink_dev_release(struct device *dev)
/*
* It may take a while to complete this work because of the SRCU
* synchronization in device_link_release_fn() and if the consumer or
* supplier devices get deleted when it runs, so put it into the "long"
* workqueue.
* supplier devices get deleted when it runs, so put it into the
* dedicated workqueue.
*/
queue_work(system_long_wq, &link->rm_work);
queue_work(device_link_wq, &link->rm_work);
}
/**
* device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
*/
void device_link_wait_removal(void)
{
/*
* devlink removal jobs are queued in the dedicated work queue.
* To be sure that all removal jobs are terminated, ensure that any
* scheduled work has run to completion.
*/
flush_workqueue(device_link_wq);
}
EXPORT_SYMBOL_GPL(device_link_wait_removal);
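device_link_wait_removal() leans on the standard guarantee that flush_workqueue() returns only after every item queued to that workqueue beforehand has finished. A minimal, hypothetical module sketch of the same pattern (the demo_* names are invented; this is not the driver-core code):

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;

static void demo_fn(struct work_struct *work)
{
        pr_info("demo work ran\n");
}
static DECLARE_WORK(demo_work, demo_fn);

static int __init demo_init(void)
{
        demo_wq = alloc_workqueue("demo_wq", 0, 0);
        if (!demo_wq)
                return -ENOMEM;

        queue_work(demo_wq, &demo_work);
        /* Blocks until every work item queued so far has completed. */
        flush_workqueue(demo_wq);
        return 0;
}

static void __exit demo_exit(void)
{
        destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");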
static struct class devlink_class = {
.name = "devlink",
.owner = THIS_MODULE,
@ -3825,9 +3840,14 @@ int __init devices_init(void)
sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
if (!sysfs_dev_char_kobj)
goto char_kobj_err;
device_link_wq = alloc_workqueue("device_link_wq", 0, 0);
if (!device_link_wq)
goto wq_err;
return 0;
wq_err:
kobject_put(sysfs_dev_char_kobj);
char_kobj_err:
kobject_put(sysfs_dev_block_kobj);
block_kobj_err:

View File

@ -589,6 +589,12 @@ ssize_t __weak cpu_show_spec_rstack_overflow(struct device *dev,
return sysfs_emit(buf, "Not affected\n");
}
ssize_t __weak cpu_show_reg_file_data_sampling(struct device *dev,
struct device_attribute *attr, char *buf)
{
return sysfs_emit(buf, "Not affected\n");
}
static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
@ -602,6 +608,7 @@ static DEVICE_ATTR(mmio_stale_data, 0444, cpu_show_mmio_stale_data, NULL);
static DEVICE_ATTR(retbleed, 0444, cpu_show_retbleed, NULL);
static DEVICE_ATTR(gather_data_sampling, 0444, cpu_show_gds, NULL);
static DEVICE_ATTR(spec_rstack_overflow, 0444, cpu_show_spec_rstack_overflow, NULL);
static DEVICE_ATTR(reg_file_data_sampling, 0444, cpu_show_reg_file_data_sampling, NULL);
static struct attribute *cpu_root_vulnerabilities_attrs[] = {
&dev_attr_meltdown.attr,
@ -617,6 +624,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
&dev_attr_retbleed.attr,
&dev_attr_gather_data_sampling.attr,
&dev_attr_spec_rstack_overflow.attr,
&dev_attr_reg_file_data_sampling.attr,
NULL
};

View File

@@ -362,8 +362,10 @@ void dev_pm_enable_wake_irq_complete(struct device *dev)
return;
if (wirq->status & WAKE_IRQ_DEDICATED_MANAGED &&
wirq->status & WAKE_IRQ_DEDICATED_REVERSE)
wirq->status & WAKE_IRQ_DEDICATED_REVERSE) {
enable_irq(wirq->irq);
wirq->status |= WAKE_IRQ_DEDICATED_ENABLED;
}
}
/**

View File

@@ -411,7 +411,7 @@ int btintel_read_version(struct hci_dev *hdev, struct intel_version *ver)
return PTR_ERR(skb);
}
if (skb->len != sizeof(*ver)) {
if (!skb || skb->len != sizeof(*ver)) {
bt_dev_err(hdev, "Intel version event size mismatch");
kfree_skb(skb);
return -EILSEQ;

View File

@@ -1557,6 +1557,7 @@ static struct clk_regmap_div nss_ubi0_div_clk_src = {
static const struct freq_tbl ftbl_pcie_aux_clk_src[] = {
F(24000000, P_XO, 1, 0, 0),
{ }
};
static const struct clk_parent_data gcc_xo_gpll0_core_pi_sleep_clk[] = {
@@ -1737,6 +1738,7 @@ static const struct freq_tbl ftbl_sdcc_ice_core_clk_src[] = {
F(160000000, P_GPLL0, 5, 0, 0),
F(216000000, P_GPLL6, 5, 0, 0),
F(308570000, P_GPLL6, 3.5, 0, 0),
{ }
};
static const struct clk_parent_data gcc_xo_gpll0_gpll6_gpll0_div2[] = {

View File

@@ -972,6 +972,7 @@ static struct clk_rcg2 pcie0_axi_clk_src = {
static const struct freq_tbl ftbl_pcie_aux_clk_src[] = {
F(19200000, P_XO, 1, 0, 0),
{ }
};
static struct clk_rcg2 pcie0_aux_clk_src = {
@@ -1077,6 +1078,7 @@ static const struct freq_tbl ftbl_sdcc_ice_core_clk_src[] = {
F(19200000, P_XO, 1, 0, 0),
F(160000000, P_GPLL0, 5, 0, 0),
F(308570000, P_GPLL6, 3.5, 0, 0),
{ }
};
static struct clk_rcg2 sdcc1_ice_core_clk_src = {

View File

@@ -3642,3 +3642,4 @@ module_exit(gcc_sdm845_exit);
MODULE_DESCRIPTION("QTI GCC SDM845 Driver");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS("platform:gcc-sdm845");
MODULE_SOFTDEP("pre: rpmhpd");

View File

@@ -333,6 +333,7 @@ static struct freq_tbl ftbl_mmss_axi_clk[] = {
F(333430000, P_MMPLL1, 3.5, 0, 0),
F(400000000, P_MMPLL0, 2, 0, 0),
F(466800000, P_MMPLL1, 2.5, 0, 0),
{ }
};
static struct clk_rcg2 mmss_axi_clk_src = {
@@ -357,6 +358,7 @@ static struct freq_tbl ftbl_ocmemnoc_clk[] = {
F(150000000, P_GPLL0, 4, 0, 0),
F(228570000, P_MMPLL0, 3.5, 0, 0),
F(320000000, P_MMPLL0, 2.5, 0, 0),
{ }
};
static struct clk_rcg2 ocmemnoc_clk_src = {

View File

@@ -267,6 +267,7 @@ static struct freq_tbl ftbl_mmss_axi_clk[] = {
F(291750000, P_MMPLL1, 4, 0, 0),
F(400000000, P_MMPLL0, 2, 0, 0),
F(466800000, P_MMPLL1, 2.5, 0, 0),
{ }
};
static struct clk_rcg2 mmss_axi_clk_src = {
@@ -291,6 +292,7 @@ static struct freq_tbl ftbl_ocmemnoc_clk[] = {
F(150000000, P_GPLL0, 4, 0, 0),
F(291750000, P_MMPLL1, 4, 0, 0),
F(400000000, P_MMPLL0, 2, 0, 0),
{ }
};
static struct clk_rcg2 ocmemnoc_clk_src = {
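
The `{ }` entries added to the freq_tbl arrays in the clock-driver hunks above are sentinel terminators. The hedged userspace sketch below shows why they matter: lookup helpers walk the table until they hit an all-zero entry, so a table without one is read past its end. The struct layout and find_freq() helper are simplified stand-ins, not the clk framework's types.

/* Simplified stand-in for a sentinel-terminated frequency table and the
 * lookup loop that depends on the terminator. */
#include <stdio.h>

struct freq_tbl {
	unsigned long freq;
	int src, pre_div;
};

static const struct freq_tbl ftbl_example[] = {
	{  19200000, 0, 1 },
	{ 160000000, 1, 5 },
	{ }                 /* sentinel: freq == 0 terminates the walk */
};

static const struct freq_tbl *find_freq(const struct freq_tbl *f,
					unsigned long rate)
{
	for (; f->freq; f++)        /* stops at the zeroed sentinel */
		if (rate <= f->freq)
			return f;
	return NULL;                /* rate higher than every entry */
}

int main(void)
{
	const struct freq_tbl *f = find_freq(ftbl_example, 100000000);

	if (f)
		printf("selected %lu Hz\n", f->freq);
	return 0;
}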

View File

@@ -32,7 +32,7 @@
#define GT_CONTROL_IRQ_ENABLE BIT(2) /* banked */
#define GT_CONTROL_AUTO_INC BIT(3) /* banked */
#define GT_CONTROL_PRESCALER_SHIFT 8
#define GT_CONTROL_PRESCALER_MAX 0xF
#define GT_CONTROL_PRESCALER_MAX 0xFF
#define GT_CONTROL_PRESCALER_MASK (GT_CONTROL_PRESCALER_MAX << \
GT_CONTROL_PRESCALER_SHIFT)

View File

@@ -481,10 +481,11 @@ static bool brcm_avs_is_firmware_loaded(struct private_data *priv)
static unsigned int brcm_avs_cpufreq_get(unsigned int cpu)
{
struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
struct private_data *priv;
if (!policy)
return 0;
struct private_data *priv = policy->driver_data;
priv = policy->driver_data;
cpufreq_cpu_put(policy);
return brcm_avs_get_frequency(priv->base);

View File

@@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev, int cpu)
if (!priv)
return -ENOMEM;
if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL))
if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL))
return -ENOMEM;
cpumask_set_cpu(cpu, priv->cpus);

View File

@@ -16,6 +16,7 @@
#include <linux/cpumask.h>
#include <linux/tick.h>
#include <linux/cpu.h>
#include <linux/math64.h>
#include "cpuidle.h"
@@ -185,7 +186,7 @@ static void __cpuidle_driver_init(struct cpuidle_driver *drv)
s->target_residency_ns = 0;
if (s->exit_latency > 0)
s->exit_latency_ns = s->exit_latency * NSEC_PER_USEC;
s->exit_latency_ns = mul_u32_u32(s->exit_latency, NSEC_PER_USEC);
else if (s->exit_latency_ns < 0)
s->exit_latency_ns = 0;
}
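
The cpuidle hunk above swaps a plain u32 multiplication for mul_u32_u32() so the product is formed in 64 bits. Below is a hedged userspace illustration of the overflow this avoids; the explicit cast stands in for what the kernel helper does, and the 5-second latency value is just an example.

/* A u32 * u32 product is computed in 32 bits and truncated before it is
 * stored in a 64-bit field; widening one operand first avoids the wrap. */
#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000u

int main(void)
{
	uint32_t exit_latency = 5000000;           /* 5 s, in microseconds */
	int64_t truncated = exit_latency * NSEC_PER_USEC;            /* wraps */
	int64_t widened   = (uint64_t)exit_latency * NSEC_PER_USEC;  /* ok   */

	printf("32-bit product: %lld ns\n", (long long)truncated);
	printf("64-bit product: %lld ns\n", (long long)widened);
	return 0;
}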

View File

@@ -95,6 +95,8 @@ static void adf_device_reset_worker(struct work_struct *work)
if (adf_dev_init(accel_dev) || adf_dev_start(accel_dev)) {
/* The device hung and we can't restart it, so stop here */
dev_err(&GET_DEV(accel_dev), "Restart device failed\n");
if (reset_data->mode == ADF_DEV_RESET_ASYNC ||
completion_done(&reset_data->compl))
kfree(reset_data);
WARN(1, "QAT: device restart failed. Device is unusable\n");
return;
@@ -102,11 +104,19 @@ static void adf_device_reset_worker(struct work_struct *work)
adf_dev_restarted_notify(accel_dev);
clear_bit(ADF_STATUS_RESTARTING, &accel_dev->status);
/* The dev is back alive. Notify the caller if in sync mode */
if (reset_data->mode == ADF_DEV_RESET_SYNC)
complete(&reset_data->compl);
else
/*
* The dev is back alive. Notify the caller if in sync mode
*
* If the device restart takes more time than expected,
* the schedule_reset() function can time out and exit. This can be
* detected by calling the completion_done() function. In this case
* the reset_data structure needs to be freed here.
*/
if (reset_data->mode == ADF_DEV_RESET_ASYNC ||
completion_done(&reset_data->compl))
kfree(reset_data);
else
complete(&reset_data->compl);
}
static int adf_dev_aer_schedule_reset(struct adf_accel_dev *accel_dev,
@@ -139,8 +149,9 @@ static int adf_dev_aer_schedule_reset(struct adf_accel_dev *accel_dev,
dev_err(&GET_DEV(accel_dev),
"Reset device timeout expired\n");
ret = -EFAULT;
}
} else {
kfree(reset_data);
}
return ret;
}
return 0;
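
The QAT hunks above make freeing of the shared reset_data depend on whether the synchronous caller has already timed out, using completion_done() as the tie-breaker. A more general way to express the same "last one out frees it" rule is reference counting; the module below is a hedged, illustrative sketch of that alternative. Names such as reset_ctx are invented, and this is not the QAT driver's code.

/* Waiter and worker each hold one reference to the shared context; whichever
 * side drops the last reference frees it, so neither a worker that finishes
 * late nor a waiter that times out can touch freed memory. */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/kref.h>
#include <linux/completion.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

struct reset_ctx {
	struct kref ref;
	struct completion compl;
	struct work_struct work;
};

static void reset_ctx_release(struct kref *ref)
{
	kfree(container_of(ref, struct reset_ctx, ref));
}

static void reset_worker(struct work_struct *work)
{
	struct reset_ctx *ctx = container_of(work, struct reset_ctx, work);

	/* ... perform the (possibly slow) device restart here ... */
	complete(&ctx->compl);
	kref_put(&ctx->ref, reset_ctx_release);   /* drop the worker's ref */
}

static int __init example_init(void)
{
	struct reset_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);

	if (!ctx)
		return -ENOMEM;

	kref_init(&ctx->ref);                     /* waiter's reference */
	kref_get(&ctx->ref);                      /* worker's reference */
	init_completion(&ctx->compl);
	INIT_WORK(&ctx->work, reset_worker);
	schedule_work(&ctx->work);

	if (!wait_for_completion_timeout(&ctx->compl, msecs_to_jiffies(10000)))
		pr_err("example: reset timed out\n");

	kref_put(&ctx->ref, reset_ctx_release);   /* drop the waiter's ref */
	return 0;
}

static void __exit example_exit(void)
{
	/* For brevity this sketch assumes the work has finished; a real
	 * driver would cancel or flush outstanding work before unload. */
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");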

View File

@@ -415,7 +415,7 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *),
void *data, bool duplicates, struct list_head *head)
{
const struct efivar_operations *ops;
unsigned long variable_name_size = 1024;
unsigned long variable_name_size = 512;
efi_char16_t *variable_name;
efi_status_t status;
efi_guid_t vendor_guid;
@@ -438,12 +438,13 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *),
}
/*
* Per EFI spec, the maximum storage allocated for both
* the variable name and variable data is 1024 bytes.
* A small set of old UEFI implementations reject sizes
* above a certain threshold; the lowest seen in the wild
* is 512.
*/
do {
variable_name_size = 1024;
variable_name_size = 512;
status = ops->get_next_variable(&variable_name_size,
variable_name,
@@ -491,9 +492,13 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *),
break;
case EFI_NOT_FOUND:
break;
case EFI_BUFFER_TOO_SMALL:
pr_warn("efivars: Variable name size exceeds maximum (%lu > 512)\n",
variable_name_size);
status = EFI_NOT_FOUND;
break;
default:
printk(KERN_WARNING "efivars: get_next_variable: status=%lx\n",
status);
pr_warn("efivars: get_next_variable: status=%lx\n", status);
status = EFI_NOT_FOUND;
break;
}
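
The efivars hunks above cap the name buffer at 512 bytes and treat EFI_BUFFER_TOO_SMALL as the end of enumeration. The hedged userspace mock below illustrates the underlying GetNextVariableName()-style protocol (size in, name or required size out, size reset on every pass); the fake_vars table and status values are invented for the example and are not EFI definitions.

/* Mock enumeration loop: the caller supplies the buffer size, the fake
 * "firmware" either fills in the next name or reports the required size,
 * and the size is reset before every call. */
#include <stdio.h>
#include <string.h>

enum status { OK, NOT_FOUND, BUFFER_TOO_SMALL };

static const char *fake_vars[] = { "Boot0000", "BootOrder", "SecureBoot" };

static enum status get_next_variable(size_t *size, char *name, int *cursor)
{
	if (*cursor >= (int)(sizeof(fake_vars) / sizeof(fake_vars[0])))
		return NOT_FOUND;
	if (strlen(fake_vars[*cursor]) + 1 > *size) {
		*size = strlen(fake_vars[*cursor]) + 1;   /* required size */
		return BUFFER_TOO_SMALL;
	}
	strcpy(name, fake_vars[*cursor]);
	(*cursor)++;
	return OK;
}

int main(void)
{
	char name[512];
	int cursor = 0;

	for (;;) {
		size_t size = sizeof(name);   /* reset the size every pass */
		enum status st = get_next_variable(&size, name, &cursor);

		if (st == OK) {
			printf("variable: %s\n", name);
		} else if (st == BUFFER_TOO_SMALL) {
			fprintf(stderr, "name needs %zu bytes, skipping\n", size);
			break;   /* treat like end of list, as the hunk does */
		} else {
			break;   /* NOT_FOUND: enumeration finished */
		}
	}
	return 0;
}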

Some files were not shown because too many files have changed in this diff