As part of on ongoing effort to perform more automated testing and
provide more tools for individual developers to validate their
patches before submitting, we are trying to make our code
"clang-format clean". My hope is that once we have fixed all of our
style "quirks", developers will be able to run clang-format on their
patches to help avoid silly formatting problems and ensure their
changes fit in well with the rest of the SELinux kernel code.
Signed-off-by: Paul Moore <paul@paul-moore.com>
A trivial correction to convert an 'unsigned' parameter into an
'unsigned int' parameter so the prototype matches the function
definition.
I really thought that someone submitted a patch for this a few years
ago but sadly I can't find it now.
Signed-off-by: Paul Moore <paul@paul-moore.com>
As part of on ongoing effort to perform more automated testing and
provide more tools for individual developers to validate their
patches before submitting, we are trying to make our code
"clang-format clean". My hope is that once we have fixed all of our
style "quirks", developers will be able to run clang-format on their
patches to help avoid silly formatting problems and ensure their
changes fit in well with the rest of the SELinux kernel code.
Signed-off-by: Paul Moore <paul@paul-moore.com>
As part of on ongoing effort to perform more automated testing and
provide more tools for individual developers to validate their
patches before submitting, we are trying to make our code
"clang-format clean". My hope is that once we have fixed all of our
style "quirks", developers will be able to run clang-format on their
patches to help avoid silly formatting problems and ensure their
changes fit in well with the rest of the SELinux kernel code.
Signed-off-by: Paul Moore <paul@paul-moore.com>
This code is rarely (never?) enabled by distros, and it hasn't caught
anything in decades. Let's kill off this legacy debug code.
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
linux/io_uring.h is slowly becoming a rubbish bin where we put
anything exposed to other subsystems. For instance, the task exit
hooks and io_uring cmd infra are completely orthogonal and don't need
each other's definitions. Start cleaning it up by splitting out all
command bits into a new header file.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7ec50bae6e21f371d3850796e716917fc141225a.1701391955.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since commit d9250dea3f ("SELinux: add boundary support and thread
context assignment"), SELinux has been supporting assigning per-thread
security context under a constraint and the comment was updated
accordingly. However, seems like commit d84f4f992c ("CRED: Inaugurate
COW credentials") accidentally brought the old comment back that doesn't
match what the code does.
Considering the ease of understanding the code, this patch just removes the
wrong comment.
Fixes: d84f4f992c ("CRED: Inaugurate COW credentials")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Currently, SELinux doesn't allow distinguishing between kernel threads
and userspace processes that are started before the policy is first
loaded - both get the label corresponding to the kernel SID. The only
way a process that persists from early boot can get a meaningful label
is by doing a voluntary dyntransition or re-executing itself.
Reusing the kernel label for userspace processes is problematic for
several reasons:
1. The kernel is considered to be a privileged domain and generally
needs to have a wide range of permissions allowed to work correctly,
which prevents the policy writer from effectively hardening against
early boot processes that might remain running unintentionally after
the policy is loaded (they represent a potential extra attack surface
that should be mitigated).
2. Despite the kernel being treated as a privileged domain, the policy
writer may want to impose certain special limitations on kernel
threads that may conflict with the requirements of intentional early
boot processes. For example, it is a good hardening practice to limit
what executables the kernel can execute as usermode helpers and to
confine the resulting usermode helper processes. However, a
(legitimate) process surviving from early boot may need to execute a
different set of executables.
3. As currently implemented, overlayfs remembers the security context of
the process that created an overlayfs mount and uses it to bound
subsequent operations on files using this context. If an overlayfs
mount is created before the SELinux policy is loaded, these "mounter"
checks are made against the kernel context, which may clash with
restrictions on the kernel domain (see 2.).
To resolve this, introduce a new initial SID (reusing the slot of the
former "init" initial SID) that will be assigned to any userspace
process started before the policy is first loaded. This is easy to do,
as we can simply label any process that goes through the
bprm_creds_for_exec LSM hook with the new init-SID instead of
propagating the kernel SID from the parent.
To provide backwards compatibility for existing policies that are
unaware of this new semantic of the "init" initial SID, introduce a new
policy capability "userspace_initial_context" and set the "init" SID to
the same context as the "kernel" SID unless this capability is set by
the policy.
Another small backwards compatibility measure is needed in
security_sid_to_context_core() for before the initial SELinux policy
load - see the code comment for explanation.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: edited comments based on feedback/discussion]
Signed-off-by: Paul Moore <paul@paul-moore.com>
In four separate functions within avtab, the same comparison logic is
used. The only difference is how the result is handled or whether there
is a unique specifier value to be checked for or used.
Extracting this functionality into the avtab_node_cmp() function unifies
the comparison logic between searching and insertion and gets rid of
duplicative code so that the implementation is easier to maintain.
Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Using full_name_hash() instead of partial_name_hash() should result
in cleaner and better performing code.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
On policy reload selinuxfs replaces two subdirectories (/booleans
and /class) with new variants. Unfortunately, that's done with
serious abuses of directory locking.
1) lock_rename() should be done to parents, not to objects being
exchanged
2) there's a bunch of reasons why it should not be done for directories
that do not have a common ancestor; most of those do not apply to
selinuxfs, but even in the best case the proof is subtle and brittle.
3) failure halfway through the creation of /class will leak
names and values arrays.
4) use of d_genocide() is also rather brittle; it's probably not much of
a bug per se, but e.g. an overmount of /sys/fs/selinuxfs/classes/shm/index
with any regular file will end up with leaked mount on policy reload.
Sure, don't do it, but...
Let's stop messing with disconnected directories; just create
a temporary (/.swapover) with no permissions for anyone (on the
level of ->permission() returing -EPERM, no matter who's calling
it) and build the new /booleans and /class in there; then
lock_rename on root and that temporary directory and d_exchange()
old and new both for class and booleans. Then unlock and use
simple_recursive_removal() to take the temporary out; it's much
more robust.
And instead of bothering with separate pathways for freeing
new (on failure halfway through) and old (on success) names/values,
do all freeing in one place. With temporaries swapped with the
old ones when we are past all possible failures.
The only user-visible difference is that /.swapover shows up
(but isn't possible to open, look up into, etc.) for the
duration of policy reload.
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[PM: applied some fixes from Al post merge]
Signed-off-by: Paul Moore <paul@paul-moore.com>
As the kernel test robot helpfully reminded us, all of the lsm_id
instances defined inside the various LSMs should be marked as static.
The one exception is Landlock which uses its lsm_id variable across
multiple source files with an extern declaration in a header file.
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
While we have a lsm_fill_user_ctx() helper function designed to make
life easier for LSMs which return lsm_ctx structs to userspace, we
didn't include all of the buffer length safety checks and buffer
padding adjustments in the helper. This led to code duplication
across the different LSMs and the possibility for mistakes across the
different LSM subsystems. In order to reduce code duplication and
decrease the chances of silly mistakes, we're consolidating all of
this code into the lsm_fill_user_ctx() helper.
The buffer padding is also modified from a fixed 8-byte alignment to
an alignment that matches the word length of the machine
(BITS_PER_LONG / 8).
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add hooks for setselfattr and getselfattr. These hooks are not very
different from their setprocattr and getprocattr equivalents, and
much of the code is shared.
Cc: selinux@vger.kernel.org
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Create a struct lsm_id to contain identifying information about Linux
Security Modules (LSMs). At inception this contains the name of the
module and an identifier associated with the security module. Change
the security_add_hooks() interface to use this structure. Change the
individual modules to maintain their own struct lsm_id and pass it to
security_add_hooks().
The values are for LSM identifiers are defined in a new UAPI
header file linux/lsm.h. Each existing LSM has been updated to
include it's LSMID in the lsm_id.
The LSM ID values are sequential, with the oldest module
LSM_ID_CAPABILITY being the lowest value and the existing modules
numbered in the order they were included in the main line kernel.
This is an arbitrary convention for assigning the values, but
none better presents itself. The value 0 is defined as being invalid.
The values 1-99 are reserved for any special case uses which may
arise in the future. This may include attributes of the LSM
infrastructure itself, possibly related to namespacing or network
attribute management. A special range is identified for such attributes
to help reduce confusion for developers unfamiliar with LSMs.
LSM attribute values are defined for the attributes presented by
modules that are available today. As with the LSM IDs, The value 0
is defined as being invalid. The values 1-99 are reserved for any
special case uses which may arise in the future.
Cc: linux-security-module <linux-security-module@vger.kernel.org>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Mickael Salaun <mic@digikod.net>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Nacked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[PM: forward ported beyond v6.6 due merge window changes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmVAJpsUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOvghAAzzIu0KzvcDci1AkEvS/6MD0ChS/K
d6cJPnf65bJ98M8PSIJz5bik2t6TYGDJZqAXTqyb6Bb7XM0ITVY5vPgLKWy1VXCr
xHBQOSFzYWMV+cqo+17l55iRKPFQb+2rNHfIqKfwxrAWhLL0zeZR4M7AQQVMrT5Q
WZXleW5xwvMA9My7ny+n3jTqxaiDIZTYyi0noF89j7NNfteehaKAkwZ6Phkdx5cM
UeSTkv9fO2/O80UcBs63h2rAhBM7BAvSDV8PicDN6VyWWdE3njQUBkWK+aGt4Ab+
KW2S1su0cPc636cmRyFqUvAdHNaJb2Uh+0OWRYWPd0fhAxI9kzYVQaMvl84ngwQZ
g2z8VHaCE8pO4zCuBVmxOBhl0pVoQfSXOG1wXPvEw4Me5hEudda3Fa3SjCeD1u2j
vEi/w2yl7Rb0oMg7MlpCZLSnH1fhwPxmnn+SJD1C0lCuINsB04sTuO0YbwJOcoUc
30bbzZt4oNJmTeNfuDiYDeQQExdIaOhACEnJ0c4TiMU26FfoAWnQVnz2KSqWvy+g
EKcQvqQD0Jl1e6adqyNDA7cexPkcC26KkUBJzK8g7pI6YqOG/Ev3dwygcM6Nesi8
iSxfnjixU84hHRq+u/7wQ72ga5oRw0qwL6iXLyFlV/O6vuvlFRtGSofluXkZ2LNL
m82igNveVpd9/Mc=
=a9QY
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20231030' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
- Add new credential functions, get_cred_many() and put_cred_many() to
save some atomic_t operations for a few operations.
While not strictly LSM related, this patchset had been rotting on the
mailing lists for some time and since the LSMs do care a lot about
credentials I thought it reasonable to give this patch a home.
- Five patches to constify different LSM hook parameters.
- Fix a spelling mistake.
* tag 'lsm-pr-20231030' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: fix a spelling mistake
cred: add get_cred_many and put_cred_many
lsm: constify 'sb' parameter in security_sb_kern_mount()
lsm: constify 'bprm' parameter in security_bprm_committed_creds()
lsm: constify 'bprm' parameter in security_bprm_committing_creds()
lsm: constify 'file' parameter in security_bprm_creds_from_file()
lsm: constify 'sb' parameter in security_quotactl()
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmVAJoEUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNMSw//eS3kO3K9NdpLgRIy+FWIljWt6+hp
z6GhX3jBhxZ5Hot+Y1SGNmQrA1ZQcFDW/h0sRCilmpt388ihBE7lIrNpPDDDx0U8
GjPjMbO2mgpYKTHCPJi+k98xBdPdFR5IK2jnl9K03eFpLX6fy8MLg/AWzJJnhPT/
+EtSY8ZMJOt6JVsb392WR0TFaROhqTjAUFsCRisRYeKmI9r7LE2oDq6Edu4dzKqH
+bSfD11Hn4GUYPPjcmGFO8iBOhjgbBOQLMxR/zHX9l2T3mOiCvprgI8+1kQdB04o
mFjjc1oQYOLwnwJZQdU8ppdCXQVaR2HtrJ9DbULATaUYwOVkZnxpYzRAaVONnLdh
LTPw9665Ip/iDoSOP5esuc4kxwKcUUkV84BgbKmzw4+Ml4kL4CvAJmLofu87O4lI
K1SFxNbKDJcTPsff4IQ9y98e64LhTCXBkfTj5ZyYKR5UBorlAlX1UhfhgImD8qNU
UaIOFrKCyGgJhsHpJx+MpM9W+mTLl9AsvhUo+Es1frk7jUdzztGEunmWwdpW/uaH
fPbBw3Nf/5ZLCH44HUEZS4hAhNuhUx9oX8Bz28+CBdUtWUeUArc2C2IyNv1bTDTW
sQh7SVaZj6CBRznumLy8MrnmTQ+D9j1egcgeg+S2FBgHjg1GV8sqnykvutPuJ95F
EdRMlG7ahmnmD7w=
=uDFj
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20231030' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- improve the SELinux debugging configuration controls in Kconfig
- print additional information about the hash table chain lengths when
when printing SELinux debugging information
- simplify the SELinux access vector hash table calcaulations
- use a better hashing function for the SELinux role tansition hash
table
- improve SELinux load policy time through the use of optimized
functions for calculating the number of bits set in a field
- addition of a __counted_by annotation
- simplify the avtab_inert_node() function through a simplified
prototype
* tag 'selinux-pr-20231030' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: simplify avtab_insert_node() prototype
selinux: hweight optimization in avtab_read_item
selinux: improve role transition hashing
selinux: simplify avtab slot calculation
selinux: improve debug configuration
selinux: print sum of chain lengths^2 for hash tables
selinux: Annotate struct sidtab_str_cache with __counted_by
Convert to using the new inode timestamp accessor functions.
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-83-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
__hashtab_insert() in hashtab.h has a cleaner interface that allows the
caller to specify the chain node location that the new node is being
inserted into so that it can update the node that currently occupies it.
Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The "sb_kern_mount" hook has implementation registered in SELinux.
Looking at the function implementation we observe that the "sb"
parameter is not changing.
Mark the "sb" parameter of LSM hook security_sb_kern_mount() as "const"
since it will not be changing in the LSM hook.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
[PM: minor merge fuzzing due to other constification patches]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Three LSMs register the implementations for the 'bprm_committed_creds()'
hook: AppArmor, SELinux and tomoyo. Looking at the function
implementations we may observe that the 'bprm' parameter is not changing.
Mark the 'bprm' parameter of LSM hook security_bprm_committed_creds() as
'const' since it will not be changing in the LSM hook.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
[PM: minor merge fuzzing due to other constification patches]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The 'bprm_committing_creds' hook has implementations registered in
SELinux and Apparmor. Looking at the function implementations we observe
that the 'bprm' parameter is not changing.
Mark the 'bprm' parameter of LSM hook security_bprm_committing_creds()
as 'const' since it will not be changing in the LSM hook.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
SELinux registers the implementation for the "quotactl" hook. Looking at
the function implementation we observe that the parameter "sb" is not
changing.
Mark the "sb" parameter of LSM hook security_quotactl() as "const" since
it will not be changing in the LSM hook.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
avtab_read_item() is a hot function called when reading each rule in a
binary policydb. With the current Fedora policy and refpolicy, this
function is called nearly 100,000 times per policy load.
A single avtab node is only permitted to have a single specifier to
describe the data it holds. As such, a check is performed to make sure
only one specifier is set. Previously this was done via a for-loop.
However, there is already an optimal function for finding the number of
bits set (hamming weight) and on some architectures, dedicated
instructions (popcount) which can be executed much more efficiently.
Even when using -mcpu=generic on a x86-64 Fedora 38 VM, this commit
results in a modest 2-4% speedup for policy loading due to a substantial
reduction in the number of instructions executed.
Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The number of buckets is calculated by performing a binary AND against
the mask of the hash table, which is one less than its size (which is a
power of two). This leads to all top bits being discarded, e.g. with
the Reference Policy on Debian there exists 376 entries, leading to a
size of 512, discarding the top 23 bits.
Use jhash to improve the hash table utilization:
# current
roletr: 376 entries and 124/512 buckets used,
longest chain length 8, sum of chain length^2 1496
# patch
roletr: 376 entries and 266/512 buckets used,
longest chain length 4, sum of chain length^2 646
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
[PM: line wrap in the commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Instead of dividing by 8 and then performing log2 by hand, use a more
readable calculation.
The behavior of rounddown_pow_of_two() for an input of 0 is undefined,
so handle that case and small values manually to achieve the same
results.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
If the SELinux debug configuration is enabled define the macro DEBUG
such that pr_debug() calls are always enabled, regardless of
CONFIG_DYNAMIC_DEBUG, since those message are the main reason for this
configuration in the first place.
Mention example usage in case CONFIG_DYNAMIC_DEBUG is enabled in the
help section of the configuration.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Print the sum of chain lengths squared as a metric for hash tables to
provide more insights, similar to avtabs.
While on it add a comma in the avtab message to improve readability of
the output.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
selinux_set_mnt_opts() relies on the fact that the mount options pointer
is always NULL when all options are unset (specifically in its
!selinux_initialized() branch. However, the new
selinux_fs_context_submount() hook breaks this rule by allocating a new
structure even if no options are set. That causes any submount created
before a SELinux policy is loaded to be rejected in
selinux_set_mnt_opts().
Fix this by making selinux_fs_context_submount() leave fc->security
set to NULL when there are no options to be copied from the reference
superblock.
Cc: <stable@vger.kernel.org>
Reported-by: Adam Williamson <awilliam@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2236345
Fixes: d80a8f1b58 ("vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct sidtab_str_cache.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: selinux@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmTuKLcUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXM/Eg//cwaOu/ASS08Cz/tfXeKpzg9UpzbW
uHqGtgdE9ZEvS71z+3dorOJVPEwPr+/yviq3FXYjYHFqvVhLZCvYM9rw+eNo/k4T
I95UTchGUsMWwkw61YBDLythfXm2UL5nabjckO81i9UPtxUYOwF6xQMQXYyMcLL8
6fm1vnCvK5FBEXi2HSUWy3Eb3wdviGdHrL6h19Aeew+q8u33asWSxn9vmBSSFEzZ
492//Pgy0t3FA6paWXQRvoR+GvLgBXNOvHB68cAx9vS8Lq6mAwJJSCRrQtKGh2Gd
YInr49f+TXOosD5Tm6ueWO4sr8RzQZ7nPyM+BLue4Yn2ZzdYgjwfHdkHWS1KeH5X
qVqa9s6/QONvkSCzqHs/ne2qio1Q0/0uGgwOkx6N7oVWQWjE7iTYlADwM0CDJnd2
UD7AHTOgpc88x1T1eW599MZttSCznBTSFXv4waaS5/5NT9n8Db1TpTtCTedOc1x2
n+c+F5BHLy69vhSGCanvum/8i2gNoKVyYaHyaMsQxr5LRcLnvN6oOjWIv7jMKxe7
GavUAxU7M5rxPUH44vrrrI+XztKJOdpCz4S0xp+7pSSSGAK5KkmVVLXjzrlGO1WS
55ixxQWYTGK0KlWHp4Ofi6brE9a4ATKcd1XscPN+AtBYX2ufNHLskCZulu/lyrMx
lAy9RRDe1hHWTvg=
=dnm4
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
- Add proper multi-LSM support for xattrs in the
security_inode_init_security() hook
Historically the LSM layer has only allowed a single LSM to add an
xattr to an inode, with IMA/EVM measuring that and adding its own as
well. As we work towards promoting IMA/EVM to a "proper LSM" instead
of the special case that it is now, we need to better support the
case of multiple LSMs each adding xattrs to an inode and after
several attempts we now appear to have something that is working
well. It is worth noting that in the process of making this change we
uncovered a problem with Smack's SMACK64TRANSMUTE xattr which is also
fixed in this pull request.
- Additional LSM hook constification
Two patches to constify parameters to security_capget() and
security_binder_transfer_file(). While I generally don't make a
special note of who submitted these patches, these were the work of
an Outreachy intern, Khadija Kamran, and that makes me happy;
hopefully it does the same for all of you reading this.
- LSM hook comment header fixes
One patch to add a missing hook comment header, one to fix a minor
typo.
- Remove an old, unused credential function declaration
It wasn't clear to me who should pick this up, but it was trivial,
obviously correct, and arguably the LSM layer has a vested interest
in credentials so I merged it. Sadly I'm now noticing that despite my
subject line cleanup I didn't cleanup the "unsued" misspelling, sigh
* tag 'lsm-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: constify the 'file' parameter in security_binder_transfer_file()
lsm: constify the 'target' parameter in security_capget()
lsm: add comment block for security_sk_classify_flow LSM hook
security: Fix ret values doc for security_inode_init_security()
cred: remove unsued extern declaration change_create_files_as()
evm: Support multiple LSMs providing an xattr
evm: Align evm_inode_init_security() definition with LSM infrastructure
smack: Set the SMACK64TRANSMUTE xattr in smack_inode_init_security()
security: Allow all LSMs to provide xattrs for inode_init_security hook
lsm: fix typo in security_file_lock() comment header
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmTuKKAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNTVBAAjFt/+t74FkH7OeIuwa4QFodNHWLS
4AgtudUrH0oE/JnDDZsMNm3gjtFhM0cYcCCvItjNmea9sCMfR/huAaQXuYldLI/w
IF2/n31Q9SlaHF8xgdpPg55tudx7khGoecC+aHA/qsRoikjCvUhOiWJZT4xn92R7
3DtPMyoY7/1Bw8TWhq9Kj4ks2evx+o9Q798Nxu6XKUWZ6X+CasQfH0bAWWcmh8sL
Z/mh5pyICmg9/sHvTdi3/VhhhSiWY0gyaX3tBo7Y3z6J4xjBzLzpJTC2TMaqpqLx
nUO2R02Bk6PgMjQcVYl77WlUOfwPz3L2NkWcCmTBNzyBzirl9pUjVZ7ay9fek4Zf
GSnoZ3IPNLxsK/oU4Zh+LKzkBpq7astovebFd4SYG+KHIhXkp8eblzxM5/M+LfG/
SXnGrLaCwxWp4jWlYubufPtKhF8jFlpzRvTELKrhU9pZCgzpvwvIJZhUGVP6cDWh
7j9RRJy1wBgqoDASM5+tCrplkwFa3r4AV5CVOFYSXgyvLCHk28IxcMEVqCvoO3yQ
FmvAi0MV85XOa/NpD3hYpnvfAcLbv6ftZC6JekqXMjwVE9vfdrywS4ZW4Z3BcEo+
M+eDsCh8ek9LmO+6IW0jffNuHijLKDr3hywnndNWascrmkmgDmu60kW/HIQORkPa
tZkswaKUa6RszR0=
=XB/b
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"Thirty three SELinux patches, which is a pretty big number for us, but
there isn't really anything scary in here; in fact we actually manage
to remove 10 lines of code with this :)
- Promote the SELinux DEBUG_HASHES macro to CONFIG_SECURITY_SELINUX_DEBUG
The DEBUG_HASHES macro was a buried SELinux specific preprocessor
debug macro that was a problem waiting to happen. Promoting the
debug macro to a proper Kconfig setting should help both improve
the visibility of the feature as well enable improved test
coverage. We've moved some additional debug functions under the
CONFIG_SECURITY_SELINUX_DEBUG flag and we may see more work in the
future.
- Emit a pr_notice() message if virtual memory is executable by default
As this impacts the SELinux access control policy enforcement, if
the system's configuration is such that virtual memory is
executable by default we print a single line notice to the console.
- Drop avtab_search() in favor of avtab_search_node()
Both functions are nearly identical so we removed avtab_search()
and converted the callers to avtab_search_node().
- Add some SELinux network auditing helpers
The helpers not only reduce a small amount of code duplication, but
they provide an opportunity to improve UDP flood performance
slightly by delaying initialization of the audit data in some
cases.
- Convert GFP_ATOMIC allocators to GFP_KERNEL when reading SELinux policy
There were two SELinux policy load helper functions that were
allocating memory using GFP_ATOMIC, they have been converted to
GFP_KERNEL.
- Quiet a KMSAN warning in selinux_inet_conn_request()
A one-line error path (re)set patch that resolves a KMSAN warning.
It is important to note that this doesn't represent a real bug in
the current code, but it quiets KMSAN and arguably hardens the code
against future changes.
- Cleanup the policy capability accessor functions
This is a follow-up to the patch which reverted SELinux to using a
global selinux_state pointer. This patch cleans up some artifacts
of that change and turns each accessor into a one-line READ_ONCE()
call into the policy capabilities array.
- A number of patches from Christian Göttsche
Christian submitted almost two-thirds of the patches in this pull
request as he worked to harden the SELinux code against type
differences, variable overflows, etc.
- Support for separating early userspace from the kernel in policy,
with a later revert
We did have a patch that added a new userspace initial SID which
would allow SELinux to distinguish between early user processes
created before the initial policy load and the kernel itself.
Unfortunately additional post-merge testing revealed a problematic
interaction with an old SELinux userspace on an old version of
Ubuntu so we've reverted the patch until we can resolve the
compatibility issue.
- Remove some outdated comments dealing with LSM hook registration
When we removed the runtime disable functionality we forgot to
remove some old comments discussing the importance of LSM hook
registration ordering.
- Minor administrative changes
Stephen Smalley updated his email address and "debranded" SELinux
from "NSA SELinux" to simply "SELinux". We've come a long way from
the original NSA submission and I would consider SELinux a true
community project at this point so removing the NSA branding just
makes sense"
* tag 'selinux-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (33 commits)
selinux: prevent KMSAN warning in selinux_inet_conn_request()
selinux: use unsigned iterator in nlmsgtab code
selinux: avoid implicit conversions in policydb code
selinux: avoid implicit conversions in selinuxfs code
selinux: make left shifts well defined
selinux: update type for number of class permissions in services code
selinux: avoid implicit conversions in avtab code
selinux: revert SECINITSID_INIT support
selinux: use GFP_KERNEL while reading binary policy
selinux: update comment on selinux_hooks[]
selinux: avoid implicit conversions in services code
selinux: avoid implicit conversions in mls code
selinux: use identical iterator type in hashtab_duplicate()
selinux: move debug functions into debug configuration
selinux: log about VM being executable by default
selinux: fix a 0/NULL mistmatch in ad_net_init_from_iif()
selinux: introduce SECURITY_SELINUX_DEBUG configuration
selinux: introduce and use lsm_ad_net_init*() helpers
selinux: update my email address
selinux: add missing newlines in pr_err() statements
...
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support tracking
KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the GENERIC_IOREMAP
ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take
GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency improvements
("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation, from
Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code ("Two
minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most
file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap
on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table range
API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM subsystem
documentation ("Improve mm documentation").
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA
jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6
Nfio7afS1ncD+hPYT8947UnLxTgn+ww=
=Afws
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Some swap cleanups from Ma Wupeng ("fix WARN_ON in
add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support
tracking KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with
UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the
GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
architectures to take GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency
improvements ("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation,
from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code
("Two minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
most file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
memmap on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock
migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table
range API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM
subsystem documentation ("Improve mm documentation").
* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
maple_tree: shrink struct maple_tree
maple_tree: clean up mas_wr_append()
secretmem: convert page_is_secretmem() to folio_is_secretmem()
nios2: fix flush_dcache_page() for usage from irq context
hugetlb: add documentation for vma_kernel_pagesize()
mm: add orphaned kernel-doc to the rst files.
mm: fix clean_record_shared_mapping_range kernel-doc
mm: fix get_mctgt_type() kernel-doc
mm: fix kernel-doc warning from tlb_flush_rmaps()
mm: remove enum page_entry_size
mm: allow ->huge_fault() to be called without the mmap_lock held
mm: move PMD_ORDER to pgtable.h
mm: remove checks for pte_index
memcg: remove duplication detection for mem_cgroup_uncharge_swap
mm/huge_memory: work on folio->swap instead of page->private when splitting folio
mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
mm/swap: use dedicated entry for swap in folio
mm/swap: stop using page->private on tail pages for THP_SWAP
selftests/mm: fix WARNING comparing pointer to 0
selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
...
Core
----
- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with large
writes operations.
- Store netdevs in an xarray, to simplify iterating over netdevs.
- Refactor nexthop selection for multipath routes.
- Improve sched class lifetime handling.
- Add backup nexthop ID support for bridge.
- Implement drop reasons support in openvswitch.
- Several data races annotations and fixes.
- Constify the sk parameter of routing functions.
- Prepend kernel version to netconsole message.
Protocols
---------
- Implement support for TCP probing the peer being under memory
pressure.
- Remove hard coded limitation on IPv6 specific info placement
inside the socket struct.
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated
per socket scaling factor.
- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes.
- In-kernel support for the TLS alert protocol.
- Better support for UDP reuseport with connected sockets.
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size.
- Get rid of additional ancillary per MPTCP connection struct socket.
- Implement support for BPF-based MPTCP packet schedulers.
- Format MPTCP subtests selftests results in TAP.
- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation.
BPF
---
- Multi-buffer support in AF_XDP.
- Add multi uprobe BPF links for attaching multiple uprobes
and usdt probes, which is significantly faster and saves extra fds.
- Implement an fd-based tc BPF attach API (TCX) and BPF link support on
top of it.
- Add SO_REUSEPORT support for TC bpf_sk_assign.
- Support new instructions from cpu v4 to simplify the generated code and
feature completeness, for x86, arm64, riscv64.
- Support defragmenting IPv(4|6) packets in BPF.
- Teach verifier actual bounds of bpf_get_smp_processor_id()
and fix perf+libbpf issue related to custom section handling.
- Introduce bpf map element count and enable it for all program types.
- Add a BPF hook in sys_socket() to change the protocol ID
from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy.
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress.
- Add uprobe support for the bpf_get_func_ip helper.
- Check skb ownership against full socket.
- Support for up to 12 arguments in BPF trampoline.
- Extend link_info for kprobe_multi and perf_event links.
Netfilter
---------
- Speed-up process exit by aborting ruleset validation if a
fatal signal is pending.
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types.
Driver API
----------
- Page pool optimizations, to improve data locality and cache usage.
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need
for raw ioctl() handling in drivers.
- Simplify genetlink dump operations (doit/dumpit) providing them
the common information already populated in struct genl_info.
- Extend and use the yaml devlink specs to [re]generate the split ops.
- Introduce devlink selective dumps, to allow SF filtering SF based on
handle and other attributes.
- Add yaml netlink spec for netlink-raw families, allow route, link and
address related queries via the ynl tool.
- Remove phylink legacy mode support.
- Support offload LED blinking to phy.
- Add devlink port function attributes for IPsec.
New hardware / drivers
----------------------
- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy
- WiFi:
- MediaTek mt7981 support
- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips
- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925
Drivers
-------
- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution
- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support
- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs
- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support
- Connector:
- support for event filtering
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTt1ZoSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkgFUP/REFaYWdWUvAzmWeezyx9dqgZMfSOjWq
9QvySiA94OAOcjIYkb7wfzQ5BBAZqaBQ/f8XqWwS1EDDDEBs8sP1cxmABKwW7Hsr
qFRu2sOqLzKBk223d0jIgEocfQaFpGbF71gXoTlDivBjBi5UxWm9bF0XnbYWcKgO
/QEvzNosi9uNdi85Fzmv62J6YzAdidEpwGsM7X2CfejwNRmStxAEg/NwvRR0Hyiq
OJCo97omEgTRaUle8nc64PDx33u4h5kQ1BkaeHEv0rbE3hftFC2YPKn/InmqSFGz
6ew2xnrGPR37LCuAiCcIIv6yR7K0eu0iYJ7jXwZxBDqxGavEPuwWGBoCP6qFiitH
ZLWhIrAUrdmSbySkTOCONhJ475qFAuQoYHYpZnX/bJZUHlSsb/9lwDJYJQGpVfd1
/daqJVSb7lhaifmNO1iNd/ibCIXq9zapwtkRwA897M8GkZBTsnVvazFld1Em+Se3
Bx6DSDUVBqVQ9fpZG2IAGD6odDwOzC1lF2IoceFvK9Ff6oE0psI+A0qNLMkHxZbW
Qlo7LsNe53hpoCC+yHTfXX7e/X8eNt0EnCGOQJDusZ0Nr3K7H4LKFA0i8UBUK05n
4lKnnaSQW7GQgdofLWt103OMDR9GoDxpFsm7b1X9+AEk6Fz6tq50wWYeMZETUKYP
DCW8VGFOZjZM
=9CsR
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with
large writes operations
- Store netdevs in an xarray, to simplify iterating over netdevs
- Refactor nexthop selection for multipath routes
- Improve sched class lifetime handling
- Add backup nexthop ID support for bridge
- Implement drop reasons support in openvswitch
- Several data races annotations and fixes
- Constify the sk parameter of routing functions
- Prepend kernel version to netconsole message
Protocols:
- Implement support for TCP probing the peer being under memory
pressure
- Remove hard coded limitation on IPv6 specific info placement inside
the socket struct
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
socket scaling factor
- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes
- In-kernel support for the TLS alert protocol
- Better support for UDP reuseport with connected sockets
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size
- Get rid of additional ancillary per MPTCP connection struct socket
- Implement support for BPF-based MPTCP packet schedulers
- Format MPTCP subtests selftests results in TAP
- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation
BPF:
- Multi-buffer support in AF_XDP
- Add multi uprobe BPF links for attaching multiple uprobes and usdt
probes, which is significantly faster and saves extra fds
- Implement an fd-based tc BPF attach API (TCX) and BPF link support
on top of it
- Add SO_REUSEPORT support for TC bpf_sk_assign
- Support new instructions from cpu v4 to simplify the generated code
and feature completeness, for x86, arm64, riscv64
- Support defragmenting IPv(4|6) packets in BPF
- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
perf+libbpf issue related to custom section handling
- Introduce bpf map element count and enable it for all program types
- Add a BPF hook in sys_socket() to change the protocol ID from
IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress
- Add uprobe support for the bpf_get_func_ip helper
- Check skb ownership against full socket
- Support for up to 12 arguments in BPF trampoline
- Extend link_info for kprobe_multi and perf_event links
Netfilter:
- Speed-up process exit by aborting ruleset validation if a fatal
signal is pending
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types
Driver API:
- Page pool optimizations, to improve data locality and cache usage
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
need for raw ioctl() handling in drivers
- Simplify genetlink dump operations (doit/dumpit) providing them the
common information already populated in struct genl_info
- Extend and use the yaml devlink specs to [re]generate the split ops
- Introduce devlink selective dumps, to allow SF filtering SF based
on handle and other attributes
- Add yaml netlink spec for netlink-raw families, allow route, link
and address related queries via the ynl tool
- Remove phylink legacy mode support
- Support offload LED blinking to phy
- Add devlink port function attributes for IPsec
New hardware / drivers:
- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy
- WiFi:
- MediaTek mt7981 support
- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips
- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925
Drivers:
- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface
logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG
interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution
- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support
- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs
- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support
- Connector:
- support for event filtering"
* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
net: stmmac: clarify difference between "interface" and "phy_interface"
r8152: add vendor/device ID pair for D-Link DUB-E250
devlink: move devlink_notify_register/unregister() to dev.c
devlink: move small_ops definition into netlink.c
devlink: move tracepoint definitions into core.c
devlink: push linecard related code into separate file
devlink: push rate related code into separate file
devlink: push trap related code into separate file
devlink: use tracepoint_enabled() helper
devlink: push region related code into separate file
devlink: push param related code into separate file
devlink: push resource related code into separate file
devlink: push dpipe related code into separate file
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
devlink: push shared buffer related code into separate file
devlink: push port related code into separate file
devlink: push object register/unregister notifications into separate helpers
inet: fix IP_TRANSPARENT error handling
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTxQAKCRCRxhvAZXjc
okaVAP94WAlItvDRt/z2Wtzf0+RqPZeTXEdGTxua8+RxqCyYIQD+OO5nRfKQPHlV
AqqGJMKItQMSMIYgB5ftqVhNWZfnHgM=
=pSEW
-----END PGP SIGNATURE-----
Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual filesystems.
Features:
- Block mode changes on symlinks and rectify our broken semantics
- Report file modifications via fsnotify() for splice
- Allow specifying an explicit timeout for the "rootwait" kernel
command line option. This allows to timeout and reboot instead of
always waiting indefinitely for the root device to show up
- Use synchronous fput for the close system call
Cleanups:
- Get rid of open-coded lockdep workarounds for async io submitters
and replace it all with a single consolidated helper
- Simplify epoll allocation helper
- Convert simple_write_begin and simple_write_end to use a folio
- Convert page_cache_pipe_buf_confirm() to use a folio
- Simplify __range_close to avoid pointless locking
- Disable per-cpu buffer head cache for isolated cpus
- Port ecryptfs to kmap_local_page() api
- Remove redundant initialization of pointer buf in pipe code
- Unexport the d_genocide() function which is only used within core
vfs
- Replace printk(KERN_ERR) and WARN_ON() with WARN()
Fixes:
- Fix various kernel-doc issues
- Fix refcount underflow for eventfds when used as EFD_SEMAPHORE
- Fix a mainly theoretical issue in devpts
- Check the return value of __getblk() in reiserfs
- Fix a racy assert in i_readcount_dec
- Fix integer conversion issues in various functions
- Fix LSM security context handling during automounts that prevented
NFS superblock sharing"
* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
cachefiles: use kiocb_{start,end}_write() helpers
ovl: use kiocb_{start,end}_write() helpers
aio: use kiocb_{start,end}_write() helpers
io_uring: use kiocb_{start,end}_write() helpers
fs: create kiocb_{start,end}_write() helpers
fs: add kerneldoc to file_{start,end}_write() helpers
io_uring: rename kiocb_end_write() local helper
splice: Convert page_cache_pipe_buf_confirm() to use a folio
libfs: Convert simple_write_begin and simple_write_end to use a folio
fs/dcache: Replace printk and WARN_ON by WARN
fs/pipe: remove redundant initialization of pointer buf
fs: Fix kernel-doc warnings
devpts: Fix kernel-doc warnings
doc: idmappings: fix an error and rephrase a paragraph
init: Add support for rootwait timeout parameter
vfs: fix up the assert in i_readcount_dec
fs: Fix one kernel-doc comment
docs: filesystems: idmappings: clarify from where idmappings are taken
fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs
vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZOXTKAAKCRCRxhvAZXjc
oifJAQCzi/p+AdQu8LA/0XvR7fTwaq64ZDCibU4BISuLGT2kEgEAuGbuoFZa0rs2
XYD/s4+gi64p9Z01MmXm2XO1pu3GPg0=
=eJz5
-----END PGP SIGNATURE-----
Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs timestamp updates from Christian Brauner:
"This adds VFS support for multi-grain timestamps and converts tmpfs,
xfs, ext4, and btrfs to use them. This carries acks from all relevant
filesystems.
The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optimize away a lot of metadata updates, down to around 1 per
jiffy, even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide to invalidate the cache.
Even with NFSv4, a lot of exported filesystems don't properly support
a change attribute and are subject to the same problems with timestamp
granularity. Other applications have similar issues with timestamps
(e.g., backup applications).
If we were to always use fine-grained timestamps, that would improve
the situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.
This introduces fine-grained timestamps that are used when they are
actively queried.
This uses the 31st bit of the ctime tv_nsec field to indicate that
something has queried the inode for the mtime or ctime. When this flag
is set, on the next mtime or ctime update, the kernel will fetch a
fine-grained timestamp instead of the usual coarse-grained one.
As POSIX generally mandates that when the mtime changes, the ctime
must also change the kernel always stores normalized ctime values, so
only the first 30 bits of the tv_nsec field are ever used.
Filesytems can opt into this behavior by setting the FS_MGTIME flag in
the fstype. Filesystems that don't set this flag will continue to use
coarse-grained timestamps.
Various preparatory changes, fixes and cleanups are included:
- Fixup all relevant places where POSIX requires updating ctime
together with mtime. This is a wide-range of places and all
maintainers provided necessary Acks.
- Add new accessors for inode->i_ctime directly and change all
callers to rely on them. Plain accesses to inode->i_ctime are now
gone and it is accordingly rename to inode->__i_ctime and commented
as requiring accessors.
- Extend generic_fillattr() to pass in a request mask mirroring in a
sense the statx() uapi. This allows callers to pass in a request
mask to only get a subset of attributes filled in.
- Rework timestamp updates so it's possible to drop the @now
parameter the update_time() inode operation and associated helpers.
- Add inode_update_timestamps() and convert all filesystems to it
removing a bunch of open-coding"
* tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits)
btrfs: convert to multigrain timestamps
ext4: switch to multigrain timestamps
xfs: switch to multigrain timestamps
tmpfs: add support for multigrain timestamps
fs: add infrastructure for multigrain timestamps
fs: drop the timespec64 argument from update_time
xfs: have xfs_vn_update_time gets its own timestamp
fat: make fat_update_time get its own timestamp
fat: remove i_version handling from fat_update_time
ubifs: have ubifs_update_time use inode_update_timestamps
btrfs: have it use inode_update_timestamps
fs: drop the timespec64 arg from generic_update_time
fs: pass the request_mask to generic_fillattr
fs: remove silly warning from current_time
gfs2: fix timestamp handling on quota inodes
fs: rename i_ctime field to __i_ctime
selinux: convert to ctime accessor functions
security: convert to ctime accessor functions
apparmor: convert to ctime accessor functions
sunrpc: convert to ctime accessor functions
...
Use the helpers to simplify code.
Link: https://lkml.kernel.org/r/20230728050043.59880-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Christian Göttsche <cgzones@googlemail.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Set the next pointer in filename_trans_read_helper() before attaching
the new node under construction to the list, otherwise garbage would be
dereferenced on subsequent failure during cleanup in the out goto label.
Cc: <stable@vger.kernel.org>
Fixes: 4300590243 ("selinux: implement new format of filename transitions")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
KMSAN reports the following issue:
[ 81.822503] =====================================================
[ 81.823222] BUG: KMSAN: uninit-value in selinux_inet_conn_request+0x2c8/0x4b0
[ 81.823891] selinux_inet_conn_request+0x2c8/0x4b0
[ 81.824385] security_inet_conn_request+0xc0/0x160
[ 81.824886] tcp_v4_route_req+0x30e/0x490
[ 81.825343] tcp_conn_request+0xdc8/0x3400
[ 81.825813] tcp_v4_conn_request+0x134/0x190
[ 81.826292] tcp_rcv_state_process+0x1f4/0x3b40
[ 81.826797] tcp_v4_do_rcv+0x9ca/0xc30
[ 81.827236] tcp_v4_rcv+0x3bf5/0x4180
[ 81.827670] ip_protocol_deliver_rcu+0x822/0x1230
[ 81.828174] ip_local_deliver_finish+0x259/0x370
[ 81.828667] ip_local_deliver+0x1c0/0x450
[ 81.829105] ip_sublist_rcv+0xdc1/0xf50
[ 81.829534] ip_list_rcv+0x72e/0x790
[ 81.829941] __netif_receive_skb_list_core+0x10d5/0x1180
[ 81.830499] netif_receive_skb_list_internal+0xc41/0x1190
[ 81.831064] napi_complete_done+0x2c4/0x8b0
[ 81.831532] e1000_clean+0x12bf/0x4d90
[ 81.831983] __napi_poll+0xa6/0x760
[ 81.832391] net_rx_action+0x84c/0x1550
[ 81.832831] __do_softirq+0x272/0xa6c
[ 81.833239] __irq_exit_rcu+0xb7/0x1a0
[ 81.833654] irq_exit_rcu+0x17/0x40
[ 81.834044] common_interrupt+0x8d/0xa0
[ 81.834494] asm_common_interrupt+0x2b/0x40
[ 81.834949] default_idle+0x17/0x20
[ 81.835356] arch_cpu_idle+0xd/0x20
[ 81.835766] default_idle_call+0x43/0x70
[ 81.836210] do_idle+0x258/0x800
[ 81.836581] cpu_startup_entry+0x26/0x30
[ 81.837002] __pfx_ap_starting+0x0/0x10
[ 81.837444] secondary_startup_64_no_verify+0x17a/0x17b
[ 81.837979]
[ 81.838166] Local variable nlbl_type.i created at:
[ 81.838596] selinux_inet_conn_request+0xe3/0x4b0
[ 81.839078] security_inet_conn_request+0xc0/0x160
KMSAN warning is reproducible with:
* netlabel_mgmt_protocount is 0 (e.g. netlbl_enabled() returns 0)
* CONFIG_SECURITY_NETWORK_XFRM may be set or not
* CONFIG_KMSAN=y
* `ssh USER@HOSTNAME /bin/date`
selinux_skb_peerlbl_sid() will call selinux_xfrm_skb_sid(), then fall
to selinux_netlbl_skbuff_getsid() which will not initialize nlbl_type,
but it will be passed to:
err = security_net_peersid_resolve(nlbl_sid,
nlbl_type, xfrm_sid, sid);
and checked by KMSAN, although it will not be used inside
security_net_peersid_resolve() (at least now), since this function
will check either (xfrm_sid == SECSID_NULL) or (nlbl_sid ==
SECSID_NULL) first and return before using uninitialized nlbl_type.
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
[PM: subject line tweak, removed 'fixes' tag as code is not broken]
Signed-off-by: Paul Moore <paul@paul-moore.com>
SELinux registers the implementation for the "binder_transfer_file"
hook. Looking at the function implementation we observe that the
parameter "file" is not changing.
Mark the "file" parameter of LSM hook security_binder_transfer_file() as
"const" since it will not be changing in the LSM hook.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
[PM: subject line whitespace fix]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use the identical type for local variables, e.g. loop counters.
Declare members of struct policydb_compat_info unsigned to consistently
use unsigned iterators. They hold read-only non-negative numbers in the
global variable policydb_compat.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use umode_t as parameter type for sel_make_inode(), which assigns the
value to the member i_mode of struct inode.
Use identical and unsigned types for loop iterators.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The loops upper bound represent the number of permissions used (for the
current class or in general). The limit for this is 32, thus we might
left shift of one less, 31. Shifting a base of 1 results in undefined
behavior; use (u32)1 as base.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Security classes have only up to 32 permissions, hence using an u16 is
sufficient (while improving padding in struct selinux_mapping).
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return u32 from avtab_hash() instead of int, since the hashing is done
on u32 and the result is used as an index on the hash array.
Use the type of the limit in for loops.
Avoid signed to unsigned conversion of multiplication result in
avtab_hash_eval() and perform multiplication in destination type.
Use unsigned loop iterator for index operations, to avoid sign
extension.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This commit reverts 5b0eea835d ("selinux: introduce an initial SID
for early boot processes") as it was found to cause problems on
distros with old SELinux userspace tools/libraries, specifically
Ubuntu 16.04.
Hopefully we will be able to re-add this functionality at a later
date, but let's revert this for now to help ensure a stable and
backwards compatible SELinux tree.
Link: https://lore.kernel.org/selinux/87edkseqf8.fsf@mail.lhotse
Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Three LSMs register the implementations for the "capget" hook: AppArmor,
SELinux, and the normal capability code. Looking at the function
implementations we may observe that the first parameter "target" is not
changing.
Mark the first argument "target" of LSM hook security_capget() as
"const" since it will not be changing in the LSM hook.
cap_capget() LSM hook declaration exceeds the 80 characters per line
limit. Split the function declaration to multiple lines to decrease the
line length.
Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
Acked-by: John Johansen <john.johansen@canonical.com>
[PM: align the cap_capget() declaration, spelling fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use GFP_KERNEL instead of GFP_ATOMIC while reading a binary policy in
sens_read() and cat_read(), similar to surrounding code.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
After commit f22f9aaf6c ("selinux: remove the runtime disable
functionality"), the comment on selinux_hooks[] is out-of-date,
remove the last paragraph about runtime disable functionality.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use u32 as the output parameter type in security_get_classes() and
security_get_permissions(), based on the type of the symtab nprim
member.
Declare the read-only class string parameter of
security_get_permissions() const.
Avoid several implicit conversions by using the identical type for the
destination.
Use the type identical to the source for local variables.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: cleanup extra whitespace in subject]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use u32 for ebitmap bits and sensitivity levels, char for the default
range of a class.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: description tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use the identical type u32 for the loop iterator.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: remove extra whitespace in subject]
Signed-off-by: Paul Moore <paul@paul-moore.com>
avtab_hash_eval() and hashtab_stat() are only used in policydb.c when
the configuration SECURITY_SELINUX_DEBUG is enabled.
Move the function definitions under that configuration as well and
provide empty definitions in case SECURITY_SELINUX_DEBUG is disabled, to
avoid using #ifdef in the callers.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In case virtual memory is being marked as executable by default, SELinux
checks regarding explicit potential dangerous use are disabled.
Inform the user about it.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-89-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Use a NULL instead of a zero to resolve a int/pointer mismatch.
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307210332.4AqFZfzI-lkp@intel.com/
Fixes: dd51fcd42f ("selinux: introduce and use lsm_ad_net_init*() helpers")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The policy database code contains several debug output statements
related to hashtable utilization. Those are guarded by the macro
DEBUG_HASHES, which is neither documented nor set anywhere.
Introduce a new Kconfig configuration guarding this and potential
other future debugging related code. Disable the setting by default.
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: fixed line lengths in the help text]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Perf traces of network-related workload shows a measurable overhead
inside the network-related selinux hooks while zeroing the
lsm_network_audit struct.
In most cases we can delay the initialization of such structure to the
usage point, avoiding such overhead in a few cases.
Additionally, the audit code accesses the IP address information only
for AF_INET* families, and selinux_parse_skb() will fill-out the
relevant fields in such cases. When the family field is zeroed or the
initialization is followed by the mentioned parsing, the zeroing can be
limited to the sk, family and netif fields.
By factoring out the audit-data initialization to new helpers, this
patch removes some duplicate code and gives small but measurable
performance gain under UDP flood.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Update my email address; MAINTAINERS was updated some time ago.
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The kernel print statements do not append an implicit newline to format
strings.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
avtab_search() shares the same logic with avtab_search_node(), except
that it returns, if found, a pointer to the struct avtab_node member
datum instead of the node itself. Since the member is an embedded
struct, and not a pointer, the returned value of avtab_search() and
avtab_search_node() will always in unison either be NULL or non-NULL.
Drop avtab_search() and replace its calls by avtab_search_node() to
deduplicate logic and adopt the only caller caring for the type of
the returned value accordingly.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Change "NSA SELinux" to just "SELinux" in Kconfig help text and
comments. While NSA was the original primary developer and continues to
help maintain SELinux, SELinux has long since transitioned to a wide
community of developers and maintainers. SELinux has been part of the
mainline Linux kernel for nearly 20 years now [1] and has received
contributions from many individuals and organizations.
[1] https://lore.kernel.org/lkml/Pine.LNX.4.44.0308082228470.1852-100000@home.osdl.org/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use the type bool as parameter type in
selinux_status_update_setenforce(). The related function
enforcing_enabled() returns the type bool, while the struct
selinux_kernel_status member enforcing uses an u32.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
hashtab_init() takes an u32 as size parameter type.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The specifier for avtab keys is always supplied with a type of u16,
either as a macro to security_compute_sid() or the member specified of
the struct avtab_key.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use the identical types in assignments of local variables for the
destination.
Merge tail calls into return statements.
Avoid using leading underscores for function local variable.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use a consistent type of u32 for sequence numbers.
Use a non-negative and input parameter matching type for the hash
result.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Use the identical type sel_netif_hashfn() returns.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Align the type with the one used in selinux_notify_policy_change() and
the sequence member of struct selinux_kernel_status.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Prevent inserting more than the supported U32_MAX number of entries.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The function is always inlined and most of the time both relevant
arguments are compile time constants, allowing compilers to elide the
check. Also the function is part of outputting the policy, which is not
performance critical.
Also convert the type of the third parameter into a size_t, since it
should always be a non-negative number of elements.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The sk_getsecid hook shouldn't need to modify its socket argument.
Make it const so that callers of security_sk_classify_flow() can use a
const struct sock *.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, SELinux doesn't allow distinguishing between kernel threads
and userspace processes that are started before the policy is first
loaded - both get the label corresponding to the kernel SID. The only
way a process that persists from early boot can get a meaningful label
is by doing a voluntary dyntransition or re-executing itself.
Reusing the kernel label for userspace processes is problematic for
several reasons:
1. The kernel is considered to be a privileged domain and generally
needs to have a wide range of permissions allowed to work correctly,
which prevents the policy writer from effectively hardening against
early boot processes that might remain running unintentionally after
the policy is loaded (they represent a potential extra attack surface
that should be mitigated).
2. Despite the kernel being treated as a privileged domain, the policy
writer may want to impose certain special limitations on kernel
threads that may conflict with the requirements of intentional early
boot processes. For example, it is a good hardening practice to limit
what executables the kernel can execute as usermode helpers and to
confine the resulting usermode helper processes. However, a
(legitimate) process surviving from early boot may need to execute a
different set of executables.
3. As currently implemented, overlayfs remembers the security context of
the process that created an overlayfs mount and uses it to bound
subsequent operations on files using this context. If an overlayfs
mount is created before the SELinux policy is loaded, these "mounter"
checks are made against the kernel context, which may clash with
restrictions on the kernel domain (see 2.).
To resolve this, introduce a new initial SID (reusing the slot of the
former "init" initial SID) that will be assigned to any userspace
process started before the policy is first loaded. This is easy to do,
as we can simply label any process that goes through the
bprm_creds_for_exec LSM hook with the new init-SID instead of
propagating the kernel SID from the parent.
To provide backwards compatibility for existing policies that are
unaware of this new semantic of the "init" initial SID, introduce a new
policy capability "userspace_initial_context" and set the "init" SID to
the same context as the "kernel" SID unless this capability is set by
the policy.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In the process of reverting back to directly accessing the global
selinux_state pointer we left behind some artifacts in the
selinux_policycap_XXX() helper functions. This patch cleans up
some of that left-behind cruft.
Signed-off-by: Paul Moore <paul@paul-moore.com>
Currently, the LSM infrastructure supports only one LSM providing an xattr
and EVM calculating the HMAC on that xattr, plus other inode metadata.
Allow all LSMs to provide one or multiple xattrs, by extending the security
blob reservation mechanism. Introduce the new lbs_xattr_count field of the
lsm_blob_sizes structure, so that each LSM can specify how many xattrs it
needs, and the LSM infrastructure knows how many xattr slots it should
allocate.
Modify the inode_init_security hook definition, by passing the full
xattr array allocated in security_inode_init_security(), and the current
number of xattr slots in that array filled by LSMs. The first parameter
would allow EVM to access and calculate the HMAC on xattrs supplied by
other LSMs, the second to not leave gaps in the xattr array, when an LSM
requested but did not provide xattrs (e.g. if it is not initialized).
Introduce lsm_get_xattr_slot(), which LSMs can call as many times as the
number specified in the lbs_xattr_count field of the lsm_blob_sizes
structure. During each call, lsm_get_xattr_slot() increments the number of
filled xattrs, so that at the next invocation it returns the next xattr
slot to fill.
Cleanup security_inode_init_security(). Unify the !initxattrs and
initxattrs case by simply not allocating the new_xattrs array in the
former. Update the documentation to reflect the changes, and fix the
description of the xattr name, as it is not allocated anymore.
Adapt both SELinux and Smack to use the new definition of the
inode_init_security hook, and to call lsm_get_xattr_slot() to obtain and
fill the reserved slots in the xattr array.
Move the xattr->name assignment after the xattr->value one, so that it is
done only in case of successful memory allocation.
Finally, change the default return value of the inode_init_security hook
from zero to -EOPNOTSUPP, so that BPF LSM correctly follows the hook
conventions.
Reported-by: Nicolas Bouchinet <nicolas.bouchinet@clip-os.org>
Link: https://lore.kernel.org/linux-integrity/Y1FTSIo+1x+4X0LS@archlinux/
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: minor comment and variable tweaks, approved by RS]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Avoid using the identifier `bool` to improve support with future C
standards. C23 is about to make `bool` a predefined macro (see N2654).
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
As noted in the comments of this commit, the current SELinux Makefile
requires features found in make v4.3 or later, which is problematic
as the Linux Kernel currently only requires make v3.82. This patch
fixes the SELinux Makefile so that it works properly on these older
versions of make, and adds a couple of comments to the Makefile about
how it can be improved once make v4.3 is required by the kernel.
Fixes: 6f933aa7df ("selinux: more Makefile tweaks")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Currently, when an NFS filesystem that supports passing LSM/SELinux
labels is mounted during early boot (before the SELinux policy is
loaded), it ends up mounted without the labeling support (i.e. with
Fedora policy all files get the generic NFS label
system_u:object_r:nfs_t:s0).
This is because the information that the NFS mount supports passing
labels (communicated to the LSM layer via the kern_flags argument of
security_set_mnt_opts()) gets lost and when the policy is loaded the
mount is initialized as if the passing is not supported.
Fix this by noting the "native labeling" in newsbsec->flags (using a new
SE_SBNATIVE flag) on the pre-policy-loaded call of
selinux_set_mnt_opts() and then making sure it is respected on the
second call from delayed_superblock_init().
Additionally, make inode_doinit_with_dentry() initialize the inode's
label from its extended attributes whenever it doesn't find it already
intitialized by the filesystem. This is needed to properly initialize
pre-existing inodes when delayed_superblock_init() is called. It should
not trigger in any other cases (and if it does, it's still better to
initialize the correct label instead of leaving the inode unlabeled).
Fixes: eb9ae68650 ("SELinux: Add new labeling type native labels")
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: fixed 'Fixes' tag format]
Signed-off-by: Paul Moore <paul@paul-moore.com>
exit_sel_fs() has been removed since commit f22f9aaf6c ("selinux:
remove the runtime disable functionality").
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The object context type `fs`, not to be confused with the well used
object context type `fscon`, was introduced in the initial git commit
1da177e4c3 ("Linux-2.6.12-rc2") but never actually used since.
The paper "A Security Policy Configuration for the Security-Enhanced
Linux" [1] mentions it under `7.2 File System Contexts` but also states:
Currently, this configuration is unused.
The policy statement defining such object contexts is `fscon`, e.g.:
fscon 2 3 gen_context(system_u:object_r:conA_t,s0) \
gen_context(system_u:object_r:conB_t,s0)
It is not documented at selinuxproject.org or in the SELinux notebook
and not supported by the Reference Policy buildsystem - the statement is
not properly sorted - and thus not used in the Reference or Fedora
Policy.
Print a warning message at policy load for each such object context:
SELinux: void and deprecated fs ocon 02:03
This topic was initially highlighted by Nicolas Iooss [2].
[1]: https://media.defense.gov/2021/Jul/29/2002815735/-1/-1/0/SELINUX-SECURITY-POLICY-CONFIGURATION-REPORT.PDF
[2]: https://lore.kernel.org/selinux/CAJfZ7=mP2eJaq2BfO3y0VnwUJaY2cS2p=HZMN71z1pKjzaT0Eg@mail.gmail.com/
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: tweaked deprecation comment, description line wrapping]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Include all necessary headers in header files to enable third party
applications, like LSP servers, to resolve all used symbols.
ibpkey.h: include "flask.h" for SECINITSID_UNLABELED
initial_sid_to_string.h: include <linux/stddef.h> for NULL
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 53f3517ae0 ("selinux: do not leave dangling pointer behind")
reset the `str` field of the `context` struct in an OOM error branch.
In this struct the fields `str` and `len` are coupled and should be kept
in sync. Set the length to zero according to the string be set to NULL.
Fixes: 53f3517ae0 ("selinux: do not leave dangling pointer behind")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Newly added subflows should inherit the LSM label from the associated
MPTCP socket regardless of the current context.
This patch implements the above copying sid and class from the MPTCP
socket context, deleting the existing subflow label, if any, and then
re-creating the correct one.
The new helper reuses the selinux_netlbl_sk_security_free() function,
and the latter can end-up being called multiple times with the same
argument; we additionally need to make it idempotent.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A few small tweaks to selinux_audit_rule_init():
- Adjust how we use the @rc variable so we are not doing any extra
work in the common/success case.
- Related to the above, rework the 'out' jump label so that the
success and error paths are different, simplifying both.
- Cleanup some of the vertical whitespace while we are making the
other changes.
Signed-off-by: Paul Moore <paul@paul-moore.com>
The array of mount tokens in only used in match_opt_prefix() and never
modified.
The array of symtab names is never modified and only used in the
DEBUG_HASHES configuration as output.
The array of files for the SElinux filesystem sub-directory `ss` is
similar to the other `struct tree_descr` usages only read from to
construct the containing entries.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The second parameter `tag` of avtab_hash_eval() is only used for
printing. In policydb_index() it is called with a string literal:
avtab_hash_eval(&p->te_avtab, "rules");
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: slight formatting tweak in description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 539813e418 ("selinux: stop returning node from avc_insert()")
converted the return value of avc_insert() to void but left the now
unnecessary trailing return statement.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Since commit f22f9aaf6c ("selinux: remove the runtime disable
functionality") the function avc_disable() is no longer used.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In case mls_context_cpy() fails due to OOM set the free'd pointer in
context_cpy() to NULL to avoid it potentially being dereferenced or
free'd again in future. Freeing a NULL pointer is well-defined and a
hard NULL dereference crash is at least not exploitable and should give
a workable stack trace.
Fixes: 12b29f3455 ("selinux: support deferred mapping of contexts")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
A few small tweaks to improve the SELinux Makefile:
- Define a new variable, 'genhdrs', to represent both flask.h and
av_permissions.h; this should help ensure consistent processing for
both generated headers.
- Move the 'ccflags-y' variable closer to the top, just after the
main 'obj-$(CONFIG_SECURITY_SELINUX)' definition to make it more
visible and improve the grouping in the Makefile.
- Rework some of the vertical whitespace to improve some of the
grouping in the Makefile.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The Makefile rule responsible for building flask.h and
av_permissions.h only lists flask.h as a target which means that
av_permissions.h is only generated when flask.h needs to be
generated. This patch fixes this by adding av_permissions.h as a
target to the rule.
Fixes: 8753f6bec3 ("selinux: generate flask headers during kernel build")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Make the flask.h target depend on the genheaders binary instead of
classmap.h to ensure that it is rebuilt if any of the dependencies of
genheaders are changed.
Notably this fixes flask.h not being rebuilt when
initial_sid_to_string.h is modified.
Fixes: 8753f6bec3 ("selinux: generate flask headers during kernel build")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The callers haven't used the returned node since
commit 21193dcd1f ("SELinux: more careful use of avd in
avc_has_perm_noaudit") and the return value assignments were removed in
commit 0a9876f36b ("selinux: Remove redundant assignments"). Stop
returning the node altogether and make the functions return void.
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
PM: minor subj tweak, repair whitespace damage
Signed-off-by: Paul Moore <paul@paul-moore.com>
After working with the larger SELinux-based distros for several
years, we're finally at a place where we can disable the SELinux
runtime disable functionality. The existing kernel deprecation
notice explains the functionality and why we want to remove it:
The selinuxfs "disable" node allows SELinux to be disabled at
runtime prior to a policy being loaded into the kernel. If
disabled via this mechanism, SELinux will remain disabled until
the system is rebooted.
The preferred method of disabling SELinux is via the "selinux=0"
boot parameter, but the selinuxfs "disable" node was created to
make it easier for systems with primitive bootloaders that did not
allow for easy modification of the kernel command line.
Unfortunately, allowing for SELinux to be disabled at runtime makes
it difficult to secure the kernel's LSM hooks using the
"__ro_after_init" feature.
It is that last sentence, mentioning the '__ro_after_init' hardening,
which is the real motivation for this change, and if you look at the
diffstat you'll see that the impact of this patch reaches across all
the different LSMs, helping prevent tampering at the LSM hook level.
From a SELinux perspective, it is important to note that if you
continue to disable SELinux via "/etc/selinux/config" it may appear
that SELinux is disabled, but it is simply in an uninitialized state.
If you load a policy with `load_policy -i`, you will see SELinux
come alive just as if you had loaded the policy during early-boot.
It is also worth noting that the "/sys/fs/selinux/disable" file is
always writable now, regardless of the Kconfig settings, but writing
to the file has no effect on the system, other than to display an
error on the console if a non-zero/true value is written.
Finally, in the several years where we have been working on
deprecating this functionality, there has only been one instance of
someone mentioning any user visible breakage. In this particular
case it was an individual's kernel test system, and the workaround
documented in the deprecation notice ("selinux=0" on the kernel
command line) resolved the issue without problem.
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
We originally promised that the SELinux 'checkreqprot' functionality
would be removed no sooner than June 2021, and now that it is March
2023 it seems like it is a good time to do the final removal. The
deprecation notice in the kernel provides plenty of detail on why
'checkreqprot' is not desirable, with the key point repeated below:
This was a compatibility mechanism for legacy userspace and
for the READ_IMPLIES_EXEC personality flag. However, if set to
1, it weakens security by allowing mappings to be made executable
without authorization by policy. The default value of checkreqprot
at boot was changed starting in Linux v4.4 to 0 (i.e. check the
actual protection), and Android and Linux distributions have been
explicitly writing a "0" to /sys/fs/selinux/checkreqprot during
initialization for some time.
Along with the official deprecation notice, we have been discussing
this on-list and directly with several of the larger SELinux-based
distros and everyone is happy to see this feature finally removed.
In an attempt to catch all of the smaller, and DIY, Linux systems
we have been writing a deprecation notice URL into the kernel log,
along with a growing ssleep() penalty, when admins enabled
checkreqprot at runtime or via the kernel command line. We have
yet to have anyone come to us and raise an objection to the
deprecation or planned removal.
It is worth noting that while this patch removes the checkreqprot
functionality, it leaves the user visible interfaces (kernel command
line and selinuxfs file) intact, just inert. This should help
prevent breakages with existing userspace tools that correctly, but
unnecessarily, disable checkreqprot at boot or runtime. Admins
that attempt to enable checkreqprot will be met with a removal
message in the kernel log.
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Linus observed that the pervasive passing of selinux_state pointers
introduced by me in commit aa8e712cee ("selinux: wrap global selinux
state") adds overhead and complexity without providing any
benefit. The original idea was to pave the way for SELinux namespaces
but those have not yet been implemented and there isn't currently
a concrete plan to do so. Remove the passing of the selinux_state
pointers, reverting to direct use of the single global selinux_state,
and likewise remove passing of child pointers like the selinux_avc.
The selinux_policy pointer remains as it is needed for atomic switching
of policies.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202303101057.mZ3Gv5fK-lkp@intel.com/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This is based on earlier patch posted to the list by Linus, his
commit description read:
"avc_has_perm_noaudit()is one of those hot functions that end up
being used by almost all filesystem operations (through
"avc_has_perm()") and it's intended to be cheap enough to inline.
However, it turns out that the unlikely parts of it (where it
doesn't find an existing avc node) need a fair amount of stack
space for the automatic replacement node, so if it were to be
inlined (at least clang does not) it would just use stack space
unnecessarily.
So split the unlikely part out of it, and mark that part noinline.
That improves the actual likely part."
The basic idea behind the patch was reasonable, but there were minor
nits (double indenting, etc.) and the RCU read lock unlock/re-lock in
avc_compute_av() began to look even more ugly. This patch builds on
Linus' first effort by cleaning things up a bit and removing the RCU
unlock/lock dance in avc_compute_av().
Removing the RCU lock dance in avc_compute_av() is safe as there are
currently two callers of avc_compute_av(): avc_has_perm_noaudit() and
avc_has_extended_perms(). The first caller in avc_has_perm_noaudit()
does not require a RCU lock as there is no avc_node to protect so the
RCU lock can be dropped before calling avc_compute_av(). The second
caller, avc_has_extended_perms(), is similar in that there is no
avc_node that requires RCU protection, but the code is simplified by
holding the RCU look around the avc_compute_av() call, and given that
we enter a RCU critical section in security_compute_av() (called from
av_compute_av()) the impact will likely be unnoticeable. It is also
worth noting that avc_has_extended_perms() is only called from the
SELinux ioctl() access control hook at the moment.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter". These filters provide users
with finer-grained control over DAMOS's actions. SeongJae has also done
some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series "mm:
support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with his
series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings. The previous BPF-based approach had
shortcomings. See "mm: In-kernel support for memory-deny-write-execute
(MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a per-node
basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage during
compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in ths
series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's series
"mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
"fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest of
the kernel in the series "Simplify the external interface for GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the series
"mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
=MlGs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X
bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()")
which does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter".
These filters provide users with finer-grained control over DAMOS's
actions. SeongJae has also done some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series
"mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
swap PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with
his series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings.
The previous BPF-based approach had shortcomings. See "mm: In-kernel
support for memory-deny-write-execute (MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a
per-node basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage
during compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in
ths series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier
functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's
series "mm, arch: add generic implementation of pfn_valid() for
FLATMEM" and "fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest
of the kernel in the series "Simplify the external interface for
GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the
series "mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
include/linux/migrate.h: remove unneeded externs
mm/memory_hotplug: cleanup return value handing in do_migrate_range()
mm/uffd: fix comment in handling pte markers
mm: change to return bool for isolate_movable_page()
mm: hugetlb: change to return bool for isolate_hugetlb()
mm: change to return bool for isolate_lru_page()
mm: change to return bool for folio_isolate_lru()
objtool: add UACCESS exceptions for __tsan_volatile_read/write
kmsan: disable ftrace in kmsan core code
kasan: mark addr_has_metadata __always_inline
mm: memcontrol: rename memcg_kmem_enabled()
sh: initialize max_mapnr
m68k/nommu: add missing definition of ARCH_PFN_OFFSET
mm: percpu: fix incorrect size in pcpu_obj_full_size()
maple_tree: reduce stack usage with gcc-9 and earlier
mm: page_alloc: call panic() when memoryless node allocation fails
mm: multi-gen LRU: avoid futile retries
migrate_pages: move THP/hugetlb migration support check to simplify code
migrate_pages: batch flushing TLB
migrate_pages: share more code between _unmap and _move
...
Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.
[akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo]
Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmOXmxkUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMPXg//cxfYC8lRtVpuGNCZWDietSiHzpzu
+qFntaTplvybJMQX0HfgNee5cTBZM+W5mp1BHRcZInvV5LRhyrVtgsxDBifutE4x
LyUJAw5SkiPdRC+XLDIRLKiZCobFBLVs2zO+qibIqsyR60pFjU6WXBLbJfidXBFR
yWudDbLU0YhQJCHdNHNqnHCgqrEculxn6q3QPvm/DX0xzBwkFHSSYBkGNvHW2ZTA
lKNreEOwEk5DTLIKjP4bJ72ixp0xbshw5CXuxtwB/12/4h8QbWbJVQLlIeZrTLmp
zQXQLJ3pCqKJ2OUCgMDK+wmkvLezd80BV3Due7KX0pT0YRDygoh5QEpZ5/8k8eG7
prxToh2gJWk2htfJF6kgMpAh9Jqewcke4BysbYVM/427OPZYwQqLDZDGOzbtT6pl
FYF+adN9wwkAErnHnPlzYipUEpBWurbjtsV8KFWNERoZ4YmzfSPEisRqGIHDGRws
bTyq/7qs5FXkb1zULELj8V+S2ULsmxPqsxJ63p9di54Uo9lHK0I+0IUtajGDdfze
psAasa9DD/oH2PAbSmpQ5Xo9XyfHRXsVuz1twEmEA14ML0m4wHbNWVHaK0aaXVdG
kJKSDSjMsiV+GiwNo7ISJ4pVdUpnMI/iZSghFfV28cJslNhJDeaREHaE/Wtn1/xF
/bCVmEfS16UoJsQ=
=klFk
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Improve the error handling in the device cgroup such that memory
allocation failures when updating the access policy do not
potentially alter the policy.
- Some minor fixes to reiserfs to ensure that it properly releases
LSM-related xattr values.
- Update the security_socket_getpeersec_stream() LSM hook to take
sockptr_t values.
Previously the net/BPF folks updated the getsockopt code in the
network stack to leverage the sockptr_t type to make it easier to
pass both kernel and __user pointers, but unfortunately when they did
so they didn't convert the LSM hook.
While there was/is no immediate risk by not converting the LSM hook,
it seems like this is a mistake waiting to happen so this patch
proactively does the LSM hook conversion.
- Convert vfs_getxattr_alloc() to return an int instead of a ssize_t
and cleanup the callers. Internally the function was never going to
return anything larger than an int and the callers were doing some
very odd things casting the return value; this patch fixes all that
and helps bring a bit of sanity to vfs_getxattr_alloc() and its
callers.
- More verbose, and helpful, LSM debug output when the system is booted
with "lsm.debug" on the command line. There are examples in the
commit description, but the quick summary is that this patch provides
better information about which LSMs are enabled and the ordering in
which they are processed.
- General comment and kernel-doc fixes and cleanups.
* tag 'lsm-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: Fix description of fs_context_parse_param
lsm: Add/fix return values in lsm_hooks.h and fix formatting
lsm: Clarify documentation of vm_enough_memory hook
reiserfs: Add missing calls to reiserfs_security_free()
lsm,fs: fix vfs_getxattr_alloc() return type and caller error paths
device_cgroup: Roll back to original exceptions after copy failure
LSM: Better reporting of actual LSMs at boot
lsm: make security_socket_getpeersec_stream() sockptr_t safe
audit: Fix some kernel-doc warnings
lsm: remove obsoleted comments for security hooks
fs: edit a comment made in bad taste
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmOXmvkUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNP8BAA0jhzbzMXynz7es7dQTdE2J22umMe
CzGoNxyMAPEYRPlTZmqqwSUaDPhtt4Z0MDkAG1Fn46qn3W8b0L31Z5kXTpanl+1P
ZMP2WRCiuBS8V90XrMhQ9qvUjnIJwe/RRbwiyaSBxRUrN4MU6RA/q9suyYu/aKvo
sueRJJtJgcwb8fGpKbaoGU4NiSeCCzabT7E+ofPYt4joCAdbLzokszbWrqEYInh/
yb6V03Mad/wl7jz3BwSwY+cVdEuJV+mDcfIg1yB7O9pr/H8HpIcXvYIyEICrVdGw
nstkI76w22HcbHkWWbLWNAdPRUcMRA8Bf3GAXuhV+8gr2g8bt5ePEXsqkc1Oh75z
o59TaBwCGxsE6qffBcytdBueqaf+CFWXv0kTIRGS9SMMCe6r3y8UIYxzdebOEB3v
uJVWOUZTI3FqFdHl6v9I2d1R5FQurh2yX01JIe5vk2I5Oswy8hHVvDFxnJ5AEeUW
Mcl/zV2lGgdfLrxQ+qideiTx/d71Dw/BExlyaFP8b1/ccX0X6vnOtvt6z3vw4KsR
QDffPbFZhtApJuHBf05iYMXaUS41RU55sAaDtFh94eWRD5EZ9298qGpP6+weJvlz
ofBvKaZswQj6ZdymoZB+A+vbwUKItp2ApijyLbOMtaP1RNY1/47aO0kQkmPRuHe7
5+cKG8cjyrruZXw=
=4AGR
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"Two SELinux patches: one increases the sleep time on deprecated
functionality, and one removes the indirect calls in the sidtab
context conversion code"
* tag 'selinux-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: remove the sidtab context conversion indirect calls
selinux: increase the deprecation sleep for checkreqprot and runtime disable
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY5bwTgAKCRCRxhvAZXjc
ovd2AQCK00NAtGjQCjQPQGyTa4GAPqvWgq1ef0lnhv+TL5US5gD9FncQ8UofeMXt
pBfjtAD6ettTPCTxUQfnTwWEU4rc7Qg=
=27Wm
-----END PGP SIGNATURE-----
Merge tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull VFS acl updates from Christian Brauner:
"This contains the work that builds a dedicated vfs posix acl api.
The origins of this work trace back to v5.19 but it took quite a while
to understand the various filesystem specific implementations in
sufficient detail and also come up with an acceptable solution.
As we discussed and seen multiple times the current state of how posix
acls are handled isn't nice and comes with a lot of problems: The
current way of handling posix acls via the generic xattr api is error
prone, hard to maintain, and type unsafe for the vfs until we call
into the filesystem's dedicated get and set inode operations.
It is already the case that posix acls are special-cased to death all
the way through the vfs. There are an uncounted number of hacks that
operate on the uapi posix acl struct instead of the dedicated vfs
struct posix_acl. And the vfs must be involved in order to interpret
and fixup posix acls before storing them to the backing store, caching
them, reporting them to userspace, or for permission checking.
Currently a range of hacks and duct tape exist to make this work. As
with most things this is really no ones fault it's just something that
happened over time. But the code is hard to understand and difficult
to maintain and one is constantly at risk of introducing bugs and
regressions when having to touch it.
Instead of continuing to hack posix acls through the xattr handlers
this series builds a dedicated posix acl api solely around the get and
set inode operations.
Going forward, the vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl()
helpers must be used in order to interact with posix acls. They
operate directly on the vfs internal struct posix_acl instead of
abusing the uapi posix acl struct as we currently do. In the end this
removes all of the hackiness, makes the codepaths easier to maintain,
and gets us type safety.
This series passes the LTP and xfstests suites without any
regressions. For xfstests the following combinations were tested:
- xfs
- ext4
- btrfs
- overlayfs
- overlayfs on top of idmapped mounts
- orangefs
- (limited) cifs
There's more simplifications for posix acls that we can make in the
future if the basic api has made it.
A few implementation details:
- The series makes sure to retain exactly the same security and
integrity module permission checks. Especially for the integrity
modules this api is a win because right now they convert the uapi
posix acl struct passed to them via a void pointer into the vfs
struct posix_acl format to perform permission checking on the mode.
There's a new dedicated security hook for setting posix acls which
passes the vfs struct posix_acl not a void pointer. Basing checking
on the posix acl stored in the uapi format is really unreliable.
The vfs currently hacks around directly in the uapi struct storing
values that frankly the security and integrity modules can't
correctly interpret as evidenced by bugs we reported and fixed in
this area. It's not necessarily even their fault it's just that the
format we provide to them is sub optimal.
- Some filesystems like 9p and cifs need access to the dentry in
order to get and set posix acls which is why they either only
partially or not even at all implement get and set inode
operations. For example, cifs allows setxattr() and getxattr()
operations but doesn't allow permission checking based on posix
acls because it can't implement a get acl inode operation.
Thus, this patch series updates the set acl inode operation to take
a dentry instead of an inode argument. However, for the get acl
inode operation we can't do this as the old get acl method is
called in e.g., generic_permission() and inode_permission(). These
helpers in turn are called in various filesystem's permission inode
operation. So passing a dentry argument to the old get acl inode
operation would amount to passing a dentry to the permission inode
operation which we shouldn't and probably can't do.
So instead of extending the existing inode operation Christoph
suggested to add a new one. He also requested to ensure that the
get and set acl inode operation taking a dentry are consistently
named. So for this version the old get acl operation is renamed to
->get_inode_acl() and a new ->get_acl() inode operation taking a
dentry is added. With this we can give both 9p and cifs get and set
acl inode operations and in turn remove their complex custom posix
xattr handlers.
In the future I hope to get rid of the inode method duplication but
it isn't like we have never had this situation. Readdir is just one
example. And frankly, the overall gain in type safety and the more
pleasant api wise are simply too big of a benefit to not accept
this duplication for a while.
- We've done a full audit of every codepaths using variant of the
current generic xattr api to get and set posix acls and
surprisingly it isn't that many places. There's of course always a
chance that we might have missed some and if so I'm sure we'll find
them soon enough.
The crucial codepaths to be converted are obviously stacking
filesystems such as ecryptfs and overlayfs.
For a list of all callers currently using generic xattr api helpers
see [2] including comments whether they support posix acls or not.
- The old vfs generic posix acl infrastructure doesn't obey the
create and replace semantics promised on the setxattr(2) manpage.
This patch series doesn't address this. It really is something we
should revisit later though.
The patches are roughly organized as follows:
(1) Change existing set acl inode operation to take a dentry
argument (Intended to be a non-functional change)
(2) Rename existing get acl method (Intended to be a non-functional
change)
(3) Implement get and set acl inode operations for filesystems that
couldn't implement one before because of the missing dentry.
That's mostly 9p and cifs (Intended to be a non-functional
change)
(4) Build posix acl api, i.e., add vfs_get_acl(), vfs_remove_acl(),
and vfs_set_acl() including security and integrity hooks
(Intended to be a non-functional change)
(5) Implement get and set acl inode operations for stacking
filesystems (Intended to be a non-functional change)
(6) Switch posix acl handling in stacking filesystems to new posix
acl api now that all filesystems it can stack upon support it.
(7) Switch vfs to new posix acl api (semantical change)
(8) Remove all now unused helpers
(9) Additional regression fixes reported after we merged this into
linux-next
Thanks to Seth for a lot of good discussion around this and
encouragement and input from Christoph"
* tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (36 commits)
posix_acl: Fix the type of sentinel in get_acl
orangefs: fix mode handling
ovl: call posix_acl_release() after error checking
evm: remove dead code in evm_inode_set_acl()
cifs: check whether acl is valid early
acl: make vfs_posix_acl_to_xattr() static
acl: remove a slew of now unused helpers
9p: use stub posix acl handlers
cifs: use stub posix acl handlers
ovl: use stub posix acl handlers
ecryptfs: use stub posix acl handlers
evm: remove evm_xattr_acl_change()
xattr: use posix acl api
ovl: use posix acl api
ovl: implement set acl method
ovl: implement get acl method
ecryptfs: implement set acl method
ecryptfs: implement get acl method
ksmbd: use vfs_remove_acl()
acl: add vfs_remove_acl()
...
The sidtab conversion code has support for multiple context
conversion routines through the use of function pointers and
indirect calls. However, the reality is that all current users rely
on the same conversion routine: convert_context(). This patch does
away with this extra complexity and replaces the indirect calls
with direct function calls; allowing us to remove a layer of
obfuscation and create cleaner, more maintainable code.
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 4ff09db1b7 ("bpf: net: Change sk_getsockopt() to take the
sockptr_t argument") made it possible to call sk_getsockopt()
with both user and kernel address space buffers through the use of
the sockptr_t type. Unfortunately at the time of conversion the
security_socket_getpeersec_stream() LSM hook was written to only
accept userspace buffers, and in a desire to avoid having to change
the LSM hook the commit author simply passed the sockptr_t's
userspace buffer pointer. Since the only sk_getsockopt() callers
at the time of conversion which used kernel sockptr_t buffers did
not allow SO_PEERSEC, and hence the
security_socket_getpeersec_stream() hook, this was acceptable but
also very fragile as future changes presented the possibility of
silently passing kernel space pointers to the LSM hook.
There are several ways to protect against this, including careful
code review of future commits, but since relying on code review to
catch bugs is a recipe for disaster and the upstream eBPF maintainer
is "strongly against defensive programming", this patch updates the
LSM hook, and all of the implementations to support sockptr_t and
safely handle both user and kernel space buffers.
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
So far posix acls were passed as a void blob to the security and
integrity modules. Some of them like evm then proceed to interpret the
void pointer and convert it into the kernel internal struct posix acl
representation to perform their integrity checking magic. This is
obviously pretty problematic as that requires knowledge that only the
vfs is guaranteed to have and has lead to various bugs. Add a proper
security hook for setting posix acls and pass down the posix acls in
their appropriate vfs format instead of hacking it through a void
pointer stored in the uapi format.
I spent considerate time in the security module infrastructure and
audited all codepaths. SELinux has no restrictions based on the posix
acl values passed through it. The capability hook doesn't need to be
called either because it only has restrictions on security.* xattrs. So
these are all fairly simply hooks for SELinux.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
The following warning was triggered on a hardware environment:
SELinux: Converting 162 SID table entries...
BUG: sleeping function called from invalid context at
__might_sleep+0x60/0x74 0x0
in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 5943, name: tar
CPU: 7 PID: 5943 Comm: tar Tainted: P O 5.10.0 #1
Call trace:
dump_backtrace+0x0/0x1c8
show_stack+0x18/0x28
dump_stack+0xe8/0x15c
___might_sleep+0x168/0x17c
__might_sleep+0x60/0x74
__kmalloc_track_caller+0xa0/0x7dc
kstrdup+0x54/0xac
convert_context+0x48/0x2e4
sidtab_context_to_sid+0x1c4/0x36c
security_context_to_sid_core+0x168/0x238
security_context_to_sid_default+0x14/0x24
inode_doinit_use_xattr+0x164/0x1e4
inode_doinit_with_dentry+0x1c0/0x488
selinux_d_instantiate+0x20/0x34
security_d_instantiate+0x70/0xbc
d_splice_alias+0x4c/0x3c0
ext4_lookup+0x1d8/0x200 [ext4]
__lookup_slow+0x12c/0x1e4
walk_component+0x100/0x200
path_lookupat+0x88/0x118
filename_lookup+0x98/0x130
user_path_at_empty+0x48/0x60
vfs_statx+0x84/0x140
vfs_fstatat+0x20/0x30
__se_sys_newfstatat+0x30/0x74
__arm64_sys_newfstatat+0x1c/0x2c
el0_svc_common.constprop.0+0x100/0x184
do_el0_svc+0x1c/0x2c
el0_svc+0x20/0x34
el0_sync_handler+0x80/0x17c
el0_sync+0x13c/0x140
SELinux: Context system_u:object_r:pssp_rsyslog_log_t:s0:c0 is
not valid (left unmapped).
It was found that within a critical section of spin_lock_irqsave in
sidtab_context_to_sid(), convert_context() (hooked by
sidtab_convert_params.func) might cause the process to sleep via
allocating memory with GFP_KERNEL, which is problematic.
As Ondrej pointed out [1], convert_context()/sidtab_convert_params.func
has another caller sidtab_convert_tree(), which is okay with GFP_KERNEL.
Therefore, fix this problem by adding a gfp_t argument for
convert_context()/sidtab_convert_params.func and pass GFP_KERNEL/_ATOMIC
properly in individual callers.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20221018120111.1474581-1-gongruiqi1@huawei.com/ [1]
Reported-by: Tan Ninghao <tanninghao1@huawei.com>
Fixes: ee1a84fdfe ("selinux: overhaul sidtab to fix bug and improve performance")
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: wrap long BUG() output lines, tweak subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Further the checkreqprot and runtime disable deprecation efforts by
increasing the sleep time from 5 to 15 seconds to help make this more
noticeable for any users who are still using these knobs.
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmM68YIUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOTbA//TR8i+Wy8iswUCmtfmYg91h1uebpl
/kjNsSmfgivAUTGamr3eN2WRlGhZfkFDPIHa25uybSA6Q+75p4lst83Rt3HDbjkv
Ga7grCXnHwSDwJoHOSeFh0pojV2u7Zvfmiib2U5hPZEmd3kBw3NCgAJVcSGN80B2
dct36fzZNXjvpWDbygmFtRRkmEseslSkft8bUVvNZBP+B0zvv3vcNY1QFuKuK+W2
8wWpvO/cCSmke5i2c2ktHSk2f8/Y6n26Ik/OTHcTVfoKZLRaFbXEzLyxzLrNWd6m
hujXgcxszTtHdmoXx+J6uBauju7TR8pi1x8mO2LSGrlpRc1cX0A5ED8WcH71+HVE
8L1fIOmZShccPZn8xRok7oYycAUm/gIfpmSLzmZA76JsZYAe+mp9Ze9FA6fZtSwp
7Q/rfw/Rlz25WcFBe4xypP078HkOmqutkCk2zy5liR+cWGrgy/WKX15vyC0TaPrX
tbsRKuCLkipgfXrTk0dX3kmhz+3bJYjqeZEt7sfPSZYpaOGkNXVmAW0wnCOTuLMU
+8pIVktvQxMmACEj2gBMz11iooR4DpWLxOcQQR/impgCpNdZ60nA0a6KPJoIXC+5
NfTa422FZkc99QRVblUZyWSgJBW78Z3ZAQcQlo1AGLlFydbfrSFTRLbmNJZo/Nkl
KwpGvWs5nB0rVw0=
=VZl5
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
"Seven patches for the LSM layer and we've got a mix of trivial and
significant patches. Highlights below, starting with the smaller bits
first so they don't get lost in the discussion of the larger items:
- Remove some redundant NULL pointer checks in the common LSM audit
code.
- Ratelimit the lockdown LSM's access denial messages.
With this change there is a chance that the last visible lockdown
message on the console is outdated/old, but it does help preserve
the initial series of lockdown denials that started the denial
message flood and my gut feeling is that these might be the more
valuable messages.
- Open userfaultfds as readonly instead of read/write.
While this code obviously lives outside the LSM, it does have a
noticeable impact on the LSMs with Ondrej explaining the situation
in the commit description. It is worth noting that this patch
languished on the VFS list for over a year without any comments
(objections or otherwise) so I took the liberty of pulling it into
the LSM tree after giving fair notice. It has been in linux-next
since the end of August without any noticeable problems.
- Add a LSM hook for user namespace creation, with implementations
for both the BPF LSM and SELinux.
Even though the changes are fairly small, this is the bulk of the
diffstat as we are also including BPF LSM selftests for the new
hook.
It's also the most contentious of the changes in this pull request
with Eric Biederman NACK'ing the LSM hook multiple times during its
development and discussion upstream. While I've never taken NACK's
lightly, I'm sending these patches to you because it is my belief
that they are of good quality, satisfy a long-standing need of
users and distros, and are in keeping with the existing nature of
the LSM layer and the Linux Kernel as a whole.
The patches in implement a LSM hook for user namespace creation
that allows for a granular approach, configurable at runtime, which
enables both monitoring and control of user namespaces. The general
consensus has been that this is far preferable to the other
solutions that have been adopted downstream including outright
removal from the kernel, disabling via system wide sysctls, or
various other out-of-tree mechanisms that users have been forced to
adopt since we haven't been able to provide them an upstream
solution for their requests. Eric has been steadfast in his
objections to this LSM hook, explaining that any restrictions on
the user namespace could have significant impact on userspace.
While there is the possibility of impacting userspace, it is
important to note that this solution only impacts userspace when it
is requested based on the runtime configuration supplied by the
distro/admin/user. Frederick (the pathset author), the LSM/security
community, and myself have tried to work with Eric during
development of this patchset to find a mutually acceptable
solution, but Eric's approach and unwillingness to engage in a
meaningful way have made this impossible. I have CC'd Eric directly
on this pull request so he has a chance to provide his side of the
story; there have been no objections outside of Eric's"
* tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lockdown: ratelimit denial messages
userfaultfd: open userfaultfds with O_RDONLY
selinux: Implement userns_create hook
selftests/bpf: Add tests verifying bpf lsm userns_create hook
bpf-lsm: Make bpf_lsm_userns_create() sleepable
security, lsm: Introduce security_create_user_ns()
lsm: clean up redundant NULL pointer check
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmM68ZsUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOAtRAAw/lcyPoyN8ia6+PPihRtAKGUFIf5
+IdEPYfCqkGghqB7BRDl5bXOLFgpY/m/41g+xFvzJ0fhVPLa7UWB//N7yTu3OnW/
vXz1wn0EJAeDlLbPzWd6V/SpcxJ1WPzjHj2B3YXNWnukfMjCnPIA8XlZc18zAWS1
/OOEBoOo/a/8Giw2l1bEXxfmDI20NrXNL3vWKQ+Bbhg2PJaH/FTk4DNxopt84o28
vA+cbfQcOOjeRjBuncnTp9/b244ojeM+lRSJZozGTogFIeDUp3KW1D7NHqNwyX12
seDooqLEP25vP+kQh8zH7gvacpoeDLz40bSpd+MKKj02IxKGikykWuvtlFWY3xNB
o1mT4SJhh3JcewS7gh6P5aESSSgLg9zb3zMGtjHhtz+HHi/Sq7PK7xJgrnKOBNgu
CLIu3L+5vJpAgrsze2tIcwRUySIzDKnfgw8Oz7zaS2lOTJ58emz00QwEioHMQufK
8gZXTvZykJAtLF19PJw+mHKu38hbdD/4vt8AFuIgJzFkjWKzaZAxUBT+3p/uaLHG
2PegjKzpCqH9vZ/HCdYI42OB8TKiPU3eBtYZ2eP3h7cdDu++tp1rf0hwHQrwE2AD
PRuoCaBYOTUedbR8CV07fSSGFnZvlPnuk9yB7/eztV2thBQG28ALGxVhWadn4ap/
UIFgCs5QDRj11u8=
=BQ+i
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
"Six SELinux patches, all are simple and easily understood, but a list
of the highlights is below:
- Use 'grep -E' instead of 'egrep' in the SELinux policy install
script.
Fun fact, this seems to be GregKH's *second* dedicated SELinux
patch since we transitioned to git (ignoring merges, the SPDX
stuff, and a trivial fs reference removal when lustre was yanked);
the first was back in 2011 when selinuxfs was placed in
/sys/fs/selinux. Oh, the memories ...
- Convert the SELinux policy boolean values to use signed integer
types throughout the SELinux kernel code.
Prior to this we were using a mix of signed and unsigned integers
which was probably okay in this particular case, but it is
definitely not a good idea in general.
- Remove a reference to the SELinux runtime disable functionality in
/etc/selinux/config as we are in the process of deprecating that.
See [1] for more background on this if you missed the previous
notes on the deprecation.
- Minor cleanups: remove unneeded variables and function parameter
constification"
Link: https://github.com/SELinuxProject/selinux-kernel/wiki/DEPRECATE-runtime-disable [1]
* tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: remove runtime disable message in the install_policy.sh script
selinux: use "grep -E" instead of "egrep"
selinux: remove the unneeded result variable
selinux: declare read-only parameters const
selinux: use int arrays for boolean values
selinux: remove an unneeded variable in sel_make_class_dir_entries()
Return the value avc_has_perm() directly instead of storing it in
another redundant variable.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
cast of ->d_name.name to char * is completely wrong - nothing is
allowed to modify its contents.
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Declare ebitmap, mls_level and mls_context parameters const where they
are only read from. This allows callers to supply pointers to const
as arguments and increases readability.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Do not cast pointers of signed integers to pointers of unsigned integers
and vice versa.
It should currently not be an issue since they hold SELinux boolean
values which should only contain either 0's or 1's, which should have
the same representation.
Reported by sparse:
.../selinuxfs.c:1485:30: warning: incorrect type in assignment
(different signedness)
.../selinuxfs.c:1485:30: expected unsigned int *
.../selinuxfs.c:1485:30: got int *[addressable] values
.../selinuxfs.c:1402:48: warning: incorrect type in argument 3
(different signedness)
.../selinuxfs.c:1402:48: expected int *values
.../selinuxfs.c:1402:48: got unsigned int *bool_pending_values
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: minor whitespace fixes, sparse output cleanup]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Return the value sel_make_perm_files() directly instead of storing it
in another redundant variable.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add a SELinux access control for the iouring IORING_OP_URING_CMD
command. This includes the addition of a new permission in the
existing "io_uring" object class: "cmd". The subject of the new
permission check is the domain of the process requesting access, the
object is the open file which points to the device/file that is the
target of the IORING_OP_URING_CMD operation. A sample policy rule
is shown below:
allow <domain> <file>:io_uring { cmd };
Cc: stable@vger.kernel.org
Fixes: ee692a21e9 ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Unprivileged user namespace creation is an intended feature to enable
sandboxing, however this feature is often used to as an initial step to
perform a privilege escalation attack.
This patch implements a new user_namespace { create } access control
permission to restrict which domains allow or deny user namespace
creation. This is necessary for system administrators to quickly protect
their systems while waiting for vulnerability patches to be applied.
This permission can be used in the following way:
allow domA_t domA_t : user_namespace { create };
Signed-off-by: Frederick Lawler <fred@cloudflare.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmLoEeIUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNSOhAAwWwRcmcHnk+k2agT9QjKrLo26NCO
MQLE89o4y2ChEFHxC7F7SKoQRxtfYa323p1vmlGzKrlB+IZ6oqERVp4QNQQbXsfn
n9VvVpxjRNHAetcRhCM9ZOchWjUdw6AMaJ8e3fdRNRESadAUUFDxifw1wpjgG9+i
LmtDbfZ7vLs2grTf9OZy3JIl1VF3lVRUTI7ZBQggfJncMa+LXNWdVNmEe3yfyboA
1MwpSao7K2si0hBGAQo/UGQz4b19Tm4xMg8bSy7oTsP5Lae5ciPkeI3qazvs9usp
WScZYhQ8NugqLbDbjs7dm6QCpj4x3dUs6ei48LKe3GF2mcGesFfOPo9sNHao4kKv
C9t0f9qw+EhGvnNL7uQIDDf8OuTjuLWDvZSrMLID/IJKFF5NJ3y+XzaS9aPM3VEY
qyOsX+cEzheXGhD6xE1sCo+AyPUDYqNDMIKBj2wlIGCKlzDGa8RT6VsQuvgf3c3K
43CnRCQeWDWOHCq3MnRe/fmYtW+JB7tsXiKAq4OJADacwPP36bsP3bqU8AlWYwDt
tnuMa+LKusHnMEQpMPI8FW8qGdxwGSen+mymfLFIMgtwNGkV7WGRJ6Lbyn0SaR6v
HyXgZASIOQRnamK3yZCDpxo0K81IVxPWJIjHyg53znqT5TCpXccPyV4HwbJKI/KG
8PtHrXOdPOGCZ2g=
=WWq1
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"A relatively small set of patches for SELinux this time, eight patches
in total with really only one significant change.
The highlights are:
- Add support for proper labeling of memfd_secret anonymous inodes.
This will allow LSMs that implement the anonymous inode hooks to
apply security policy to memfd_secret() fds.
- Various small improvements to memory management: fixed leaks, freed
memory when needed, boundary checks.
- Hardened the selinux_audit_data struct with __randomize_layout.
- A minor documentation tweak to fix a formatting/style issue"
* tag 'selinux-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: selinux_add_opt() callers free memory
selinux: Add boundary check in put_entry()
selinux: fix memleak in security_read_state_kernel()
docs: selinux: add '=' signs to kernel boot options
mm: create security context for memfd_secret inodes
selinux: fix typos in comments
selinux: drop unnecessary NULL check
selinux: add __randomize_layout to selinux_audit_data
The selinux_add_opt() function may need to allocate memory for the
mount options if none has already been allocated, but there is no
need to free that memory on error as the callers handle that. Drop
the existing kfree() on error to help increase consistency in the
selinux_add_opt() error handling.
This patch also changes selinux_add_opt() to return -EINVAL when
the mount option value, @s, is NULL. It currently return -ENOMEM.
Link: https://lore.kernel.org/lkml/20220611090550.135674-1-xiujianfeng@huawei.com/T/
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
[PM: fix subject, rework commit description language]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Just like next_entry(), boundary check is necessary to prevent memory
out-of-bound access.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
In this function, it directly returns the result of __security_read_policy
without freeing the allocated memory in *data, cause memory leak issue,
so free the memory if __security_read_policy failed.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit e3489f8974 ("selinux: kill selinux_sb_get_mnt_opts()")
introduced a NULL check on the context after a successful call to
security_sid_to_context(). This is on the one hand redundant after
checking for success and on the other hand insufficient on an actual
NULL pointer, since the context is passed to seq_escape() leading to a
call of strlen() on it.
Reported by Clang analyzer:
In file included from security/selinux/hooks.c:28:
In file included from ./include/linux/tracehook.h:50:
In file included from ./include/linux/memcontrol.h:13:
In file included from ./include/linux/cgroup.h:18:
./include/linux/seq_file.h:136:25: warning: Null pointer passed as 1st argument to string length function [unix.cstring.NullArg]
seq_escape_mem(m, src, strlen(src), flags, esc);
^~~~~~~~~~~
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Randomize the layout of struct selinux_audit_data as suggested in [1],
since it contains a pointer to struct selinux_state, an already
randomized strucure.
[1]: https://github.com/KSPP/linux/issues/188
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmKLj4oUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNIoA//c2Fbgr3tTs6yCWAJk+mQcVwD1eq5
F2f3ild8qpSH15aYZkQPapJ0Ep1W4EDuf/AbRbfVB4t+tknrxtR8IAtiUYOPDlfW
eK85ENj5b+Hc6mPPHE8On0kc6oNySYeHXHGZ84c4DxRwjXolnHQTOIHb7pMKTGyU
cq6oqsgkpou88rnzJg/eiFkf/Yk2h0oS8jDQcu2OVaeNoBaVg5oAau01HES1IMzB
gqiEi0WXQII9lQX2qRLCPiPuHwA//PoMmx342JiIFcrOrprBCYiQ5yNWYR+VKuGP
WH85etJOeWh9kqsvRVSMs/y3L+RPFoydwLXsud0lIappbad53KJDq53oDco7PTY/
lhrhgSEipwc18QFZzIj7+h2R53k5YQYWFk5dC1nKfkVLd/sAqAcLPfbyOmeSQ097
/DbzUouiP8zq7WHpPw6dikVeT5wBqBjEcwoCZSjctXi4vDSWNWt6OBunx7bwOhbr
IfKESEDJhyG2xtmyYgEpDFXTn4d2SuxspPRmdYDOlvgLLH037+cXm/8TmzoMNiQ3
Xs6/vpzFmh+r+0Astzt+MisQrWDGNF9XQqVz4UrXkSXTqtkXO28/4ZCh0NE2squu
6zXf2KX79HxMos8OELvBV73U6yIEoK18qsygYgHwT+iB+YOMZvwZMpyl35JZWnAK
fxVu54GrcQNjCQs=
=1ZFj
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20220523' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"We've got twelve patches queued for v5.19, with most being fairly
minor. The highlights are below:
- The checkreqprot and runtime disable knobs have been deprecated for
some time with no active users that we can find. In an effort to
move things along we are adding a pause when the knobs are used to
help make the deprecation more noticeable in case anyone is still
using these hacks in the shadows.
- We've added the anonymous inode class name to the AVC audit records
when anonymous inodes are involved. This should make writing policy
easier when anonymous inodes are involved.
- More constification work. This is fairly straightforward and the
source of most of the diffstat.
- The usual minor cleanups: remove unnecessary assignments, assorted
style/checkpatch fixes, kdoc fixes, macro while-loop
encapsulations, #include tweaks, etc"
* tag 'selinux-pr-20220523' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
security: declare member holding string literal const
selinux: log anon inode class name
selinux: declare data arrays const
selinux: fix indentation level of mls_ops block
selinux: include necessary headers in headers
selinux: avoid extra semicolon
selinux: update parameter documentation
selinux: resolve checkpatch errors
selinux: don't sleep when CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE is true
selinux: checkreqprot is deprecated, add some ssleep() discomfort
selinux: runtime disable is deprecated, add some ssleep() discomfort
selinux: Remove redundant assignments
The code attempts to free the 'new' pointer using kmem_cache_free(),
which is wrong because this function isn't responsible of freeing it.
Instead, the function should free new->htable and clear the contents of
*new (to prevent double-free).
Cc: stable@vger.kernel.org
Fixes: c7c556f1e8 ("selinux: refactor changing booleans")
Reported-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Log the anonymous inode class name in the security hook
inode_init_security_anon. This name is the key for name based type
transitions on the anon_inode security class on creation. Example:
type=AVC msg=audit(02/16/22 22:02:50.585:216) : avc: granted \
{ create } for pid=2136 comm=mariadbd anonclass=[io_uring] \
scontext=system_u:system_r:mysqld_t:s0 \
tcontext=system_u:system_r:mysqld_iouring_t:s0 tclass=anon_inode
Add a new LSM audit data type holding the inode and the class name.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: adjusted 'anonclass' to be a trusted string, cgzones approved]
Signed-off-by: Paul Moore <paul@paul-moore.com>
The arrays for the policy capability names, the initial sid identifiers
and the class and permission names are not changed at runtime. Declare
them const to avoid accidental modification.
Do not override the classmap and the initial sid list in the build time
script genheaders.
Check flose(3) is successful in genheaders.c, otherwise the written data
might be corrupted or incomplete.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: manual merge due to fuzz, minor style tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Add one level of indentation to the code block of the label mls_ops in
constraint_expr_eval(), to adjust the trailing break; to the parent
case: branch.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Include header files required for struct or typedef declarations in
header files. This is for example helpful when working with an IDE, which
needs to resolve those symbols.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Wrap macro into `do { } while (0)` to avoid Clang emitting warnings
about extra semicolons.
Similar to userspace commit
9d85aa60d1
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: whitespace/indenting tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
security/selinux/include/audit.h:54: warning: Function parameter or member 'krule' not described in 'selinux_audit_rule_known'
security/selinux/include/audit.h:54: warning: Excess function parameter 'rule' description in 'selinux_audit_rule_known'
security/selinux/include/avc.h:130: warning: Function parameter or member 'state' not described in 'avc_audit'
This also bring the parameter name of selinux_audit_rule_known() in sync
between declaration and definition.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Reported by checkpatch:
security/selinux/nlmsgtab.c
---------------------------
ERROR: that open brace { should be on the previous line
#29: FILE: security/selinux/nlmsgtab.c:29:
+static const struct nlmsg_perm nlmsg_route_perms[] =
+{
ERROR: that open brace { should be on the previous line
#97: FILE: security/selinux/nlmsgtab.c:97:
+static const struct nlmsg_perm nlmsg_tcpdiag_perms[] =
+{
ERROR: that open brace { should be on the previous line
#105: FILE: security/selinux/nlmsgtab.c:105:
+static const struct nlmsg_perm nlmsg_xfrm_perms[] =
+{
ERROR: that open brace { should be on the previous line
#134: FILE: security/selinux/nlmsgtab.c:134:
+static const struct nlmsg_perm nlmsg_audit_perms[] =
+{
security/selinux/ss/policydb.c
------------------------------
ERROR: that open brace { should be on the previous line
#318: FILE: security/selinux/ss/policydb.c:318:
+static int (*destroy_f[SYM_NUM]) (void *key, void *datum, void *datap) =
+{
ERROR: that open brace { should be on the previous line
#674: FILE: security/selinux/ss/policydb.c:674:
+static int (*index_f[SYM_NUM]) (void *key, void *datum, void *datap) =
+{
ERROR: that open brace { should be on the previous line
#1643: FILE: security/selinux/ss/policydb.c:1643:
+static int (*read_f[SYM_NUM]) (struct policydb *p, struct symtab *s, void *fp) =
+{
ERROR: that open brace { should be on the previous line
#3246: FILE: security/selinux/ss/policydb.c:3246:
+ void *datap) =
+{
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Unfortunately commit 81200b0265 ("selinux: checkreqprot is
deprecated, add some ssleep() discomfort") added a five second sleep
during early kernel boot, e.g. start_kernel(), which could cause a
"scheduling while atomic" panic. This patch fixes this problem by
moving the sleep out of checkreqprot_set() and into
sel_write_checkreqprot() so that we only sleep when the checkreqprot
setting is set during runtime, after the kernel has booted. The
error message remains the same in both cases.
Fixes: 81200b0265 ("selinux: checkreqprot is deprecated, add some ssleep() discomfort")
Reported-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The checkreqprot functionality was disabled by default back in
Linux v4.4 (2015) with commit 2a35d196c1 ("selinux: change
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and it was
officially marked as deprecated in Linux v5.7. It was always a
bit of a hack to workaround very old userspace and to the best of
our knowledge, the checkreqprot functionality has been disabled by
Linux distributions for quite some time.
This patch moves the deprecation messages from KERN_WARNING to
KERN_ERR and adds a five second sleep to anyone using it to help
draw their attention to the deprecation and provide a URL which
helps explain things in more detail.
Signed-off-by: Paul Moore <paul@paul-moore.com>
We deprecated the SELinux runtime disable functionality in Linux
v5.6, and it is time to get a bit more serious about removing it.
Add a five second sleep to anyone using it to help draw their
attention to the deprecation and provide a URL which helps explain
things in more detail, including how to add kernel command line
parameters to some of the more popular Linux distributions.
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Get rid of redundant assignments which end up in values not being
read either because they are overwritten or the function ends.
Reported by clang-tidy [deadcode.DeadStores]
Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was
around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
making the semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where now anything left in tracehook.h is
some weird strange thing that is difficult to understand.
Eric W. Biederman (15):
ptrace: Move ptrace_report_syscall into ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Remove tracehook_signal_handler
task_work: Remove unnecessary include from posix_timers.h
task_work: Introduce task_work_pending
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
resume_user_mode: Move to resume_user_mode.h
tracehook: Remove tracehook.h
ptrace: Move setting/clearing ptrace_message into ptrace_stop
ptrace: Return the signal to continue with from ptrace_stop
Jann Horn (1):
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Yang Li (1):
ptrace: Remove duplicated include in ptrace.c
MAINTAINERS | 1 -
arch/Kconfig | 5 +-
arch/alpha/kernel/ptrace.c | 5 +-
arch/alpha/kernel/signal.c | 4 +-
arch/arc/kernel/ptrace.c | 5 +-
arch/arc/kernel/signal.c | 4 +-
arch/arm/kernel/ptrace.c | 12 +-
arch/arm/kernel/signal.c | 4 +-
arch/arm64/kernel/ptrace.c | 14 +--
arch/arm64/kernel/signal.c | 4 +-
arch/csky/kernel/ptrace.c | 5 +-
arch/csky/kernel/signal.c | 4 +-
arch/h8300/kernel/ptrace.c | 5 +-
arch/h8300/kernel/signal.c | 4 +-
arch/hexagon/kernel/process.c | 4 +-
arch/hexagon/kernel/signal.c | 1 -
arch/hexagon/kernel/traps.c | 6 +-
arch/ia64/kernel/process.c | 4 +-
arch/ia64/kernel/ptrace.c | 6 +-
arch/ia64/kernel/signal.c | 1 -
arch/m68k/kernel/ptrace.c | 5 +-
arch/m68k/kernel/signal.c | 4 +-
arch/microblaze/kernel/ptrace.c | 5 +-
arch/microblaze/kernel/signal.c | 4 +-
arch/mips/kernel/ptrace.c | 5 +-
arch/mips/kernel/signal.c | 4 +-
arch/nds32/include/asm/syscall.h | 2 +-
arch/nds32/kernel/ptrace.c | 5 +-
arch/nds32/kernel/signal.c | 4 +-
arch/nios2/kernel/ptrace.c | 5 +-
arch/nios2/kernel/signal.c | 4 +-
arch/openrisc/kernel/ptrace.c | 5 +-
arch/openrisc/kernel/signal.c | 4 +-
arch/parisc/kernel/ptrace.c | 7 +-
arch/parisc/kernel/signal.c | 4 +-
arch/powerpc/kernel/ptrace/ptrace.c | 8 +-
arch/powerpc/kernel/signal.c | 4 +-
arch/riscv/kernel/ptrace.c | 5 +-
arch/riscv/kernel/signal.c | 4 +-
arch/s390/include/asm/entry-common.h | 1 -
arch/s390/kernel/ptrace.c | 1 -
arch/s390/kernel/signal.c | 5 +-
arch/sh/kernel/ptrace_32.c | 5 +-
arch/sh/kernel/signal_32.c | 4 +-
arch/sparc/kernel/ptrace_32.c | 5 +-
arch/sparc/kernel/ptrace_64.c | 5 +-
arch/sparc/kernel/signal32.c | 1 -
arch/sparc/kernel/signal_32.c | 4 +-
arch/sparc/kernel/signal_64.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/um/kernel/ptrace.c | 5 +-
arch/x86/kernel/ptrace.c | 1 -
arch/x86/kernel/signal.c | 5 +-
arch/x86/mm/tlb.c | 1 +
arch/xtensa/kernel/ptrace.c | 5 +-
arch/xtensa/kernel/signal.c | 4 +-
block/blk-cgroup.c | 2 +-
fs/coredump.c | 1 -
fs/exec.c | 1 -
fs/io-wq.c | 6 +-
fs/io_uring.c | 11 +-
fs/proc/array.c | 1 -
fs/proc/base.c | 1 -
include/asm-generic/syscall.h | 2 +-
include/linux/entry-common.h | 47 +-------
include/linux/entry-kvm.h | 2 +-
include/linux/posix-timers.h | 1 -
include/linux/ptrace.h | 81 ++++++++++++-
include/linux/resume_user_mode.h | 64 ++++++++++
include/linux/sched/signal.h | 17 +++
include/linux/task_work.h | 5 +
include/linux/tracehook.h | 226 -----------------------------------
include/uapi/linux/ptrace.h | 2 +-
kernel/entry/common.c | 19 +--
kernel/entry/kvm.c | 9 +-
kernel/exit.c | 3 +-
kernel/livepatch/transition.c | 1 -
kernel/ptrace.c | 47 +++++---
kernel/seccomp.c | 1 -
kernel/signal.c | 62 +++++-----
kernel/task_work.c | 4 +-
kernel/time/posix-cpu-timers.c | 1 +
mm/memcontrol.c | 2 +-
security/apparmor/domain.c | 1 -
security/selinux/hooks.c | 1 -
85 files changed, 372 insertions(+), 495 deletions(-)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
/HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
=uEro
-----END PGP SIGNATURE-----
Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ptrace cleanups from Eric Biederman:
"This set of changes removes tracehook.h, moves modification of all of
the ptrace fields inside of siglock to remove races, adds a missing
permission check to ptrace.c
The removal of tracehook.h is quite significant as it has been a major
source of confusion in recent years. Much of that confusion was around
task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
semantics clearer).
For people who don't know tracehook.h is a vestiage of an attempt to
implement uprobes like functionality that was never fully merged, and
was later superseeded by uprobes when uprobes was merged. For many
years now we have been removing what tracehook functionaly a little
bit at a time. To the point where anything left in tracehook.h was
some weird strange thing that was difficult to understand"
* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ptrace: Remove duplicated include in ptrace.c
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
ptrace: Return the signal to continue with from ptrace_stop
ptrace: Move setting/clearing ptrace_message into ptrace_stop
tracehook: Remove tracehook.h
resume_user_mode: Move to resume_user_mode.h
resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
task_work: Call tracehook_notify_signal from get_signal on all architectures
task_work: Introduce task_work_pending
task_work: Remove unnecessary include from posix_timers.h
ptrace: Remove tracehook_signal_handler
ptrace: Remove arch_syscall_{enter,exit}_tracehook
ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
ptrace/arm: Rename tracehook_report_syscall report_syscall
ptrace: Move ptrace_report_syscall into ptrace.h
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout
the stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower
iTLB pressure and prevents identity mapping huge pages from
getting split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop
the user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if called
from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling
of TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
/DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
=ELpe
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"The sprinkling of SPI drivers is because we added a new one and Mark
sent us a SPI driver interface conversion pull request.
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout the
stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower iTLB
pressure and prevents identity mapping huge pages from getting
split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop the
user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if
called from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling of
TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup"
* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
llc: fix netdevice reference leaks in llc_ui_bind()
drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
ice: don't allow to run ice_send_event_to_aux() in atomic ctx
ice: fix 'scheduling while atomic' on aux critical err interrupt
net/sched: fix incorrect vlan_push_eth dest field
net: bridge: mst: Restrict info size queries to bridge ports
net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
drivers: net: xgene: Fix regression in CRC stripping
net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
net: dsa: fix missing host-filtered multicast addresses
net/mlx5e: Fix build warning, detected write beyond size of field
iwlwifi: mvm: Don't fail if PPAG isn't supported
selftests/bpf: Fix kprobe_multi test.
Revert "rethook: x86: Add rethook x86 implementation"
Revert "arm64: rethook: Add arm64 rethook implementation"
Revert "powerpc: Add rethook support"
Revert "ARM: rethook: Add rethook arm implementation"
netdevice: add missing dm_private kdoc
net: bridge: mst: prevent NULL deref in br_mst_info_size()
selftests: forwarding: Use same VRF for port and VLAN upper
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmI473AUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPaMBAAuxb+RBG0Wqlt0ktUYHF0ZxDVJOTK
YGGmaDp657YJ349+c0U3mrhm7Wj8Mn7Eoz3tAYUWRQ5xPziJQRX7PfxFzT/qpPUz
XYLRppwCWpLSB5NdpNzK3RdGNv+/9BzZ6gmjTj2wfsUCOA8cfpB1pYwyIWm6M9B+
FXMTZ7WOqiuJ3wJa5nD1PPM1z+99nPkYiE6/iKsDidbQgSl8NX6mJY/yUsVxcZ6A
c45n0Pf6Fj9w1XKdVDPfiRY4nekmPCwqbrn7QVtiuCYyC54JcZNmuCQnoN8dy5XY
s/j2M2DBxT6M9rjOqQznL5jGdNKFCWydCAso06JO/13pfakvPpSS6v95Iltqkbtw
1oHf3j5URIirAhyqcyPGoQz+g5c6krgx/Z2GOpvDs9r/AQ80GlpOBYhN3x61lVT5
MLYq0ylV1Vfosnv7a6+AQZ9lJAkmIqws1WtG28adn7/zMPyD/hWwQ7736k/50CMl
oC6zi3G6jCZueWdHZviqf96bjW20ZmNL2DQRy0n8ZSQQGgrsQnFgYMpXtB1Zv8+m
XaDOPo20Ne68rzmTsEp2gVgcnXFc5/KQBDvaUta9etrbTEWqQqqTWiP8mA2QiGme
JwKMgprV0uVDd6s9TC/O0as02xoKrWuGaL7czhlFxuL45k0nYDmk7ea/gz9MrcWV
Y5pzAxs4LVMwVzs=
=5E1v
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20220321' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"We've got a number of SELinux patches queued up, the highlights are:
- Fixup the security_fs_context_parse_param() LSM hook so it executes
all of the LSM hook implementations unless a serious error occurs.
We also correct the SELinux hook implementation so that it returns
zero on success.
- In addition to a few SELinux mount option parsing fixes, we
simplified the parsing by moving it earlier in the process.
The logic was that it was unlikely an admin/user would use the new
mount API and not have the policy loaded before passing the SELinux
options.
- Properly fixed the LSM/SELinux/SCTP hooks with the addition of the
security_sctp_assoc_established() hook.
This work was done in conjunction with the netdev folks and should
complete the move of the SCTP labeling from the endpoints to the
associations.
- Fixed a variety of sparse warnings caused by changes in the "__rcu"
markings of some core kernel structures.
- Ensure we access the superblock's LSM security blob using the
stacking-safe accessors.
- Added the ability for the kernel to always allow FIOCLEX and
FIONCLEX if the "ioctl_skip_cloexec" policy capability is
specified.
- Various constifications improvements, type casting improvements,
additional return value checks, and dead code/parameter removal.
- Documentation fixes"
* tag 'selinux-pr-20220321' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (23 commits)
selinux: shorten the policy capability enum names
docs: fix 'make htmldocs' warning in SCTP.rst
selinux: allow FIOCLEX and FIONCLEX with policy capability
selinux: use correct type for context length
selinux: drop return statement at end of void functions
security: implement sctp_assoc_established hook in selinux
security: add sctp_assoc_established hook
selinux: parse contexts for mount options early
selinux: various sparse fixes
selinux: try to use preparsed sid before calling parse_sid()
selinux: Fix selinux_sb_mnt_opts_compat()
LSM: general protection fault in legacy_parse_param
selinux: fix a type cast problem in cred_init_security()
selinux: drop unused macro
selinux: simplify cred_init_security
selinux: do not discard const qualifier in cast
selinux: drop unused parameter of avtab_insert_node
selinux: drop cast to same type
selinux: enclose macro arguments in parenthesis
selinux: declare name parameter of hash_eval const
...