Commit Graph

Eric Biggers
878d87fc68 crypto: skcipher - call cond_resched() directly
In skcipher_walk_done(), instead of calling crypto_yield(), which
requires a translation between flags, just call cond_resched()
directly.  This has the same effect.
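
A sketch of the substitution (the surrounding code in
skcipher_walk_done() is simplified here):

  /* Before: translate the walk flag into a crypto_yield() flag. */
  crypto_yield(walk->flags & SKCIPHER_WALK_SLEEP ?
               CRYPTO_TFM_REQ_MAY_SLEEP : 0);

  /* After: yield directly when sleeping is allowed. */
  if (walk->flags & SKCIPHER_WALK_SLEEP)
          cond_resched();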

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:38:33 +08:00
Eric Biggers
8b13c2239d crypto: skcipher - optimize initializing skcipher_walk fields
The helper functions like crypto_skcipher_blocksize() take in a pointer
to a tfm object, but they actually return properties of the algorithm.
As the Linux kernel is compiled with -fno-strict-aliasing, the compiler
has to assume that the writes to struct skcipher_walk could clobber the
tfm's pointer to its algorithm.  Thus it gets repeatedly reloaded in the
generated code.  Therefore, replace the use of these helper functions
with straightforward accesses to the struct fields.

Note that while *users* of the skcipher and aead APIs are supposed to
use the helper functions, this particular code is part of the API
*implementation* in crypto/skcipher.c, which already accesses the
algorithm struct directly in many cases.  So there is no reason to
prefer the helper functions here.
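
An illustration of the kind of change involved (a sketch, not the
exact lines from the patch):

  /* Before: the helper rereads tfm->base.__crt_alg, which the compiler
   * must assume the preceding writes to *walk may have clobbered. */
  walk->blocksize = crypto_skcipher_blocksize(tfm);

  /* After: read the algorithm struct's field directly. */
  walk->blocksize = tfm->base.__crt_alg->cra_blocksize;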

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:38:33 +08:00
Eric Biggers
f2489456fe crypto: skcipher - clean up initialization of skcipher_walk::flags
- Initialize SKCIPHER_WALK_SLEEP in a consistent way, and check for
  atomic=true at the same time as CRYPTO_TFM_REQ_MAY_SLEEP.  Technically
  atomic=true only needs to apply after the first step, but it is very
  rarely used, so optimize for the common case and check 'atomic'
  alongside CRYPTO_TFM_REQ_MAY_SLEEP (see the sketch after this list).

- Initialize flags other than SKCIPHER_WALK_SLEEP to 0 rather than
  preserving them.  No caller initializes the flags beforehand, so
  their original values cannot carry any meaning, and all meaningful
  flags get overwritten anyway.  It may have been thought that clearing
  just one flag would be faster than clearing all of them, but that is
  not the case: the former is a read-modify-write operation whereas the
  latter is just a write.

- Move the explicit clearing of SKCIPHER_WALK_SLOW, SKCIPHER_WALK_COPY,
  and SKCIPHER_WALK_DIFF into skcipher_walk_done(), since it is now
  only needed on non-first steps.
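
A sketch of the initialization described in the first point (the exact
code in the patch may differ):

  walk->flags = (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) && !atomic ?
                SKCIPHER_WALK_SLEEP : 0;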

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:38:33 +08:00
Eric Biggers
d97d0668e8 crypto: skcipher - fold skcipher_walk_skcipher() into skcipher_walk_virt()
Fold skcipher_walk_skcipher() into skcipher_walk_virt() which is its
only remaining caller.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:38:33 +08:00
Eric Biggers
24300d282f crypto: skcipher - remove redundant check for SKCIPHER_WALK_SLOW
In skcipher_walk_done(), remove the check for SKCIPHER_WALK_SLOW because
it is always true.  All other flags (and lack thereof) were checked
earlier in the function, leaving SKCIPHER_WALK_SLOW as the only
remaining possibility.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:38:32 +08:00
Eric Biggers
a22a2316be crypto: skcipher - remove redundant clamping to page size
In the case where skcipher_walk_next() allocates a bounce page, that
page by definition has size PAGE_SIZE.  The number of bytes to copy 'n'
is guaranteed to fit in it, since earlier in the function it was clamped
to be at most a page.  Therefore remove the unnecessary logic that tried
to clamp 'n' again to fit in the bounce page.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:38:32 +08:00
Eric Biggers
807c8018f5 crypto: skcipher - remove unnecessary page alignment of bounce buffer
In the slow path of skcipher_walk where it uses a slab bounce buffer for
the data and/or IV, do not bother to avoid crossing a page boundary in
the part(s) of this buffer that are used, and do not bother to allocate
extra space in the buffer for that purpose.  The buffer is accessed only
by virtual address, so pages are irrelevant for it.

This logic may have been present due to the physical address support in
skcipher_walk, but that has now been removed.  Or it may have been
present to be consistent with the fast path that currently does not hand
back addresses that span pages, but that behavior is a side effect of
the pages being "mapped" one by one and is not actually a requirement.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:38:32 +08:00
Eric Biggers
e71778c95a crypto: skcipher - document skcipher_walk_done() and rename some vars
skcipher_walk_done() has an unusual calling convention, and some of its
local variables have unclear names.  Document it and rename variables to
make it a bit clearer what is going on.  No change in behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:38:32 +08:00
Eric Biggers
42c5675c2f crypto: omap - switch from scatter_walk to plain offset
The omap driver was using struct scatter_walk, but only to maintain an
offset, rather than iterating through the virtual addresses of the data
contained in the scatterlist, which is what scatter_walk is intended for.
Make it just use a plain offset instead.  This is simpler and avoids
using struct scatter_walk in a way that is not well supported.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:38:32 +08:00
Eric Biggers
ee3c9c7e27 crypto: powerpc/p10-aes-gcm - simplify handling of linear associated data
p10_aes_gcm_crypt() is abusing the scatter_walk API to get the virtual
address for the first source scatterlist element.  But this code is only
built for PPC64, which is a !HIGHMEM platform, and it can read past a
page boundary from the address returned by scatterwalk_map() which means
it already assumes the address is from the kernel's direct map.  Thus,
just use sg_virt() instead to get the same result in a simpler way.
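
For reference, sg_virt() (from <linux/scatterlist.h>) is essentially:

  static inline void *sg_virt(struct scatterlist *sg)
  {
          return page_address(sg_page(sg)) + sg->offset;
  }

i.e. it resolves a scatterlist entry to its kernel virtual address,
which is exactly the direct-map assumption this code already makes.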

Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Danny Tsen <dtsen@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:38:32 +08:00
Krzysztof Kozlowski
1742b0a0e4 crypto: bcm - Drop unused setting of local 'ptr' variable
spum_cipher_req_init() assigns 'spu_hdr' to the local 'ptr' variable
and later increments 'ptr' over specific fields, as if it were meant to
point at pieces of the message for some purpose.  However, the code
never reads 'ptr', so this entire iteration over 'spu_hdr' seems
pointless.

Reported by clang W=1 build:

  drivers/crypto/bcm/spu.c:839:6: error: variable 'ptr' set but not used [-Werror,-Wunused-but-set-variable]

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:31:13 +08:00
Yang Shen
061b27e372 crypto: hisilicon/qm - support new function communication
In the HiSilicon accelerator drivers, the PF/VFs driver can send
messages to the VFs/PF by writing hardware registers, and the VFs/PF
driver receives messages from the PF/VFs by reading hardware registers.
To support this feature, a new version id is added, and different
communication mechanisms are used based on the version id.

Signed-off-by: Yang Shen <shenyang39@huawei.com>
Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:31:13 +08:00
Thorsten Blum
a268231678 crypto: proc - Use str_yes_no() and str_no_yes() helpers
Remove hard-coded strings by using the str_yes_no() and str_no_yes()
helpers. Remove unnecessary curly braces.
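
The helpers come from <linux/string_choices.h>.  A sketch of the kind
of replacement (the actual lines in crypto/proc.c may differ):

  /* Before: hard-coded strings. */
  seq_printf(m, "internal     : %s\n",
             (alg->cra_flags & CRYPTO_ALG_INTERNAL) ? "yes" : "no");

  /* After: */
  seq_printf(m, "internal     : %s\n",
             str_yes_no(alg->cra_flags & CRYPTO_ALG_INTERNAL));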

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-14 11:31:13 +08:00
Eric Biggers
7fa4817340 crypto: ahash - make hash walk functions private to ahash.c
Due to the removal of the Niagara2 SPU driver, crypto_hash_walk_first(),
crypto_hash_walk_done(), crypto_hash_walk_last(), and struct
crypto_hash_walk are now only used in crypto/ahash.c.  Therefore, make
them all private to crypto/ahash.c.  I.e. un-export the two functions
that were exported, make the functions static, and move the struct
definition to the .c file.  As part of this, move the functions to
earlier in the file to avoid needing to add forward declarations.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-04 08:53:47 +08:00
Thomas Weißschuh
9ff6e943bc padata: fix sysfs store callback check
padata_sysfs_store() was copied from padata_sysfs_show(), but the
callback check was not adapted and still tests the wrong handler.
Today there is no attribute which can fail this check, but if one is
added it may as well be correct.
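
A sketch of the corrected callback (modeled on kernel/padata.c; details
may differ):

  static ssize_t padata_sysfs_store(struct kobject *kobj,
                                    struct attribute *attr,
                                    const char *buf, size_t count)
  {
          struct padata_instance *pinst = kobj2pinst(kobj);
          struct padata_sysfs_entry *pentry = attr2pentry(attr);
          ssize_t ret = -EIO;

          if (pentry->store)      /* was: pentry->show, copied from _show() */
                  ret = pentry->store(pinst, attr, buf, count);
          return ret;
  }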

Fixes: 5e017dc3f8 ("padata: Added sysfs primitives to padata subsystem")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-04 08:53:47 +08:00
Eric Biggers
730f67d8b8 crypto: keywrap - remove unused keywrap algorithm
The keywrap (kw) algorithm has no in-tree user.  It has never had an
in-tree user, and the patch that added it provided no justification for
its inclusion.  Even use of it via AF_ALG is impossible, as it uses a
weird calling convention where part of the ciphertext is returned via
the IV buffer, which is not returned to userspace in AF_ALG.

It's also unclear whether any new code in the kernel that does key
wrapping would actually use this algorithm.  It is controversial in the
cryptographic community due to having no clearly stated security goal,
no security proof, poor performance, and only a 64-bit auth tag.  Later
work (https://eprint.iacr.org/2006/221) suggested that the goal is
deterministic authenticated encryption.  But there are now more modern
algorithms for this, and this is not the same as key wrapping, for which
a regular AEAD such as AES-GCM usually can be (and is) used instead.

Therefore, remove this unused code.

There were several special cases for this algorithm in the self-tests,
due to its weird calling convention.  Remove those too.

Cc: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-04 08:53:47 +08:00
Eric Biggers
2890601f54 crypto: vmac - remove unused VMAC algorithm
Remove the vmac64 template, as it has no known users.  It also continues
to have longstanding bugs such as alignment violations (see
https://lore.kernel.org/r/20241226134847.6690-1-evepolonium@gmail.com/).

This code was added in 2009 by commit f1939f7c56 ("crypto: vmac - New
hash algorithm for intel_txt support").  Based on the mention of
intel_txt support in the commit title, it seems it was added as a
prerequisite for the contemporaneous patch
"intel_txt: add s3 userspace memory integrity verification"
(https://lore.kernel.org/r/4ABF2B50.6070106@intel.com/).  In the design
proposed by that patch, when an Intel Trusted Execution Technology (TXT)
enabled system resumed from suspend, the "tboot" trusted executable
launched the Linux kernel without verifying userspace memory, and then
the Linux kernel used VMAC to verify userspace memory.

However, that patch was never merged, as reviewers had objected to the
design.  It was later reworked into commit 4bd96a7a81 ("x86, tboot:
Add support for S3 memory integrity protection") which made tboot verify
the memory instead.  Thus the VMAC support in Linux was never used.

No in-tree user has appeared since then, other than potentially the
usual components that allow specifying arbitrary hash algorithms by
name, namely AF_ALG and dm-integrity.  However there are no indications
that VMAC is being used with these components.  Debian Code Search and
web searches for "vmac64" (the actual algorithm name) do not return any
results other than the kernel itself, suggesting that it does not appear
in any other code or documentation.  Explicitly grepping the source code
of the usual suspects (libell, iwd, cryptsetup) finds no matches either.

Before 2018, the vmac code was also completely broken due to using a
hardcoded nonce and the wrong endianness for the MAC.  It was then fixed
by commit ed331adab3 ("crypto: vmac - add nonced version with big
endian digest") and commit 0917b87312 ("crypto: vmac - remove insecure
version with hardcoded nonce").  These were intentionally breaking
changes that changed all the computed MAC values as well as the
algorithm name ("vmac" to "vmac64").  No complaints were ever received
about these breaking changes, strongly suggesting the absence of users.

The reason I had put some effort into fixing this code in 2018 is
because it was used by an out-of-tree driver.  But if it is still needed
in that particular out-of-tree driver, the code can be carried in that
driver instead.  There is no need to carry it upstream.

Cc: Atharva Tiwari <evepolonium@gmail.com>
Cc: Shane Wang <shane.wang@intel.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-04 08:52:03 +08:00
Md Sadre Alam
eb680160cf dt-bindings: crypto: qcom,prng: document ipq9574, ipq5424 and ipq5322
Document the ipq9574, ipq5424 and ipq5322 compatibles for the True
Random Number Generator.

Signed-off-by: Md Sadre Alam <quic_mdalam@quicinc.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-04 08:52:03 +08:00
Thorsten Blum
8f904adef6 crypto: fips - Use str_enabled_disabled() helper in fips_enable()
Remove hard-coded strings by using the str_enabled_disabled() helper
function.

Use pr_info() instead of printk(KERN_INFO) to silence a checkpatch
warning.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-04 08:52:03 +08:00
Kanchana P Sridhar
4ebd9a5ca4 crypto: iaa - Fix IAA disabling that occurs when sync_mode is set to 'async'
With the latest mm-unstable, setting the iaa_crypto sync_mode to 'async'
causes crypto testmgr.c test_acomp() failures with dmesg call traces,
and leaves zswap unable to use 'deflate-iaa' as a compressor:

echo async > /sys/bus/dsa/drivers/crypto/sync_mode

[  255.271030] zswap: compressor deflate-iaa not available
[  369.960673] INFO: task cryptomgr_test:4889 blocked for more than 122 seconds.
[  369.970127]       Not tainted 6.13.0-rc1-mm-unstable-12-16-2024+ #324
[  369.977411] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.986246] task:cryptomgr_test  state:D stack:0     pid:4889  tgid:4889  ppid:2      flags:0x00004000
[  369.986253] Call Trace:
[  369.986256]  <TASK>
[  369.986260]  __schedule+0x45c/0xfa0
[  369.986273]  schedule+0x2e/0xb0
[  369.986277]  schedule_timeout+0xe7/0x100
[  369.986284]  ? __prepare_to_swait+0x4e/0x70
[  369.986290]  wait_for_completion+0x8d/0x120
[  369.986293]  test_acomp+0x284/0x670
[  369.986305]  ? __pfx_cryptomgr_test+0x10/0x10
[  369.986312]  alg_test_comp+0x263/0x440
[  369.986315]  ? sched_balance_newidle+0x259/0x430
[  369.986320]  ? __pfx_cryptomgr_test+0x10/0x10
[  369.986323]  alg_test.part.27+0x103/0x410
[  369.986326]  ? __schedule+0x464/0xfa0
[  369.986330]  ? __pfx_cryptomgr_test+0x10/0x10
[  369.986333]  cryptomgr_test+0x20/0x40
[  369.986336]  kthread+0xda/0x110
[  369.986344]  ? __pfx_kthread+0x10/0x10
[  369.986346]  ret_from_fork+0x2d/0x40
[  369.986355]  ? __pfx_kthread+0x10/0x10
[  369.986358]  ret_from_fork_asm+0x1a/0x30
[  369.986365]  </TASK>

This happens because the only polling-without-interrupts mode that
iaa_crypto currently implements is 'sync'.  With 'async', iaa_crypto's
compress/decompress calls submit the descriptor and return -EINPROGRESS,
without any mechanism in the driver to poll for completions.  Hence
callers such as test_acomp() in crypto/testmgr.c or zswap, which wrap
the calls to crypto_acomp_compress() and crypto_acomp_decompress() in
synchronous wrappers, will block indefinitely.  Even before zswap can
notice this problem, crypto testmgr.c's test_acomp() will fail and
prevent registration of "deflate-iaa" as a valid crypto acomp
algorithm, thereby disallowing the use of "deflate-iaa" as a zswap
compressor (zswap will fall back to the default compressor in this
case).

To fix this issue, this patch modifies the iaa_crypto sync_mode set
function to treat 'async' as equivalent to 'sync'.  This enables the
driver's only supported mode (async polling without interrupts) and
lets zswap use 'deflate-iaa' as the compressor.
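
A sketch of the modified set function (names as in iaa_crypto; the
exact code may differ):

  static int set_iaa_sync_mode(const char *name)
  {
          if (sysfs_streq(name, "sync") || sysfs_streq(name, "async")) {
                  /* 'async' now behaves like 'sync': poll, no irqs */
                  async_mode = false;
                  use_irq = false;
          } else if (sysfs_streq(name, "async_irq")) {
                  async_mode = true;
                  use_irq = true;
          } else {
                  return -EINVAL;
          }
          return 0;
  }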

Hence, with this patch, this is what will happen:

echo async > /sys/bus/dsa/drivers/crypto/sync_mode
cat /sys/bus/dsa/drivers/crypto/sync_mode
sync

There are no crypto/testmgr.c test_acomp() errors, no call traces and zswap
can use 'deflate-iaa' without any errors. The iaa_crypto documentation has
also been updated to mention this caveat with 'async' and what to expect
with this fix.

True iaa_crypto async polling without interrupts is enabled in patch
"crypto: iaa - Implement batch_compress(), batch_decompress() API in
iaa_crypto." [1] which is under review as part of the "zswap IAA compress
batching" patch-series [2]. Until this is merged, we would appreciate it if
this current patch can be considered for a hotfix.

[1]: https://patchwork.kernel.org/project/linux-mm/patch/20241221063119.29140-5-kanchana.p.sridhar@intel.com/
[2]: https://patchwork.kernel.org/project/linux-mm/list/?series=920084

Fixes: 09646c98d ("crypto: iaa - Add irq support for the crypto async interface")
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-28 19:49:23 +08:00
Herbert Xu
de662429f3 crypto: lib/aesgcm - Reduce stack usage in libaesgcm_init
The stack frame in libaesgcm_init triggers a size warning on x86-64.
Reduce it by making buf static.
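
The change is of this shape (a sketch; the buffer's real name and size
come from the selftest data):

  /* was: u8 buf[MAX_TEST_SIZE]; allocated on libaesgcm_init()'s stack */
  static u8 buf[MAX_TEST_SIZE];   /* now in .bss; safe because the
                                     init function runs only once */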

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-28 19:49:22 +08:00
Nathan Chancellor
7b6092ee7a crypto: qce - revert "use __free() for a buffer that's always freed"
Commit ce8fd0500b ("crypto: qce - use __free() for a buffer that's
always freed") introduced a buggy use of __free(), which clang
rightfully points out:

  drivers/crypto/qce/sha.c:365:3: error: cannot jump from this goto statement to its label
    365 |                 goto err_free_ahash;
        |                 ^
  drivers/crypto/qce/sha.c:373:6: note: jump bypasses initialization of variable with __attribute__((cleanup))
    373 |         u8 *buf __free(kfree) = kzalloc(keylen + QCE_MAX_ALIGN_SIZE,
        |             ^

Jumping over a variable declared with the cleanup attribute does not
prevent the cleanup function from running; instead, the cleanup function
is called with an uninitialized value.
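
A minimal illustration of that hazard (hypothetical code using
<linux/cleanup.h> and <linux/slab.h>, not the driver's):

  static int demo(size_t len, bool fail_early)
  {
          if (fail_early)
                  goto out;       /* bypasses buf's initialization */

          u8 *buf __free(kfree) = kzalloc(len, GFP_KERNEL);
          if (!buf)
                  return -ENOMEM;
  out:
          return 0;               /* buf's kfree() cleanup still runs,
                                     on an uninitialized pointer */
  }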

Moving the declaration back to the top function with __free() and a NULL
initialization would resolve the bug but that is really not much
different from the original code. Since the function is so simple and
there is no functional reason to use __free() here, just revert the
original change to resolve the issue.

Fixes: ce8fd0500b ("crypto: qce - use __free() for a buffer that's always freed")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/CA+G9fYtpAwXa5mUQ5O7vDLK2xN4t-kJoxgUe1ZFRT=AGqmLSRA@mail.gmail.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Joe Hattori
472a989029 crypto: ixp4xx - fix OF node reference leaks in init_ixp_crypto()
init_ixp_crypto() calls of_parse_phandle_with_fixed_args() multiple
times, but does not release all the obtained refcounts. Fix it by adding
of_node_put() calls.
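
The fix follows the standard pattern: each successful call returns a
reference on args.np that the caller owns.  A sketch (the property name
is illustrative):

  struct of_phandle_args queue_spec;
  int ret;

  ret = of_parse_phandle_with_fixed_args(np, "queue-rx", 1, 0,
                                         &queue_spec);
  if (ret)
          return ret;
  recv_qid = queue_spec.args[0];
  of_node_put(queue_spec.np);     /* drop the reference taken above */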

This bug was found by an experimental static analysis tool that I am
developing.

Fixes: 76f24b4f46 ("crypto: ixp4xx - Add device tree support")
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Wenkai Lin
a5a9d95993 crypto: hisilicon/sec2 - fix for aead invalid authsize
When the digest alg is HMAC-SHAx or another, the authsize may be less
than 4 bytes and mac_len of the BD is set to zero, the hardware considers
it a BD configuration error and reports a ras error, so the sec driver
needs to switch to software calculation in this case, this patch add a
check for it and remove unnecessary check that has been done by crypto.

Fixes: 2f072d75d1 ("crypto: hisilicon - Add aead support on SEC2")
Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Wenkai Lin
fd337f852b crypto: hisilicon/sec2 - fix for aead icv error
When the AEAD algorithm is used for encryption or decryption, the input
authentication length varies, and the hardware needs the actual input
length to pass the integrity check.  Currently, the driver uses a fixed
authentication length, which causes decryption failures, so the length
configuration is modified.  In addition, the step of setting the auth
length in the setkey function is unnecessary, so it is removed.

Fixes: 2f072d75d1 ("crypto: hisilicon - Add aead support on SEC2")
Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Eric Biggers
3cd46a78ee crypto: x86/aes-xts - additional optimizations
Reduce latency by taking advantage of the property vaesenclast(key, a)
^ b == vaesenclast(key ^ b, a), like I did in the AES-GCM code.  (The
identity holds because the last AES round ends with a plain XOR of the
round key, so folding b into the key is equivalent to folding it into
the result.)

Also replace a vpand and vpxor with a vpternlogd.

On AMD Zen 5 this improves performance by about 3%.  Intel performance
remains about the same, with a 0.1% improvement being seen on Icelake.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Eric Biggers
68e95f5c64 crypto: x86/aes-xts - more code size optimizations
Prefer immediates of -128 to 128, since the former fits in a signed
byte, saving 3 bytes per instruction.  Also prefer VEX-coded
instructions to EVEX where this is easy to do.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Eric Biggers
77a4b5675b crypto: x86/aes-xts - change len parameter to int
The AES-XTS assembly code currently treats the length as signed, since
this saves a few instructions in the loop compared to treating it as
unsigned.  Therefore update the type to make this clear.  (It is not
actually passed any values larger than PAGE_SIZE.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Eric Biggers
bd7e7df6e6 crypto: x86/aes-xts - improve some comments
Improve some of the comments in aes-xts-avx-x86_64.S.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Eric Biggers
d1bb1c32f9 crypto: x86/aes-xts - make the register aliases per-function
Since aes-xts-avx-x86_64.S contains multiple functions, move the
register aliases for the parameters and local variables of the XTS
update function into the macro that generates that function.  Then add
register aliases to aes_xts_encrypt_iv() to improve readability there.
This makes aes-xts-avx-x86_64.S consistent with the GCM assembly files.

No change in the generated code.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Eric Biggers
5b7981c1ca crypto: x86/aes-xts - use .irp when useful
Use .irp instead of repeating code.

No change in the generated code.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Eric Biggers
95791ccd11 crypto: x86/aes-gcm - tune better for AMD CPUs
Reorganize the main loop to free up the RNDKEYLAST[0-3] registers and
use them for more cached round keys.  This improves performance by about
2% on AMD Zen 4 and Zen 5.  Intel performance remains about the same.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Eric Biggers
3cae5a3c05 crypto: x86/aes-gcm - code size optimization
Prefer immediates of -128 to 128, since the former fits in a signed
byte, saving 3 bytes per instruction.  Also replace a vpand and vpxor
with a vpternlogd.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Dr. David Alan Gilbert
b9b894642f crypto: lib/gf128mul - Remove some bbe deadcode
gf128mul_4k_bbe(), gf128mul_bbe() and gf128mul_init_4k_bbe() are part
of the library originally added in 2006 by commit c494e0705d
("[CRYPTO] lib: table driven multiplications in GF(2^128)") but have
never been used.

Remove them.

(BBE stands for "big endian bytes, big endian bits".  Note that the
64k-table BBE version is still used, so it has been left in.)

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 22:46:24 +08:00
Breno Leitao
e1d3422c95 rhashtable: Fix potential deadlock by moving schedule_work outside lock
Move the hash table growth check and work scheduling outside the
rht lock to prevent a possible circular locking dependency.

The original implementation could trigger a lockdep warning due to
a potential deadlock scenario involving nested locks between
rhashtable bucket, rq lock, and dsq lock.  By relocating the growth
check and work scheduling after releasing the rht lock, we break this
potential deadlock chain.

This change expands the flexibility of rhashtable by removing
restrictive locking that previously limited its use in scheduler
and workqueue contexts.

It is important to note that this calls rht_grow_above_75(), which
reads from struct rhashtable without holding the lock.  If this turns
out to be a problem, we can move the check back under the lock and
schedule the workqueue after the lock is released.
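
A conceptual sketch of the reordering (bucket locking simplified; the
real code lives in the rhashtable insert slow path):

  /* Before: scheduled while the bucket lock is held. */
  lock(bucket);
  insert_entry();
  atomic_inc(&ht->nelems);
  if (rht_grow_above_75(ht, tbl))
          schedule_work(&ht->run_work);   /* nested under the lock */
  unlock(bucket);

  /* After: drop the bucket lock first, then account and maybe grow. */
  lock(bucket);
  insert_entry();
  unlock(bucket);
  atomic_inc(&ht->nelems);
  if (rht_grow_above_75(ht, tbl))
          schedule_work(&ht->run_work);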

Fixes: f0e1a0643a ("sched_ext: Implement BPF extensible scheduler class")
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>

Modified so that atomic_inc is also moved outside of the bucket
lock along with the growth above 75% check.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-21 17:05:29 +08:00
Eric Biggers
f916e44487 crypto: keywrap - remove assignment of 0 to cra_alignmask
Since this code is zero-initializing the algorithm struct, the
assignment of 0 to cra_alignmask is redundant.  Remove it to reduce the
number of matches that are found when grepping for cra_alignmask.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:44 +08:00
Eric Biggers
5478ced478 crypto: aegis - remove assignments of 0 to cra_alignmask
Struct fields are zero by default, so these lines of code have no
effect.  Remove them to reduce the number of matches that are found when
grepping for cra_alignmask.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:44 +08:00
Eric Biggers
a6185842d1 crypto: x86 - remove assignments of 0 to cra_alignmask
Struct fields are zero by default, so these lines of code have no
effect.  Remove them to reduce the number of matches that are found when
grepping for cra_alignmask.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:44 +08:00
Eric Biggers
047ea6d85e crypto: seed - stop using cra_alignmask
Instead of specifying a nonzero alignmask, use the unaligned access
helpers.  This eliminates unnecessary alignment operations on most CPUs,
which can handle unaligned accesses efficiently, and brings us a step
closer to eventually removing support for the alignmask field.
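
The typical shape of this conversion, using the helpers from
<linux/unaligned.h> (a sketch; seed.c's actual code differs in detail):

  /* Before: the cast relies on cra_alignmask keeping 'in' aligned. */
  const __be32 *src = (const __be32 *)in;
  u32 x = be32_to_cpu(src[0]);

  /* After: no alignment requirement on 'in'. */
  u32 x = get_unaligned_be32(in);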

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:44 +08:00
Eric Biggers
7e0061586f crypto: khazad - stop using cra_alignmask
Instead of specifying a nonzero alignmask, use the unaligned access
helpers.  This eliminates unnecessary alignment operations on most CPUs,
which can handle unaligned accesses efficiently, and brings us a step
closer to eventually removing support for the alignmask field.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:44 +08:00
Eric Biggers
5e252f490c crypto: tea - stop using cra_alignmask
Instead of specifying a nonzero alignmask, use the unaligned access
helpers.  This eliminates unnecessary alignment operations on most CPUs,
which can handle unaligned accesses efficiently, and brings us a step
closer to eventually removing support for the alignmask field.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:43 +08:00
Eric Biggers
6c178fd66b crypto: aria - stop using cra_alignmask
Instead of specifying a nonzero alignmask, use the unaligned access
helpers.  This eliminates unnecessary alignment operations on most CPUs,
which can handle unaligned accesses efficiently, and brings us a step
closer to eventually removing support for the alignmask field.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:43 +08:00
Eric Biggers
8d90528228 crypto: anubis - stop using cra_alignmask
Instead of specifying a nonzero alignmask, use the unaligned access
helpers.  This eliminates unnecessary alignment operations on most CPUs,
which can handle unaligned accesses efficiently, and brings us a step
closer to eventually removing support for the alignmask field.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:43 +08:00
Eric Biggers
07d58e0a60 crypto: skcipher - remove support for physical address walks
Since the physical address support in skcipher_walk is not used anymore,
remove all the code associated with it.  This includes:

- The skcipher_walk_async() and skcipher_walk_complete() functions;

- The SKCIPHER_WALK_PHYS flag and everything conditional on it;

- The buffers, phys, and virt.page fields in struct skcipher_walk;

- struct skcipher_walk_buffer.

As a result, skcipher_walk now just supports virtual addresses.
Physical address support in skcipher_walk is unneeded because drivers
that need physical addresses just use the scatterlists directly.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:43 +08:00
Eric Biggers
9cda46babd crypto: n2 - remove Niagara2 SPU driver
Remove the driver for the Stream Processing Unit (SPU) on the Niagara 2.

Removing this driver allows removing the support for physical address
walks in skcipher_walk.  That is a misfeature that is used only by this
driver and increases the overhead of the crypto API for everyone else.

There is little evidence that anyone cares about this driver.  The
Niagara 2, a.k.a. the UltraSPARC T2, is a server CPU released in
2007.  The SPU is also present on the SPARC T3, released in 2010.
However, the SPU went away in SPARC T4, released in 2012, which replaced
it with proper cryptographic instructions instead.  These newer
instructions are supported by the kernel in arch/sparc/crypto/.

This driver was completely broken from (at least) 2015 to 2022, from
commit 8996eafdcb ("crypto: ahash - ensure statesize is non-zero") to
commit 76a4e87459 ("crypto: n2 - add missing hash statesize"), since
its probe function always returned an error before registering any
algorithms.  Though, even with that obvious issue fixed, it is unclear
whether the driver now works correctly.  E.g., there are no indications
that anyone has run the self-tests recently.

One bug report for this driver in 2017
(https://lore.kernel.org/r/nycvar.YFH.7.76.1712110214220.28416@n3.vanv.qr)
complained that it crashed the kernel while being loaded.  The reporter
didn't seem to care about the functionality of the driver, but rather
just the fact that loading it crashed the kernel.  In fact not until
2022 was the driver fixed to maybe actually register its algorithms with
the crypto API.  The 2022 fix does have a Reported-by and Tested-by, but
that may similarly have been just about making the error messages go
away as opposed to someone actually wanting to use the driver.

As such, it seems appropriate to retire this driver in mainline.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:43 +08:00
Eric Biggers
49b9258b05 crypto: qce - fix priority to be less than ARMv8 CE
As QCE is an order of magnitude slower than the ARMv8 Crypto Extensions
on the CPU, and is also less well tested, give it a lower priority.
Previously the QCE SHA algorithms had higher priority than the ARMv8 CE
equivalents, and ciphers such as AES-XTS had the same priority, which
meant the QCE versions were chosen if they happened to be loaded later.
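
Algorithm lookup resolves a given cra_name to the registered
implementation with the highest cra_priority, so the fix amounts to
lowering the QCE values below the ARMv8 CE ones, e.g. (values
illustrative):

  .base.cra_priority = 300,     /* arm64 Crypto Extensions driver */
  .base.cra_priority = 275,     /* qce after this patch: strictly lower */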

Fixes: ec8f5d8f6f ("crypto: qce - Qualcomm crypto engine driver")
Cc: stable@vger.kernel.org
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Thara Gopinath <thara.gopinath@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:43 +08:00
Mario Limonciello
f1e532d05a crypto: ccp - Use scoped guard for mutex
Use a scoped guard to simplify the cleanup handling.
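
A sketch of the pattern with <linux/cleanup.h>'s guard (simplified from
the ccp code; error paths elided):

  static int sev_do_cmd(int cmd, void *data, int *psp_ret)
  {
          guard(mutex)(&sev_cmd_mutex);   /* unlocked at scope exit */
          return __sev_do_cmd_locked(cmd, data, psp_ret);
  }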

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:43 +08:00
Bartosz Golaszewski
3382c44f0c crypto: qce - switch to using a mutex
Having switched to workqueue from tasklet, we are no longer limited to
atomic APIs and can now convert the spinlock to a mutex. This, along
with the conversion from tasklet to workqueue grants us ~15% improvement
in cryptsetup benchmarks for AES encryption.

While at it: use guards to simplify locking code.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:43 +08:00
Bartosz Golaszewski
eb7986e5e1 crypto: qce - convert tasklet to workqueue
There's nothing about the qce driver that requires running from a
tasklet. Switch to using the system workqueue.
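
The shape of the conversion (a sketch; the work item and handler names
are illustrative):

  /* Before: */
  tasklet_init(&qce->done_tasklet, qce_tasklet_req_done,
               (unsigned long)qce);
  tasklet_schedule(&qce->done_tasklet);

  /* After: */
  INIT_WORK(&qce->done_work, qce_req_done_work);
  schedule_work(&qce->done_work);       /* system workqueue, can sleep */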

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:43 +08:00
Bartosz Golaszewski
ce8fd0500b crypto: qce - use __free() for a buffer that's always freed
The buffer allocated in qce_ahash_hmac_setkey() is always freed before
returning, so use __free() to automate the freeing.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-12-14 17:21:43 +08:00