Currently fp-stress only covers userspace use of floating point, it does
not cover any kernel mode uses. Since currently kernel mode floating
point usage can't be preempted and there are explicit preemption points in
the existing implementations this isn't so important for fp-stress but
when we readd preemption it will be good to try to exercise it.
When the arm64 accelerated crypto operations are implemented we can
relatively straightforwardly trigger kernel mode floating point usage by
using the crypto userspace API to hash data, using the splice() support
in an effort to minimise copying. We use /proc/crypto to check which
accelerated implementations are available, picking the first symmetric
hash we find. We run the kernel mode test unconditionally, replacing the
second copy of the FPSIMD testcase for systems with FPSIMD only. If we
don't think there are any suitable kernel mode implementations we fall back
to running another copy of fpsimd-stress.
There are a number issues with this approach, we don't actually verify
that we are using an accelerated (or even CPU) implementation of the
algorithm being tested and even with attempting to use splice() to
minimise copying there are sizing limits on how much data gets spliced
at once.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240521-arm64-fp-stress-kernel-v1-1-e38f107baad4@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
While we have test coverage for the ptrace interface in our selftests
the current programs have a number of gaps. The testing is done per
regset so does not cover interactions and at no point do any of the
tests actually run the traced processes meaning that there is no
validation that anything we read or write corresponds to register values
the process actually sees. Let's add a new program which attempts to cover
these gaps.
Each test we do performs a single ptrace write. For each test we generate
some random initial register data in memory and then fork() and trace a
child. The child will load the generated data into the registers then
trigger a breakpoint. The parent waits for the breakpoint then reads the
entire child register state via ptrace, verifying that the values expected
were actually loaded by the child. It then does the write being tested
and resumes the child. Once resumed the child saves the register state
it sees to memory and executes another breakpoint. The parent uses
process_vm_readv() to get these values from the child and verifies that
the values were as expected before cleaning up the child.
We generate configurations with combinations of vector lengths and SVCR
values and then try every ptrace write which will implement the
transition we generated. In order to control execution time (especially
in emulation) we only cover the minimum and maximum VL for each of SVE
and SME, this will ensure we generate both increasing and decreasing
changes in vector length. In order to provide a baseline test we also
check the case where we resume the child without doing a ptrace write.
In order to simplify the generation of the test count for kselftest we
will report but skip a substantial number of tests that can't actually
be expressed via a single ptrace write, several times more than we
actually run. This is noisy and will add some overhead but is very much
simpler so is probably worth the tradeoff.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240122-arm64-test-ptrace-regs-v1-1-0897f822d73e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Following the pattern for the other register sets add a stress test program
for ZT0 which continually loads and verifies patterns in the register in
an effort to discover context switching problems.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221208-arm64-sme2-v4-14-f2fa0aef982f@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently the stress test programs for floating point context switching are
run by hand, there are extremely simplistic harnesses which run some copies
of each test individually but they are not integrated into kselftest and
with SVE and SME they only run with whatever vector length the process has
by default. This is hassle when running the tests and means that they're
not being run at all by CI systems picking up kselftest.
In order to improve our coverage and provide a more convenient interface
provide a harness program which starts enough stress test programs up to
cause context switching and runs them for a set period. If only FPSIMD is
available in the system we start two copies of the FPSIMD stress test per
CPU, otherwise we start one copy of the FPSIMD and then start the SVE,
streaming SVE and ZA tests once per CPU for each available VL they have
to run on. We then run for a set period monitoring for any errors
reported by the test programs before cleanly terminating them.
In order to provide additional coverage of signal handling and some extra
noise in the scheduling we send a SIGUSR2 to the stress tests once a
second, the tests will count the number of signals they get.
Since kselftest is generally expected to run quickly we by default only run
for ten seconds. This is enough to show if there is anything cripplingly
wrong but not exactly a thorough soak test, for interactive and more
focused use a command line option -t N is provided which overrides the
length of time to run for (specified in seconds) and if 0 is specified then
there is no timeout and the test must be manually terminated. The timeout
is counted in seconds with no output, this is done to account for the
potentially slow startup time for the test programs on virtual platforms
which tend to struggle during startup as they are both slow and tend to
support a wide range of vector lengths.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220829154452.824870-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add a small testcase that attempts to do a clone() with ZA enabled and
verifies that it remains enabled with the same contents. We only check
one word in one horizontal vector of ZA since there's already other tests
that check for data corruption more broadly, we're just looking to make
sure that ZA is still enabled and it looks like the data got copied.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220419112247.711548-40-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add some basic coverage for the ZA ptrace interface, including walking
through all the vector lengths supported in the system. Unlike SVE
doing syscalls does not discard the ZA state so when we set data in ZA
we run the child process briefly, having it add one to each byte in ZA
in order to validate that both the vector size and data are being read
and written as expected when the process runs.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-38-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add a stress test for context switching of the ZA register state based on
the similar tests Dave Martin wrote for FPSIMD and SVE registers. The test
loops indefinitely writing a data pattern to ZA then reading it back and
verifying that it's what was expected.
Unlike the other tests we manually assemble the SME instructions since at
present no released toolchain has SME support integrated.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-35-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
One of the features of SME is the addition of streaming mode, in which we
have access to a set of streaming mode SVE registers at the SME vector
length. Since these are accessed using the SVE instructions let's reuse
the existing SVE stress test for testing with a compile time option for
controlling the few small differences needed:
- Enter streaming mode immediately on starting the program.
- In streaming mode FFR is removed so skip reading and writing FFR.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-33-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Provide RDVL helpers for SME and extend the main vector configuration tests
to cover SME.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220419112247.711548-32-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Since it's likely to be useful for performance work with SVE let's have a
pidbench that gives us some numbers for consideration. In order to ensure
that we test exactly the scenario we want this is written in assembly - if
system libraries use SVE this would stop us exercising the case where the
process has never used SVE.
We exercise three cases:
- Never having used SVE.
- Having used SVE once.
- Using SVE after each syscall.
by spinning running getpid() for a fixed number of iterations with the
time measured using CNTVCT_EL0 reported on the console. This is obviously
a totally unrealistic benchmark which will show the extremes of any
performance variation but equally given the potential gotchas with use of
FP instructions by system libraries it's good to have some concrete code
shared to make it easier to compare notes on results.
Testing over multiple SVE vector lengths will need to be done with vlset
currently, the test could be extended to iterate over all of them if
desired.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211202165107.1075259-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We provide interfaces for configuring the SVE vector length seen by
processes using prctl and also via /proc for configuring the default
values. Provide tests that exercise all these interfaces and verify that
they take effect as expected, though at present no test fully enumerates
all the possible vector lengths.
A subset of this is already tested via sve-probe-vls but the /proc
interfaces are not currently covered at all.
In preparation for the forthcoming support for SME, the Scalable Matrix
Extension, which has separately but similarly configured vector lengths
which we expect to offer similar userspace interfaces for, all the actual
files and prctls used are parameterised and we don't validate that the
architectural minimum vector length is the minimum we see.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20210803140450.46624-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
SVE provides an instruction RDVL which reports the currently configured
vector length. In order to validate that our vector length configuration
interfaces are working correctly without having to build the C code for
our test programs with SVE enabled provide a trivial assembly library
with a C callable function that executes RDVL. Since these interfaces
also control behaviour on exec*() provide a trivial wrapper program which
reports the currently configured vector length on stdout, tests can use
this to verify that behaviour on exec*() is as expected.
In preparation for providing similar helper functionality for SME, the
Scalable Matrix Extension, which allows separately configured vector
lengths to be read back both the assembler function and wrapper binary
have SVE included in their name.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/r/20210803140450.46624-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Integrate the FP tests with the build system and add some documentation
for the ones run outside the kselftest infrastructure. The content in
the README was largely written by Dave Martin with edits by me.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20200819114837.51466-7-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>