Mathieu Desnoyers 5b25b13ab0 sys_membarrier(): system-wide memory barrier (generic, x86)

Here is an implementation of a new system call, sys_membarrier(), which
executes a memory barrier on all threads running on the system.  It is
implemented by calling synchronize_sched().  It can be used to
distribute the cost of user-space memory barriers asymmetrically by
transforming pairs of memory barriers into pairs consisting of
sys_membarrier() and a compiler barrier.  For synchronization primitives
that distinguish between read-side and write-side (e.g.  userspace RCU
[1], rwlocks), the read-side can be accelerated significantly by moving
the bulk of the memory barrier overhead to the write-side.

The existing applications of which I am aware that would be improved by
this system call are as follows:

* Through Userspace RCU library (http://urcu.so)
  - DNS server (Knot DNS) https://www.knot-dns.cz/
  - Network sniffer (http://netsniff-ng.org/)
  - Distributed object storage (https://sheepdog.github.io/sheepdog/)
  - User-space tracing (http://lttng.org)
  - Network storage system (https://www.gluster.org/)
  - Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf)
  - Financial software (https://lkml.org/lkml/2015/3/23/189)

Those projects use RCU in userspace to increase read-side speed and
scalability compared to locking.  Especially in the case of RCU used by
libraries, sys_membarrier can speed up the read-side by moving the bulk of
the memory barrier cost to synchronize_rcu().

* Direct users of sys_membarrier
  - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)

Microsoft core dotnet GC developers are planning to use the mprotect()
side-effect of issuing memory barriers through IPIs as a way to implement
Windows FlushProcessWriteBuffers() on Linux.  In their GitHub thread they
refer to sys_membarrier, stating specifically that sys_membarrier() is
what they are looking for.

To explain the benefit of this scheme, let's introduce two example threads:

Thread A (infrequent, e.g. executing liburcu synchronize_rcu())
Thread B (frequent, e.g. executing liburcu
rcu_read_lock()/rcu_read_unlock())

In a scheme where every smp_mb() in Thread A orders memory accesses
with respect to a matching smp_mb() in Thread B, we can change each
smp_mb() within Thread A into a call to sys_membarrier() and each
smp_mb() within Thread B into a compiler barrier, "barrier()".

Before the change, we had, for each smp_mb() pair:

Thread A                    Thread B
previous mem accesses       previous mem accesses
smp_mb()                    smp_mb()
following mem accesses      following mem accesses

After the change, these pairs become:

Thread A                    Thread B
prev mem accesses           prev mem accesses
sys_membarrier()            barrier()
follow mem accesses         follow mem accesses

As we can see, there are two possible scenarios: either Thread B memory
accesses do not happen concurrently with Thread A accesses (1), or they
do (2).

1) Non-concurrent Thread A vs Thread B accesses:

Thread A                    Thread B
prev mem accesses
sys_membarrier()
follow mem accesses
                            prev mem accesses
                            barrier()
                            follow mem accesses

In this case, thread B accesses will be weakly ordered. This is OK,
because at that point, thread A is not particularly interested in
ordering them with respect to its own accesses.

2) Concurrent Thread A vs Thread B accesses

Thread A                    Thread B
prev mem accesses           prev mem accesses
sys_membarrier()            barrier()
follow mem accesses         follow mem accesses

In this case, thread B accesses, which are ensured to be in program
order thanks to the compiler barrier, will be "upgraded" to full
smp_mb() by synchronize_sched().
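The transformed pair can be sketched in C as a store-buffering (Dekker-style) test. This is an illustration under assumptions, not code from this patch: it assumes a kernel and C library that expose __NR_membarrier through <sys/syscall.h>; MB_CMD_SHARED copies the value of MEMBARRIER_CMD_SHARED from <linux/membarrier.h>; flag_a, flag_b and the thread_*_step() helpers are hypothetical. C11's atomic_signal_fence() stands in for barrier(). The cross-CPU guarantee comes from the kernel, not from the C11 memory model.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value of MEMBARRIER_CMD_SHARED in <linux/membarrier.h>, copied so the
 * sketch builds without kernel headers installed. */
#define MB_CMD_SHARED (1 << 0)

/* No libc wrapper is assumed; invoke the system call directly. */
static int sys_membarrier(int cmd, int flags)
{
#ifdef __NR_membarrier
	return (int)syscall(__NR_membarrier, cmd, flags);
#else
	(void)cmd;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/* Hypothetical shared flags for the store-buffering pattern. */
static atomic_int flag_a, flag_b;

/* Thread B (frequent): its smp_mb() is demoted to a compiler barrier. */
static int thread_b_step(void)
{
	atomic_store_explicit(&flag_b, 1, memory_order_relaxed);
	atomic_signal_fence(memory_order_seq_cst); /* barrier() */
	return atomic_load_explicit(&flag_a, memory_order_relaxed);
}

/* Thread A (infrequent): its smp_mb() becomes sys_membarrier().
 * Returns the flag_b value observed, or -1 if the system call failed;
 * in that case Thread B's compiler barrier alone orders nothing across
 * CPUs and both sides would have to fall back to real fences. */
static int thread_a_step(void)
{
	atomic_store_explicit(&flag_a, 1, memory_order_relaxed);
	if (sys_membarrier(MB_CMD_SHARED, 0) != 0)
		return -1;
	return atomic_load_explicit(&flag_b, memory_order_relaxed);
}
```

Run concurrently on two threads, the two steps should never both return 0; that outcome is exactly what the original smp_mb() pair forbade.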

* Benchmarks

On Intel Xeon E5405 (8 cores)
(one thread is calling sys_membarrier, the other 7 threads are busy
looping)

1000 non-expedited sys_membarrier calls in 33s = 33 milliseconds/call.
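The per-call figure is total time divided by call count: 33 s / 1000 calls = 33 ms per call. Below is a minimal timing sketch of the same measurement. It assumes __NR_membarrier is available, MB_CMD_SHARED mirrors MEMBARRIER_CMD_SHARED, and ms_per_call() and ms_per_membarrier() are hypothetical helpers. It does not reproduce the seven busy-looping threads of the original setup, so its results are not comparable to the number above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define MB_CMD_SHARED (1 << 0) /* MEMBARRIER_CMD_SHARED */

static int sys_membarrier(int cmd, int flags)
{
#ifdef __NR_membarrier
	return (int)syscall(__NR_membarrier, cmd, flags);
#else
	(void)cmd;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/* The arithmetic behind the benchmark line: total seconds -> ms per call. */
static double ms_per_call(double total_s, int calls)
{
	assert(calls > 0);
	return total_s * 1000.0 / calls;
}

/* Time n back-to-back calls.  Returns the average in ms per call, or -1.0
 * if the system call is unavailable or rejected. */
static double ms_per_membarrier(int n)
{
	struct timespec t0, t1;

	assert(n > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < n; i++)
		if (sys_membarrier(MB_CMD_SHARED, 0) != 0)
			return -1.0;
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return ms_per_call((double)(t1.tv_sec - t0.tv_sec) +
			   (double)(t1.tv_nsec - t0.tv_nsec) / 1e9, n);
}
```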

* User-space user of this system call: Userspace RCU library

Both the signal-based and the sys_membarrier userspace RCU schemes
permit us to remove the memory barrier from the userspace RCU
rcu_read_lock() and rcu_read_unlock() primitives, thus significantly
accelerating them. These memory barriers are replaced by compiler
barriers on the read-side, and all matching memory barriers on the
write-side are turned into an invocation of a memory barrier on all
active threads in the process. By letting the kernel perform this
synchronization rather than dumbly sending a signal to every thread of
the process (as we currently do), we reduce the number of unnecessary
wake-ups and only issue the memory barriers on active threads.
Non-running threads do not need to execute such a barrier anyway,
because the scheduler's context switches already imply one.
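The read-side/write-side split can be illustrated with a deliberately simplified grace-period sketch. This is not liburcu's algorithm, which is more involved. It assumes a fixed, hypothetical registry of MAX_READERS reader slots, where each slot holds a counter that is odd while its reader is inside a critical section. Readers pay only for compiler barriers. The writer brackets its scan of the counters with two sys_membarrier() calls. MB_CMD_SHARED mirrors MEMBARRIER_CMD_SHARED; all other names are illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MB_CMD_SHARED (1 << 0) /* MEMBARRIER_CMD_SHARED */
#define MAX_READERS 8          /* hypothetical registry size */

static int sys_membarrier(int cmd, int flags)
{
#ifdef __NR_membarrier
	return (int)syscall(__NR_membarrier, cmd, flags);
#else
	(void)cmd;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/* One counter per registered reader; odd = inside a read-side section. */
static atomic_ulong reader_ctr[MAX_READERS];

static void read_lock(int id)
{
	/* Only reader `id` writes its own counter, so load + store suffices. */
	unsigned long c = atomic_load_explicit(&reader_ctr[id], memory_order_relaxed);
	atomic_store_explicit(&reader_ctr[id], c + 1, memory_order_relaxed);
	atomic_signal_fence(memory_order_seq_cst); /* was smp_mb() */
}

static void read_unlock(int id)
{
	unsigned long c = atomic_load_explicit(&reader_ctr[id], memory_order_relaxed);
	atomic_signal_fence(memory_order_seq_cst); /* was smp_mb() */
	atomic_store_explicit(&reader_ctr[id], c + 1, memory_order_relaxed);
}

/* Wait until every reader that was inside a section when we started has
 * left it.  Must not be called from inside a read-side section.
 * Returns 0, or -1 if sys_membarrier() is unavailable. */
static int synchronize(void)
{
	unsigned long snap[MAX_READERS];

	if (sys_membarrier(MB_CMD_SHARED, 0) != 0) /* writer's first smp_mb() */
		return -1;
	for (int i = 0; i < MAX_READERS; i++)
		snap[i] = atomic_load_explicit(&reader_ctr[i], memory_order_relaxed);
	for (int i = 0; i < MAX_READERS; i++)
		while ((snap[i] & 1) &&
		       atomic_load_explicit(&reader_ctr[i], memory_order_relaxed) == snap[i])
			sched_yield(); /* reader i is still in the same section */
	return sys_membarrier(MB_CMD_SHARED, 0) == 0 ? 0 : -1; /* second smp_mb() */
}
```

A writer would publish the new version of a pointer, call synchronize(), and only then free the old version. Waiting only for a counter to change, rather than to become even, keeps the writer from starving behind readers that re-enter quickly.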

Results in liburcu:

Operations in 10s, 6 readers, 2 writers:

memory barriers in reader:    1701557485 reads, 2202847 writes
signal-based scheme:          9830061167 reads,    6700 writes
sys_membarrier:               9952759104 reads,     425 writes
sys_membarrier (dyn. check):  7970328887 reads,     425 writes

The dynamic sys_membarrier availability check adds some overhead to
the read-side compared to the signal-based scheme, but besides that,
sys_membarrier slightly outperforms the signal-based scheme. However,
this non-expedited sys_membarrier implementation has a much slower grace
period than the signal-based and memory-barrier schemes.
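The "dyn. check" row corresponds to deciding at run time whether sys_membarrier() can be used. Below is a sketch of that decision. It assumes the QUERY bitmask semantics described in the man page below; MB_CMD_* mirror the <linux/membarrier.h> values, and the other names are hypothetical. The query runs once before any reader starts. The per-call branch on the cached flag is presumably the kind of read-side overhead the measurement reflects. When the system call is missing, both sides fall back to full fences.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MB_CMD_QUERY  0        /* MEMBARRIER_CMD_QUERY */
#define MB_CMD_SHARED (1 << 0) /* MEMBARRIER_CMD_SHARED */

static int sys_membarrier(int cmd, int flags)
{
#ifdef __NR_membarrier
	return (int)syscall(__NR_membarrier, cmd, flags);
#else
	(void)cmd;
	(void)flags;
	errno = ENOSYS;
	return -1;
#endif
}

/* Set once by barriers_init(), before any reader thread starts. */
static bool have_membarrier;

/* A QUERY result of -1 means "no system call"; otherwise test the bit. */
static bool shared_supported(int query_result)
{
	return query_result >= 0 && (query_result & MB_CMD_SHARED);
}

static void barriers_init(void)
{
	have_membarrier = shared_supported(sys_membarrier(MB_CMD_QUERY, 0));
}

/* Read side: the branch is the cost of the dynamic check. */
static inline void read_side_barrier(void)
{
	if (have_membarrier)
		atomic_signal_fence(memory_order_seq_cst); /* barrier() */
	else
		atomic_thread_fence(memory_order_seq_cst); /* smp_mb() */
}

static void write_side_barrier(void)
{
	if (!have_membarrier) {
		atomic_thread_fence(memory_order_seq_cst); /* smp_mb() */
		return;
	}
	/* The man page promises a stable result per command until reboot,
	 * so a failure after a successful QUERY is treated as fatal. */
	if (sys_membarrier(MB_CMD_SHARED, 0) != 0)
		abort();
}
```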

Besides diminishing the number of wake-ups, one major advantage of the
membarrier system call over the signal-based scheme is that it does not
need to reserve a signal. This plays much more nicely with libraries,
and with processes into which tracing code is injected, since we
cannot expect the application to leave any given signal unused.

An expedited version of this system call can be added later on to speed
up the grace period. Its implementation will likely depend on reading
the cpu_curr()->mm without holding each CPU's rq lock.

This patch adds the system call to x86 and to asm-generic.

[1] http://urcu.so

membarrier(2) man page:

MEMBARRIER(2)              Linux Programmer's Manual             MEMBARRIER(2)

NAME
       membarrier - issue memory barriers on a set of threads

SYNOPSIS
       #include <linux/membarrier.h>

       int membarrier(int cmd, int flags);

DESCRIPTION
       The cmd argument is one of the following:

       MEMBARRIER_CMD_QUERY
              Query the set of supported commands. It returns a bitmask of
              supported commands.

       MEMBARRIER_CMD_SHARED
              Execute a memory barrier on all threads running on the system.
              Upon return from the system call, the caller thread is ensured
              that all running threads have passed through a state where all
              memory accesses to user-space addresses match program order
              between entry to and return from the system call (non-running
              threads are de facto in such a state). This covers threads from
              all processes running on the system. This command returns 0.

       The flags argument must be 0; it is reserved for future extensions.

       All memory accesses performed in program order from each targeted
       thread are guaranteed to be ordered with respect to sys_membarrier().
       If we use the semantic "barrier()" to represent a compiler barrier
       forcing memory accesses to be performed in program order across the
       barrier, and smp_mb() to represent explicit memory barriers forcing
       full memory ordering across the barrier, we have the following
       ordering table for each pair of barrier(), sys_membarrier() and
       smp_mb():

       The pair ordering is detailed as (O: ordered, X: not ordered):

                              barrier()   smp_mb() sys_membarrier()
              barrier()          X           X            O
              smp_mb()           X           O            O
              sys_membarrier()   O           O            O

RETURN VALUE
       On success, MEMBARRIER_CMD_QUERY returns a bitmask of supported
       commands and MEMBARRIER_CMD_SHARED returns zero.  On error, -1 is
       returned, and errno is set appropriately. For a given command, with
       the flags argument set to 0, this system call is guaranteed to always
       return the same value until reboot.

ERRORS
       ENOSYS System call is not implemented.

       EINVAL Invalid arguments.

Linux                             2015-04-15                     MEMBARRIER(2)
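The pair-ordering table in the man page above reduces to one rule: a pair is ordered if either side is sys_membarrier(), or if both sides are smp_mb(). The small C encoding below makes the nine cells easy to check. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum barrier_kind { BK_BARRIER, BK_SMP_MB, BK_SYS_MEMBARRIER };

/* true = "O" (ordered), false = "X" (not ordered) in the man page table. */
static bool pair_ordered(enum barrier_kind a, enum barrier_kind b)
{
	if (a == BK_SYS_MEMBARRIER || b == BK_SYS_MEMBARRIER)
		return true;                       /* last row and column: all O */
	return a == BK_SMP_MB && b == BK_SMP_MB;   /* barrier() alone orders nothing */
}
```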

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nicholas Miell <nmiell@comcast.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2015-09-11 15:21:34 -07:00


config ARCH
	string
	option env="ARCH"

config KERNELVERSION
	string
	option env="KERNELVERSION"

config DEFCONFIG_LIST
	string
	depends on !UML
	option defconfig_list
	default "/lib/modules/$UNAME_RELEASE/.config"
	default "/etc/kernel-config"
	default "/boot/config-$UNAME_RELEASE"
	default "$ARCH_DEFCONFIG"
	default "arch/$ARCH/defconfig"
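With option defconfig_list, Kconfig takes the first file in this list that exists as the starting configuration. Below is a minimal C sketch of that lookup. It assumes only $UNAME_RELEASE needs expanding; the real tool also handles $ARCH_DEFCONFIG and $ARCH. expand_release() and first_existing() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Copy tmpl into out, replacing the first "$UNAME_RELEASE" with release. */
static void expand_release(const char *tmpl, const char *release,
			   char *out, size_t outsz)
{
	static const char var[] = "$UNAME_RELEASE";
	const char *p = strstr(tmpl, var);

	assert(outsz > 0);
	if (!p) {
		snprintf(out, outsz, "%s", tmpl);
		return;
	}
	snprintf(out, outsz, "%.*s%s%s", (int)(p - tmpl), tmpl, release,
		 p + strlen(var));
}

/* Index of the first candidate that exists after expansion (its path is
 * left in out), or -1 if none does. */
static int first_existing(const char *const *cands, int n,
			  const char *release, char *out, size_t outsz)
{
	for (int i = 0; i < n; i++) {
		expand_release(cands[i], release, out, outsz);
		if (access(out, F_OK) == 0)
			return i;
	}
	return -1;
}
```

On a running system the release string would come from uname(2), via the release field of struct utsname.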

config CONSTRUCTORS
	bool
	depends on !UML

config IRQ_WORK
	bool

config BUILDTIME_EXTABLE_SORT
	bool

menu "General setup"

config BROKEN
	bool

config BROKEN_ON_SMP
	bool
	depends on BROKEN || !SMP
	default y

config INIT_ENV_ARG_LIMIT
	int
	default 32 if !UML
	default 128 if UML
	help
	  Maximum of each of the number of arguments and environment
	  variables passed to init from the kernel command line.

config CROSS_COMPILE
	string "Cross-compiler tool prefix"
	help
	  Same as running 'make CROSS_COMPILE=prefix-' but stored for
	  default make runs in this kernel build directory.  You don't
	  need to set this unless you want the configured kernel build
	  directory to select the cross-compiler automatically.

config COMPILE_TEST
	bool "Compile also drivers which will not load"
	default n
	help
	  Some drivers can be compiled on a different platform than they are
	  intended to be run on. Although they cannot be loaded there (or,
	  even when they load, they cannot be used due to missing HW
	  support), developers, unlike distributors, might still want to
	  build such drivers to compile-test them.

	  If you are a developer and want to build everything available, say Y
	  here. If you are a user/distributor, say N here to exclude useless
	  drivers to be distributed.

config LOCALVERSION
	string "Local version - append to kernel release"
	help
	  Append an extra string to the end of your kernel version.
	  This will show up when you type uname, for example.
	  The string you set here will be appended after the contents of
	  any files with a filename matching localversion* in your
	  object and source tree, in that order.  Your total string can
	  be a maximum of 64 characters.

config LOCALVERSION_AUTO
	bool "Automatically append version information to the version string"
	default y
	help
	  This will try to automatically determine if the current tree is a
	  release tree by looking for git tags that belong to the current
	  top of tree revision.

	  A string of the format -gxxxxxxxx will be added to the localversion
	  if a git-based tree is found.  The string generated by this will be
	  appended after any matching localversion* files, and after the value
	  set in CONFIG_LOCALVERSION.

	  (The actual string used here is the first eight characters produced
	  by running the command:

	    $ git rev-parse --verify HEAD

	  which is done within the script "scripts/setlocalversion".)
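The ordering described in these two help texts can be sketched as a string-composition helper. This is a hypothetical illustration, not scripts/setlocalversion, which is a shell script that also checks for release tags. It assumes the 64-character limit applies to the whole appended string. compose_localversion() and LOCALVERSION_MAX are made-up names, and the hash in the example is a placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOCALVERSION_MAX 64 /* "a maximum of 64 characters" per the help text */

/* Append, in order: the localversion* file contents, the
 * CONFIG_LOCALVERSION value and, for a git tree, "-g" plus the first eight
 * characters of `git rev-parse --verify HEAD` (head may be NULL).
 * Returns the resulting length, or -1 if it exceeds the limit or buffer. */
static int compose_localversion(const char *files, const char *config_lv,
				const char *head, char *out, size_t outsz)
{
	char gsuffix[11] = ""; /* "-g" + 8 hex digits + NUL */
	int n;

	assert(files && config_lv && out);
	if (head && strlen(head) >= 8)
		snprintf(gsuffix, sizeof gsuffix, "-g%.8s", head);
	n = snprintf(out, outsz, "%s%s%s", files, config_lv, gsuffix);
	if (n < 0 || n > LOCALVERSION_MAX || (size_t)n >= outsz)
		return -1;
	return n;
}
```

For example, a localversion file containing "-rc1", CONFIG_LOCALVERSION="-imx" and a HEAD hash starting 01234567 would give "-rc1-imx-g01234567".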

config HAVE_KERNEL_GZIP
	bool

config HAVE_KERNEL_BZIP2
	bool

config HAVE_KERNEL_LZMA
	bool

config HAVE_KERNEL_XZ
	bool

config HAVE_KERNEL_LZO
	bool

config HAVE_KERNEL_LZ4
	bool

choice
	prompt "Kernel compression mode"
	default KERNEL_GZIP
	depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA || HAVE_KERNEL_XZ || HAVE_KERNEL_LZO || HAVE_KERNEL_LZ4
	help
	  The linux kernel is a kind of self-extracting executable.
	  Several compression algorithms are available, which differ
	  in efficiency, compression and decompression speed.
	  Compression speed is only relevant when building a kernel.
	  Decompression speed is relevant at each boot.

	  If you have any problems with bzip2 or lzma compressed
	  kernels, mail me (Alain Knaff) <alain@knaff.lu>. (An older
	  version of this functionality (bzip2 only), for 2.4, was
	  supplied by Christian Ludwig.)

	  High compression options are mostly useful for users who
	  are low on disk space (embedded systems), but for whom RAM
	  size matters less.

	  If in doubt, select 'gzip'.

config KERNEL_GZIP
	bool "Gzip"
	depends on HAVE_KERNEL_GZIP
	help
	  The old and tried gzip compression. It provides a good balance
	  between compression ratio and decompression speed.

config KERNEL_BZIP2
	bool "Bzip2"
	depends on HAVE_KERNEL_BZIP2
	help
	  Its compression ratio and speed are intermediate.
	  Decompression speed is slowest among the choices.  The kernel
	  size is about 10% smaller with bzip2, in comparison to gzip.
	  Bzip2 uses a large amount of memory. For modern kernels you
	  will need at least 8MB of RAM for booting.

config KERNEL_LZMA
	bool "LZMA"
	depends on HAVE_KERNEL_LZMA
	help
	  This compression algorithm's ratio is best.  Decompression speed
	  is between gzip and bzip2.  Compression is slowest.
	  The kernel size is about 33% smaller with LZMA in comparison to gzip.

config KERNEL_XZ
	bool "XZ"
	depends on HAVE_KERNEL_XZ
	help
	  XZ uses the LZMA2 algorithm and instruction set specific
	  BCJ filters which can improve compression ratio of executable
	  code. The size of the kernel is about 30% smaller with XZ in
	  comparison to gzip. On architectures for which there is a BCJ
	  filter (i386, x86_64, ARM, IA-64, PowerPC, and SPARC), XZ
	  will create a few percent smaller kernel than plain LZMA.

	  The speed is about the same as with LZMA: The decompression
	  speed of XZ is better than that of bzip2 but worse than gzip
	  and LZO. Compression is slow.

config KERNEL_LZO
	bool "LZO"
	depends on HAVE_KERNEL_LZO
	help
	  Its compression ratio is the poorest among the choices. The kernel
	  size is about 10% bigger than gzip; however its speed
	  (both compression and decompression) is the fastest.

config KERNEL_LZ4
	bool "LZ4"
	depends on HAVE_KERNEL_LZ4
	help
	  LZ4 is an LZ77-type compressor with a fixed, byte-oriented encoding.
	  A preliminary version of LZ4 de/compression tool is available at
	  https://code.google.com/p/lz4/.

	  Its compression ratio is worse than LZO. The size of the kernel
	  is about 8% bigger than LZO. But the decompression speed is
	  faster than LZO.

endchoice
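The size claims in these help texts compound. LZ4 is described relative to LZO, which is itself described relative to gzip, so LZ4 comes out at roughly 110% × 108% ≈ 119% of gzip. The back-of-the-envelope C helper below makes the arithmetic explicit. The enum and function names are hypothetical, and the percentages are the help texts' approximations, not measurements.

```c
#include <assert.h>

enum kcomp { KC_GZIP, KC_BZIP2, KC_LZMA, KC_XZ, KC_LZO, KC_LZ4 };

/* Approximate compressed-kernel size, scaled from a gzip-compressed size. */
static unsigned long estimate_size(enum kcomp c, unsigned long gzip_size)
{
	switch (c) {
	case KC_GZIP:  return gzip_size;
	case KC_BZIP2: return gzip_size * 90 / 100;  /* about 10% smaller */
	case KC_LZMA:  return gzip_size * 67 / 100;  /* about 33% smaller */
	case KC_XZ:    return gzip_size * 70 / 100;  /* about 30% smaller */
	case KC_LZO:   return gzip_size * 110 / 100; /* about 10% bigger */
	case KC_LZ4:   return gzip_size * 110 / 100 * 108 / 100; /* ~8% over LZO */
	}
	return 0;
}
```

For a hypothetical 4000 KiB gzip image, this gives 3600 KiB (bzip2), 2680 KiB (LZMA), 2800 KiB (XZ), 4400 KiB (LZO) and 4752 KiB (LZ4).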

config DEFAULT_HOSTNAME
	string "Default hostname"
	default "(none)"
	help
	  This option determines the default system hostname before userspace
	  calls sethostname(2). The kernel traditionally uses "(none)" here,
	  but you may wish to use a different default here to make a minimal
	  system more usable with less configuration.

config SWAP
	bool "Support for paging of anonymous memory (swap)"
	depends on MMU && BLOCK
	default y
	help
	  This option allows you to choose whether you want to have support
	  for so-called swap devices or swap files in your kernel that are
	  used to provide more virtual memory than the actual RAM present
	  in your computer.  If unsure, say Y.

config SYSVIPC
	bool "System V IPC"
	---help---
	  Inter Process Communication is a suite of library functions and
	  system calls which let processes (running programs) synchronize and
	  exchange information. It is generally considered to be a good thing,
	  and some programs won't run unless you say Y here. In particular, if
	  you want to run the DOS emulator dosemu under Linux (read the
	  DOSEMU-HOWTO, available from http://www.tldp.org/docs.html#howto),
	  you'll need to say Y here.

	  You can find documentation about IPC with "info ipc" and also in
	  section 6.4 of the Linux Programmer's Guide, available from
	  <http://www.tldp.org/guides.html>.
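As a minimal illustration of System V IPC, the sketch below creates a private message queue with msgget(), sends one message with msgsnd(), reads it back with msgrcv(), and removes the queue with msgctl(IPC_RMID). sysv_roundtrip() and struct demo_msg are hypothetical names. On a kernel built without SYSVIPC, msgget() fails and the helper returns -1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct demo_msg {
	long mtype;     /* must be > 0 */
	char mtext[32];
};

/* Send `text` through a fresh private queue and copy what comes back into
 * out.  Returns 0 on success, -1 on failure. */
static int sysv_roundtrip(const char *text, char *out, size_t outsz)
{
	struct demo_msg m = { .mtype = 1 }; /* zero-fills mtext */
	struct demo_msg r;
	int rc = -1;
	int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

	if (id < 0)
		return -1;
	strncpy(m.mtext, text, sizeof m.mtext - 1); /* stays NUL-terminated */
	if (msgsnd(id, &m, strlen(m.mtext) + 1, 0) == 0 &&
	    msgrcv(id, &r, sizeof r.mtext, 0, 0) > 0) {
		snprintf(out, outsz, "%s", r.mtext);
		rc = 0;
	}
	msgctl(id, IPC_RMID, NULL); /* queues outlive the process unless removed */
	return rc;
}
```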

config SYSVIPC_SYSCTL
	bool
	depends on SYSVIPC
	depends on SYSCTL
	default y

config POSIX_MQUEUE
	bool "POSIX Message Queues"
	depends on NET
	---help---
	  POSIX variant of message queues is a part of IPC. In POSIX message
	  queues every message has a priority which determines the order in
	  which processes receive it. If you want to compile and run
	  programs written e.g. for Solaris with use of its POSIX message
	  queues (functions mq_*) say Y here.

	  POSIX message queues are visible as a filesystem called 'mqueue'
	  and can be mounted somewhere if you want to do filesystem
	  operations on message queues.

	  If unsure, say Y.

config POSIX_MQUEUE_SYSCTL
	bool
	depends on POSIX_MQUEUE
	depends on SYSCTL
	default y

config CROSS_MEMORY_ATTACH
	bool "Enable process_vm_readv/writev syscalls"
	depends on MMU
	default y
	help
	  Enabling this option adds the system calls process_vm_readv and
	  process_vm_writev which allow a process with the correct privileges
	  to directly read from or write to another process' address space.
	  See the man page for more details.
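Below is a minimal sketch of process_vm_readv(), assuming glibc 2.15 or later, which declares it in <sys/uio.h> under _GNU_SOURCE. read_remote() is a hypothetical helper. The demonstration reads from the calling process itself, which passes the permission check trivially. Against another process, the caller needs ptrace-level access to the target.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Copy len bytes at remote_addr in process pid into buf.
 * Returns the number of bytes read, or -1 with errno set. */
static ssize_t read_remote(pid_t pid, void *remote_addr, void *buf, size_t len)
{
	struct iovec local = { .iov_base = buf, .iov_len = len };
	struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

	return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

With CROSS_MEMORY_ATTACH disabled, or under a seccomp filter that blocks the call, process_vm_readv() fails and the helper returns -1.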

config FHANDLE
	bool "open by fhandle syscalls"
	select EXPORTFS
	help
	  If you say Y here, a user level program will be able to map
	  file names to handles and then later use the handle for
	  different file system operations. This is useful in implementing
	  userspace file servers, which now track files using handles instead
	  of names. The handle would remain the same even if file names
	  get renamed. Enables open_by_handle_at(2) and name_to_handle_at(2)
	  syscalls.

config USELIB
	bool "uselib syscall"
	default y
	help
	  This option enables the uselib syscall, a system call used in the
	  dynamic linker from libc5 and earlier.  glibc does not use this
	  system call.  If you intend to run programs built on libc5 or
	  earlier, you may need to enable this syscall.  Current systems
	  running glibc can safely disable this.

config AUDIT
	bool "Auditing support"
	depends on NET
	help
	  Enable auditing infrastructure that can be used with another
	  kernel subsystem, such as SELinux (which requires this for
	  logging of avc messages output).  Does not do system-call
	  auditing without CONFIG_AUDITSYSCALL.

config HAVE_ARCH_AUDITSYSCALL
	bool

config AUDITSYSCALL
	bool "Enable system-call auditing support"
	depends on AUDIT && HAVE_ARCH_AUDITSYSCALL
	default y if SECURITY_SELINUX
	help
	  Enable low-overhead system-call auditing infrastructure that
	  can be used independently or with another kernel subsystem,
	  such as SELinux.

config AUDIT_WATCH
	def_bool y
	depends on AUDITSYSCALL
	select FSNOTIFY

config AUDIT_TREE
	def_bool y
	depends on AUDITSYSCALL
	select FSNOTIFY

source "kernel/irq/Kconfig"
source "kernel/time/Kconfig"

menu "CPU/Task time and stats accounting"

config VIRT_CPU_ACCOUNTING
	bool

choice
	prompt "Cputime accounting"
	default TICK_CPU_ACCOUNTING if !PPC64
	default VIRT_CPU_ACCOUNTING_NATIVE if PPC64

# Kind of a stub config for the pure tick based cputime accounting
config TICK_CPU_ACCOUNTING
	bool "Simple tick based cputime accounting"
	depends on !S390 && !NO_HZ_FULL
	help
	  This is the basic tick based cputime accounting that maintains
	  statistics about user, system and idle time spent at jiffy
	  granularity.

	  If unsure, say Y.

config VIRT_CPU_ACCOUNTING_NATIVE
	bool "Deterministic task and CPU time accounting"
	depends on HAVE_VIRT_CPU_ACCOUNTING && !NO_HZ_FULL
	select VIRT_CPU_ACCOUNTING
	help
	  Select this option to enable more accurate task and CPU time
	  accounting.  This is done by reading a CPU counter on each
	  kernel entry and exit and on transitions within the kernel
	  between system, softirq and hardirq state, so there is a
	  small performance impact.  In the case of s390 or IBM POWER > 5,
	  this also enables accounting of stolen time on logically-partitioned
	  systems.

config VIRT_CPU_ACCOUNTING_GEN
	bool "Full dynticks CPU time accounting"
	depends on HAVE_CONTEXT_TRACKING
	depends on HAVE_VIRT_CPU_ACCOUNTING_GEN
	select VIRT_CPU_ACCOUNTING
	select CONTEXT_TRACKING
	help
	  Select this option to enable task and CPU time accounting on full
	  dynticks systems. This accounting is implemented by watching every
	  kernel-user boundary using the context tracking subsystem.
	  The accounting is thus performed at the expense of some significant
	  overhead.

	  For now this is only useful if you are working on the full
	  dynticks subsystem development.

	  If unsure, say N.

config IRQ_TIME_ACCOUNTING
	bool "Fine granularity task level IRQ time accounting"
	depends on HAVE_IRQ_TIME_ACCOUNTING && !NO_HZ_FULL
	help
	  Select this option to enable fine granularity task irq time
	  accounting. This is done by reading a timestamp on each
	  transition between softirq and hardirq state, so there can be a
	  small performance impact.

	  If in doubt, say N here.

endchoice

config BSD_PROCESS_ACCT
	bool "BSD Process Accounting"
	depends on MULTIUSER
	help
	  If you say Y here, a user level program will be able to instruct the
	  kernel (via a special system call) to write process accounting
	  information to a file: whenever a process exits, information about
	  that process will be appended to the file by the kernel.  The
	  information includes things such as creation time, owning user,
	  command name, memory usage, controlling terminal etc. (the complete
	  list is in the struct acct in file:include/linux/acct.h).  It is
	  up to the user level program to do useful things with this
	  information.  This is generally a good idea, so say Y.

config BSD_PROCESS_ACCT_V3
	bool "BSD Process Accounting version 3 file format"
	depends on BSD_PROCESS_ACCT
	default n
	help
	  If you say Y here, the process accounting information is written
	  in a new file format that also logs the process IDs of each
	  process and its parent. Note that this file format is incompatible
	  with previous v0/v1/v2 file formats, so you will need updated tools
	  for processing it. A preliminary version of these tools is available
	  at http://www.gnu.org/software/acct/.
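With the v3 format, each record carries both the process ID and the parent's ID. Below is a sketch of a reader. It assumes the C library's <sys/acct.h> provides struct acct_v3 (glibc's does). count_acct_v3() is a hypothetical helper that walks an open file written by acct(2) and remembers the last parent PID seen.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/acct.h>

/* Count v3 records in an open accounting file and store the parent PID of
 * the last one in *last_ppid.  Returns the number of v3 records read. */
static int count_acct_v3(FILE *f, unsigned *last_ppid)
{
	struct acct_v3 rec;
	int count = 0;

	assert(f && last_ppid);
	while (fread(&rec, sizeof rec, 1, f) == 1) {
		/* Low bits hold the format version; 0x80 marks big-endian files. */
		if ((rec.ac_version & 0x7f) != 3)
			continue;
		*last_ppid = rec.ac_ppid;
		count++;
	}
	return count;
}
```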

config TASKSTATS
	bool "Export task/process statistics through netlink"
	depends on NET
	depends on MULTIUSER
	default n
	help
	  Export selected statistics for tasks/processes through the
	  generic netlink interface. Unlike BSD process accounting, the
	  statistics are available during the lifetime of tasks/processes as
	  responses to commands. Like BSD accounting, they are sent to user
	  space on task exit.

	  Say N if unsure.

config TASK_DELAY_ACCT
	bool "Enable per-task delay accounting"
	depends on TASKSTATS
	select SCHED_INFO
	help
	  Collect information on time spent by a task waiting for system
	  resources like cpu, synchronous block I/O completion and swapping
	  in pages. Such statistics can help in setting a task's priorities
	  relative to other tasks for cpu, io, rss limits etc.

	  Say N if unsure.

config TASK_XACCT
	bool "Enable extended accounting over taskstats"
	depends on TASKSTATS
	help
	  Collect extended task accounting data and send the data
	  to userland for processing over the taskstats interface.

	  Say N if unsure.

config TASK_IO_ACCOUNTING
	bool "Enable per-task storage I/O accounting"
	depends on TASK_XACCT
	help
	  Collect information on the number of bytes of storage I/O which this
	  task has caused.

	  Say N if unsure.
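With TASK_IO_ACCOUNTING enabled, the per-task counters appear in /proc/<pid>/io as "name: value" lines: rchar, wchar, syscr, syscw, read_bytes, write_bytes and cancelled_write_bytes. Below is a sketch of a parser for one field. proc_io_field() is a hypothetical helper. It takes an open FILE so it can be exercised on any text in that format, for example the stream returned by fopen("/proc/self/io", "r").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value of `field` from /proc/<pid>/io-style text, or -1 if the
 * field is absent. */
static long long proc_io_field(FILE *f, const char *field)
{
	char name[64];
	long long val;

	assert(f && field);
	/* Leading space skips the previous newline; %[^:] captures the name. */
	while (fscanf(f, " %63[^:]: %lld", name, &val) == 2)
		if (strcmp(name, field) == 0)
			return val;
	return -1;
}
```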

endmenu # "CPU/Task time and stats accounting"

menu "RCU Subsystem"

config TREE_RCU
	bool
	default y if !PREEMPT && SMP
	help
	  This option selects the RCU implementation that is
	  designed for very large SMP systems with hundreds or
	  thousands of CPUs.  It also scales down nicely to
	  smaller systems.

config PREEMPT_RCU
	bool
	default y if PREEMPT
	help
	  This option selects the RCU implementation that is
	  designed for very large SMP systems with hundreds or
	  thousands of CPUs, but for which real-time response
	  is also required.  It also scales down nicely to
	  smaller systems.

	  Select this option if you are unsure.

config TINY_RCU
	bool
	default y if !PREEMPT && !SMP
	help
	  This option selects the RCU implementation that is
	  designed for UP systems from which real-time response
	  is not required.  This option greatly reduces the
	  memory footprint of RCU.

config RCU_EXPERT
	bool "Make expert-level adjustments to RCU configuration"
	default n
	help
	  This option needs to be enabled if you wish to make
	  expert-level adjustments to RCU configuration.  By default,
	  no such adjustments can be made, which has the often-beneficial
	  side-effect of preventing "make oldconfig" from asking you all
	  sorts of detailed questions about how you would like numerous
	  obscure RCU options to be set up.

	  Say Y if you need to make expert-level adjustments to RCU.

	  Say N if you are unsure.

config SRCU
	bool
	help
	  This option selects the sleepable version of RCU. This version
	  permits arbitrary sleeping or blocking within RCU read-side critical
	  sections.

config TASKS_RCU
	bool
	default n
	select SRCU
	help
	  This option enables a task-based RCU implementation that uses
	  only voluntary context switch (not preemption!), idle, and
	  user-mode execution as quiescent states.

config RCU_STALL_COMMON
	def_bool ( TREE_RCU || PREEMPT_RCU || RCU_TRACE )
	help
	  This option enables RCU CPU stall code that is common between
	  the TINY and TREE variants of RCU.  The purpose is to allow
	  the tiny variants to disable RCU CPU stall warnings, while
	  making these warnings mandatory for the tree variants.

config CONTEXT_TRACKING
	bool

config CONTEXT_TRACKING_FORCE
	bool "Force context tracking"
	depends on CONTEXT_TRACKING
	default y if !NO_HZ_FULL
	help
	  The major prerequisite for full dynticks to work is to support
	  the context tracking subsystem. But there are also other
	  dependencies to provide in order to make the full dynticks
	  feature work.

	  This option stands for testing when an arch implements the
	  context tracking backend but doesn't yet fulfill all the
	  requirements to make the full dynticks feature work.
	  Without the full dynticks, there is no way to test the support
	  for context tracking and the subsystems that rely on it: RCU
	  userspace extended quiescent state and tickless cputime
	  accounting. This option copes with the absence of the full
	  dynticks subsystem by forcing the context tracking on all
	  CPUs in the system.

	  Say Y only if you're working on the development of an
	  architecture backend for the context tracking.

	  Say N otherwise; this option brings an overhead that you
	  don't want in production.

config RCU_FANOUT
	int "Tree-based hierarchical RCU fanout value"
	range 2 64 if 64BIT
	range 2 32 if !64BIT
	depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT
	default 64 if 64BIT
	default 32 if !64BIT
	help
	  This option controls the fanout of hierarchical implementations
	  of RCU, allowing RCU to work efficiently on machines with
	  large numbers of CPUs.  This value must be at least the fourth
	  root of NR_CPUS, which allows NR_CPUS to be insanely large.
	  The default value of RCU_FANOUT should be used for production
	  systems, but if you are stress-testing the RCU implementation
	  itself, small RCU_FANOUT values allow you to test large-system
	  code paths on small(er) systems.

	  Select a specific number if testing RCU itself.
	  Take the default if unsure.

config RCU_FANOUT_LEAF
	int "Tree-based hierarchical RCU leaf-level fanout value"
	range 2 64 if 64BIT
	range 2 32 if !64BIT
	depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT
	default 16
	help
	  This option controls the leaf-level fanout of hierarchical
	  implementations of RCU, and allows trading off cache misses
	  against lock contention.  Systems that synchronize their
	  scheduling-clock interrupts for energy-efficiency reasons will
	  want the default because the smaller leaf-level fanout keeps
	  lock contention levels acceptably low.  Very large systems
	  (hundreds or thousands of CPUs) will instead want to set this
	  value to the maximum value possible in order to reduce the
	  number of cache misses incurred during RCU's grace-period
	  initialization.  These systems tend to run CPU-bound, and thus
	  are not helped by synchronized interrupts, and thus tend to
	  skew them, which reduces lock contention enough that large
	  leaf-level fanouts work well.

	  Select a specific number if testing RCU itself.

	  Select the maximum permissible value for large systems.

	  Take the default if unsure.
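The fanout constraint is easiest to see as capacity arithmetic. Assume one level of the tree covers RCU_FANOUT_LEAF CPUs, each additional level multiplies that by RCU_FANOUT, and there are at most four levels. Then the tree covers up to RCU_FANOUT_LEAF × RCU_FANOUT³ CPUs. With both values at 64, that is 64⁴ = 16,777,216, which matches the help text's fourth-root rule when the two fanouts are equal. Below is a sketch under those assumptions; rcu_levels() is a hypothetical name.

```c
#include <assert.h>

/* Number of tree levels needed to cover nr_cpus, or -1 if four levels are
 * not enough. */
static int rcu_levels(unsigned long leaf_fanout, unsigned long fanout,
		      unsigned long nr_cpus)
{
	unsigned long capacity = leaf_fanout; /* CPUs covered by one level */

	assert(leaf_fanout >= 2 && fanout >= 2);
	for (int levels = 1; levels <= 4; levels++) {
		if (nr_cpus <= capacity)
			return levels;
		capacity *= fanout; /* each extra level multiplies coverage */
	}
	return -1;
}
```

With the defaults on a 64-bit build (RCU_FANOUT_LEAF=16, RCU_FANOUT=64), 1024 CPUs fit in two levels and 4096 CPUs need three.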

config RCU_FAST_NO_HZ
	bool "Accelerate last non-dyntick-idle CPU's grace periods"
	depends on NO_HZ_COMMON && SMP && RCU_EXPERT
	default n
	help
	  This option permits CPUs to enter dynticks-idle state even if
	  they have RCU callbacks queued, and prevents RCU from waking
	  these CPUs up more than roughly once every four jiffies (by
	  default; you can adjust this using the rcutree.rcu_idle_gp_delay
	  parameter), thus improving energy efficiency.  On the other
	  hand, this option increases the duration of RCU grace periods,
	  for example, slowing down synchronize_rcu().

	  Say Y if energy efficiency is critically important, and you
	  don't care about increased grace-period durations.

	  Say N if you are unsure.

config TREE_RCU_TRACE
	def_bool RCU_TRACE && ( TREE_RCU || PREEMPT_RCU )
	select DEBUG_FS
	help
	  This option provides tracing for the TREE_RCU and
	  PREEMPT_RCU implementations, permitting Makefile to
	  trivially select kernel/rcutree_trace.c.

config RCU_BOOST
	bool "Enable RCU priority boosting"
	depends on RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT
	default n
	help
	  This option boosts the priority of preempted RCU readers that
	  block the current preemptible RCU grace period for too long.
	  This option also prevents heavy loads from blocking RCU
	  callback invocation for all flavors of RCU.

	  Say Y here if you are working with real-time apps or heavy loads.
	  Say N here if you are unsure.

config RCU_KTHREAD_PRIO
	int "Real-time priority to use for RCU worker threads"
	range 1 99 if RCU_BOOST
	range 0 99 if !RCU_BOOST
	default 1 if RCU_BOOST
	default 0 if !RCU_BOOST
	depends on RCU_EXPERT
	help
	  This option specifies the SCHED_FIFO priority value that will be
	  assigned to the rcuc/n and rcub/n threads and is also the value
	  used for RCU_BOOST (if enabled). If you are working with a
	  real-time application that has one or more CPU-bound threads
	  running at a real-time priority level, you should set
	  RCU_KTHREAD_PRIO to a priority higher than the highest-priority
	  real-time CPU-bound application thread.  The default RCU_KTHREAD_PRIO
	  value of 1 is appropriate in the common case, which is real-time
	  applications that do not have any CPU-bound threads.

	  Some real-time applications might not have a single real-time
	  thread that saturates a given CPU, but instead might have
	  multiple real-time threads that, taken together, fully utilize
	  that CPU.  In this case, you should set RCU_KTHREAD_PRIO to
	  a priority higher than the lowest-priority thread that is
	  conspiring to prevent the CPU from running any non-real-time
	  tasks.  For example, if one thread at priority 10 and another
	  thread at priority 5 are between themselves fully consuming
	  the CPU time on a given CPU, then RCU_KTHREAD_PRIO should be
	  set to priority 6 or higher.

	  Specify the real-time priority, or take the default if unsure.
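The rule in the example above amounts to "one more than the lowest priority among the real-time threads that together saturate the CPU". For a single CPU-bound thread, that is one more than its own priority. Below is a sketch with a hypothetical helper name, capped at the SCHED_FIFO maximum of 99 that the option's range allows.

```c
#include <assert.h>

/* Smallest RCU_KTHREAD_PRIO that beats the real-time threads in `prios`,
 * which together saturate one CPU: one above the lowest, capped at 99. */
static int min_rcu_kthread_prio(const int *prios, int n)
{
	int lowest;

	assert(prios && n > 0);
	lowest = prios[0];
	for (int i = 1; i < n; i++)
		if (prios[i] < lowest)
			lowest = prios[i];
	return lowest >= 99 ? 99 : lowest + 1;
}
```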

config RCU_BOOST_DELAY int "Milliseconds to delay boosting after RCU grace-period start" range 0 3000 depends on RCU_BOOST default 500 help This option specifies the time to wait after the beginning of a given grace period before priority-boosting preempted RCU readers blocking that grace period. Note that any RCU reader blocking an expedited RCU grace period is boosted immediately.

  Accept the default if unsure.

config RCU_NOCB_CPU
	bool "Offload RCU callback processing from boot-selected CPUs"
	depends on TREE_RCU || PREEMPT_RCU
	depends on RCU_EXPERT || NO_HZ_FULL
	default n
	help
	  Use this option to reduce OS jitter for aggressive HPC or
	  real-time workloads.  It can also be used to offload RCU
	  callback invocation to energy-efficient CPUs in battery-powered
	  asymmetric multiprocessors.

	  This option offloads callback invocation from the set of
	  CPUs specified at boot time by the rcu_nocbs parameter.
	  For each such CPU, a kthread ("rcuox/N") will be created to
	  invoke callbacks, where the "N" is the CPU being offloaded,
	  and where the "x" is "b" for RCU-bh, "p" for RCU-preempt, and
	  "s" for RCU-sched.  Nothing prevents this kthread from running
	  on the specified CPUs, but (1) the kthreads may be preempted
	  between each callback, and (2) affinity or cgroups can be used
	  to force the kthreads to run on whatever set of CPUs is desired.

	  Say Y here if you want to help to debug reduced OS jitter.
	  Say N here if you are unsure.
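The rcu_nocbs= value is a CPU list, and each listed CPU gets one offload kthread per RCU flavor, named as described above. The sketch below assumes a simplified list syntax (comma-separated numbers and `lo-hi` ranges, at most 64 CPUs); `parse_cpulist()` and `rcuo_name()` are hypothetical helpers written for illustration, not the kernel's own parser:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a list such as "1,3-5" into a bitmask (bit N = CPU N).
 * Returns 0 on success, -1 on malformed input or CPUs above 63. */
static int parse_cpulist(const char *s, uint64_t *mask)
{
    uint64_t m = 0;

    while (*s) {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s || lo < 0 || lo > 63)
            return -1;
        long hi = lo;
        if (*end == '-') {                  /* a "lo-hi" range */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo || hi > 63)
                return -1;
        }
        for (long cpu = lo; cpu <= hi; cpu++)
            m |= UINT64_C(1) << cpu;
        if (*end == ',')
            end++;
        else if (*end != '\0')
            return -1;
        s = end;
    }
    *mask = m;
    return 0;
}

/* Name of the offload kthread for one CPU: "rcuo", then the flavor
 * letter ('b', 'p' or 's'), then "/N". */
static void rcuo_name(char *buf, size_t len, char flavor, int cpu)
{
    snprintf(buf, len, "rcuo%c/%d", flavor, cpu);
}
```

For example, `rcu_nocbs=1,3-5` selects CPUs 1, 3, 4 and 5, and the RCU-preempt kthread for CPU 3 would be named `rcuop/3`.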

choice
	prompt "Build-forced no-CBs CPUs"
	default RCU_NOCB_CPU_NONE
	depends on RCU_NOCB_CPU
	help
	  This option allows no-CBs CPUs (whose RCU callbacks are invoked
	  from kthreads rather than from softirq context) to be specified
	  at build time.  Additional no-CBs CPUs may be specified by
	  the rcu_nocbs= boot parameter.

config RCU_NOCB_CPU_NONE
	bool "No build_forced no-CBs CPUs"
	help
	  This option does not force any of the CPUs to be no-CBs CPUs.
	  Only CPUs designated by the rcu_nocbs= boot parameter will be
	  no-CBs CPUs, whose RCU callbacks will be invoked by per-CPU
	  kthreads whose names begin with "rcuo".  All other CPUs will
	  invoke their own RCU callbacks in softirq context.

	  Select this option if you want to choose no-CBs CPUs at
	  boot time, for example, to allow testing of different no-CBs
	  configurations without having to rebuild the kernel each time.

config RCU_NOCB_CPU_ZERO
	bool "CPU 0 is a build_forced no-CBs CPU"
	help
	  This option forces CPU 0 to be a no-CBs CPU, so that its RCU
	  callbacks are invoked by a per-CPU kthread whose name begins
	  with "rcuo".  Additional CPUs may be designated as no-CBs CPUs
	  using the rcu_nocbs= boot parameter.  All other CPUs will
	  invoke their own RCU callbacks in softirq context.

	  Select this if CPU 0 needs to be a no-CBs CPU for real-time
	  or energy-efficiency reasons, but the real reason it exists
	  is to ensure that randconfig testing covers mixed systems.

config RCU_NOCB_CPU_ALL
	bool "All CPUs are build_forced no-CBs CPUs"
	help
	  This option forces all CPUs to be no-CBs CPUs.  The rcu_nocbs=
	  boot parameter will be ignored.  All CPUs' RCU callbacks will
	  be executed in the context of per-CPU rcuo kthreads created for
	  this purpose.  Assuming that the kthreads whose names start with
	  "rcuo" are bound to "housekeeping" CPUs, this reduces OS jitter
	  on the remaining CPUs, but might decrease memory locality during
	  RCU-callback invocation, thus potentially degrading throughput.

	  Select this if all CPUs need to be no-CBs CPUs for real-time
	  or energy-efficiency reasons.

endchoice
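The three build-time choices combine with the boot parameter as follows. This is a hypothetical model using 64-bit CPU masks (bit N represents CPU N), intended only to restate the help text above as code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum mirroring RCU_NOCB_CPU_NONE/ZERO/ALL. */
enum nocb_build_choice { NOCB_NONE, NOCB_ZERO, NOCB_ALL };

/* Effective set of no-CBs CPUs, given the build-time choice, the mask
 * parsed from rcu_nocbs=, and the mask of CPUs present. */
static uint64_t effective_nocb_mask(enum nocb_build_choice choice,
                                    uint64_t boot_mask, uint64_t all_cpus)
{
    switch (choice) {
    case NOCB_ALL:
        return all_cpus;                              /* rcu_nocbs= ignored */
    case NOCB_ZERO:
        return (boot_mask | UINT64_C(1)) & all_cpus;  /* CPU 0 plus boot list */
    case NOCB_NONE:
    default:
        return boot_mask & all_cpus;                  /* boot list only */
    }
}
```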

config RCU_EXPEDITE_BOOT
	bool
	default n
	help
	  This option enables expedited grace periods at boot time,
	  as if rcu_expedite_gp() had been invoked early in boot.
	  The corresponding rcu_unexpedite_gp() is invoked from
	  rcu_end_inkernel_boot(), which is intended to be invoked
	  at the end of the kernel-only boot sequence, just before
	  init is exec'ed.

	  Accept the default if unsure.
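One way to read the help text is as a counted request: expedited boot behaves like an rcu_expedite_gp() call made before anything else runs, balanced later by rcu_unexpedite_gp(). A single-threaded model under that assumption; the struct and function names are hypothetical, and the real kernel functions keep their own (atomic) state:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the expedite/unexpedite pairing. */
struct gp_mode {
    int nesting;   /* number of outstanding "expedite" requests */
};

static void gp_mode_init(struct gp_mode *m, bool expedite_boot)
{
    /* Expedited boot starts with one request already outstanding. */
    m->nesting = expedite_boot ? 1 : 0;
}

static void expedite_gp(struct gp_mode *m)
{
    m->nesting++;
}

static void unexpedite_gp(struct gp_mode *m)
{
    assert(m->nesting > 0);   /* calls must be balanced */
    m->nesting--;
}

static bool gp_is_expedited(const struct gp_mode *m)
{
    return m->nesting > 0;
}
```

In this model, the end-of-boot hook simply calls `unexpedite_gp()` once, which drops the count back to whatever other callers have requested.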

endmenu # "RCU Subsystem"

config BUILD_BIN2C
	bool
	default n

config IKCONFIG
	tristate "Kernel .config support"
	select BUILD_BIN2C
	---help---
	  This option enables the complete Linux kernel ".config" file
	  contents to be saved in the kernel. It provides documentation
	  of which kernel options are used in a running kernel or in an
	  on-disk kernel.  This information can be extracted from the kernel
	  image file with the script scripts/extract-ikconfig and used as
	  input to rebuild the current kernel or to build another kernel.
	  It can also be extracted from a running kernel by reading
	  /proc/config.gz if enabled (below).

config IKCONFIG_PROC
	bool "Enable access to .config through /proc/config.gz"
	depends on IKCONFIG && PROC_FS
	---help---
	  This option enables access to the kernel configuration file
	  through /proc/config.gz.
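Whether it comes from scripts/extract-ikconfig or from a decompressed /proc/config.gz, the saved configuration uses the usual .config format: `CONFIG_FOO=value` for set options and `# CONFIG_FOO is not set` for disabled ones. A small C sketch that looks up one option in such text; `config_lookup()` is a hypothetical helper, and it assumes the input has already been decompressed:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up SYMBOL (without the CONFIG_ prefix) in .config-format text.
 * Returns 1 and copies the value ("y", "m", "17", ...) into out,
 * 0 if the symbol is recorded as "is not set", -1 if absent. */
static int config_lookup(FILE *f, const char *symbol, char *out, size_t outlen)
{
    char line[512], key[128], unset[160];

    int n = snprintf(key, sizeof key, "CONFIG_%s=", symbol);
    if (n < 0 || (size_t)n >= sizeof key)
        return -1;
    snprintf(unset, sizeof unset, "# CONFIG_%s is not set", symbol);

    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\n")] = '\0';      /* strip the newline */
        if (strncmp(line, key, (size_t)n) == 0) {
            snprintf(out, outlen, "%s", line + n);
            return 1;
        }
        if (strcmp(line, unset) == 0)
            return 0;
    }
    return -1;
}
```

For instance, a program reading the output of `zcat /proc/config.gz` on its standard input could call `config_lookup(stdin, "MEMBARRIER", buf, sizeof buf)` to learn how the running kernel was built.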

config LOG_BUF_SHIFT
	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
	range 12 25
	default 17
	depends on PRINTK
	help
	  Select the minimal kernel log buffer size as a power of 2.
	  The final size is affected by the LOG_CPU_MAX_BUF_SHIFT config
	  parameter; see below.  A larger size can also be forced by the
	  "log_buf_len" boot parameter.

	  Examples:
		     17 => 128 KB
		     16 => 64 KB
		     15 => 32 KB
		     14 => 16 KB
		     13 =>  8 KB
		     12 =>  4 KB
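The table above is just powers of two: the buffer size in bytes is 2 raised to LOG_BUF_SHIFT. A trivial C check of that arithmetic (the function name is illustrative):

```c
#include <assert.h>

/* Kernel log buffer size in bytes for a given LOG_BUF_SHIFT. */
static unsigned long log_buf_bytes(int shift)
{
    assert(shift >= 12 && shift <= 25);   /* range allowed by LOG_BUF_SHIFT */
    return 1UL << shift;
}
```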

config LOG_CPU_MAX_BUF_SHIFT
	int "CPU kernel log buffer size contribution (13 => 8 KB, 17 => 128KB)"
	depends on SMP
	range 0 21
	default 12 if !BASE_SMALL
	default 0 if BASE_SMALL
	depends on PRINTK
	help
	  This option allows increasing the default ring buffer size
	  according to the number of CPUs.  The value defines the
	  contribution of each CPU as a power of 2.  The used space is
	  typically only a few lines; however, it might be much more when
	  problems are reported, e.g. backtraces.

	  The increased size means that a new buffer has to be allocated
	  and the original static one is unused.  It makes sense only on
	  systems with many CPUs.  Therefore this value is used only when
	  the sum of contributions is greater than half of the default
	  kernel ring buffer as defined by LOG_BUF_SHIFT.  The default
	  values are set so that more than 16 CPUs are needed to trigger
	  the allocation.

	  This option is also ignored when the "log_buf_len" kernel
	  parameter is used, as it forces an exact (power of two) size of
	  the ring buffer.

	  The number of possible CPUs is used for this computation,
	  ignoring hotplugging, which makes the computation optimal for
	  the worst-case scenario while allowing a simple algorithm to be
	  used from bootup.

	  Example shift values and their meaning:
		     17 => 128 KB for each CPU
		     16 =>  64 KB for each CPU
		     15 =>  32 KB for each CPU
		     14 =>  16 KB for each CPU
		     13 =>   8 KB for each CPU
		     12 =>   4 KB for each CPU
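The decision described above can be written as a single comparison. A sketch that follows the help text literally, assuming every possible CPU contributes 2^LOG_CPU_MAX_BUF_SHIFT bytes; the exact accounting in the printk code may differ slightly, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Would the per-CPU contributions cause a larger log buffer to be
 * allocated?  Mirrors the rule stated in the help text. */
static bool per_cpu_log_buf_used(unsigned int possible_cpus, int log_buf_shift,
                                 int cpu_shift, bool log_buf_len_given)
{
    assert(log_buf_shift >= 12 && log_buf_shift <= 25);
    assert(cpu_shift >= 0 && cpu_shift <= 21);

    if (log_buf_len_given)          /* "log_buf_len=" forces an exact size */
        return false;

    unsigned long long contributions =
        (unsigned long long)possible_cpus << cpu_shift;
    unsigned long long half_default = (1ULL << log_buf_shift) / 2;
    return contributions > half_default;
}
```

With LOG_BUF_SHIFT=17 and a per-CPU shift of 12, the comparison is possible_cpus * 4 KB against 64 KB.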

#
# Architectures with an unreliable sched_clock() should select this:
#
config HAVE_UNSTABLE_SCHED_CLOCK
	bool

config GENERIC_SCHED_CLOCK
	bool

#
# For architectures that want to enable the support for NUMA-affine scheduler
# balancing logic:
#
config ARCH_SUPPORTS_NUMA_BALANCING
	bool

#
# For architectures that prefer to flush all TLBs after a number of pages
# are unmapped instead of sending one IPI per page to flush. The architecture
# must provide guarantees on what happens if a clean TLB cache entry is
# written after the unmap. Details are in mm/rmap.c near the check for
# should_defer_flush. The architecture should also consider if the full flush
# and the refill costs are offset by the savings of sending fewer IPIs.
config ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
	bool

#
# For architectures that know their GCC __int128 support is sound
#
config ARCH_SUPPORTS_INT128
	bool

# For architectures that (ab)use NUMA to represent different memory regions
# all cpu-local but of different latencies, such as SuperH.
#
config ARCH_WANT_NUMA_VARIABLE_LOCALITY
	bool

config NUMA_BALANCING
	bool "Memory placement aware NUMA scheduler"
	depends on ARCH_SUPPORTS_NUMA_BALANCING
	depends on !ARCH_WANT_NUMA_VARIABLE_LOCALITY
	depends on SMP && NUMA && MIGRATION
	help
	  This option adds support for automatic NUMA aware memory/task
	  placement.  The mechanism is quite primitive and is based on
	  migrating memory when it has references to the node the task is
	  running on.

	  This system will be inactive on UMA systems.

config NUMA_BALANCING_DEFAULT_ENABLED
	bool "Automatically enable NUMA aware memory/task placement"
	default y
	depends on NUMA_BALANCING
	help
	  If set, automatic NUMA balancing will be enabled if running on a
	  NUMA machine.

menuconfig CGROUPS
	bool "Control Group support"
	select KERNFS
	help
	  This option adds support for grouping sets of processes together,
	  for use with process control subsystems such as Cpusets, CFS,
	  memory controls or device isolation.
	  See
		- Documentation/scheduler/sched-design-CFS.txt (CFS)
		- Documentation/cgroups/ (features for grouping, isolation
		  and resource control)

	  Say N if unsure.

if CGROUPS

config CGROUP_DEBUG
	bool "Example debug cgroup subsystem"
	default n
	help
	  This option enables a simple cgroup subsystem that
	  exports useful debugging information about the cgroups
	  framework.

	  Say N if unsure.

config CGROUP_FREEZER
	bool "Freezer cgroup subsystem"
	help
	  Provides a way to freeze and unfreeze all tasks in a
	  cgroup.

config CGROUP_PIDS
	bool "PIDs cgroup subsystem"
	help
	  Provides enforcement of process number limits in the scope of a
	  cgroup. Any attempt to fork more processes than is allowed in the
	  cgroup will fail. PIDs are fundamentally a global resource because it
	  is fairly trivial to reach PID exhaustion before you reach even a
	  conservative kmemcg limit. As a result, it is possible to grind a
	  system to a halt without being limited by other cgroup policies. The
	  PIDs cgroup subsystem is designed to stop this from happening.

	  It should be noted that organisational operations (such as attaching
	  to a cgroup hierarchy) will *not* be blocked by the PIDs subsystem,
	  since the PIDs limit only affects a process's ability to fork, not to
	  attach to a cgroup.
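A limit like this has to hold at every level of the hierarchy: a fork is charged to the task's cgroup and to each ancestor, and is refused if any of them would go over its limit. A single-threaded C model of that idea; `struct pids_cg`, `pids_try_charge()` and the "negative max means unlimited" convention are assumptions made for illustration, not the kernel's data structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one cgroup in a PIDs hierarchy. */
struct pids_cg {
    struct pids_cg *parent;   /* NULL for the root */
    long count;               /* processes currently charged */
    long max;                 /* limit; negative means unlimited */
};

/* Charge one new process to cg and all its ancestors.  Returns 0 on
 * success, or -1 (leaving every counter unchanged) if any level would
 * exceed its limit; fork() reports such a failure as EAGAIN. */
static int pids_try_charge(struct pids_cg *cg)
{
    struct pids_cg *p, *q;

    for (p = cg; p != NULL; p = p->parent) {
        if (p->max >= 0 && p->count + 1 > p->max)
            goto revert;
        p->count++;
    }
    return 0;

revert:
    /* Undo the charges already applied below the failing level. */
    for (q = cg; q != p; q = q->parent)
        q->count--;
    return -1;
}

/* Release one process from cg and all its ancestors. */
static void pids_uncharge(struct pids_cg *cg)
{
    for (; cg != NULL; cg = cg->parent)
        cg->count--;
}
```

With a parent limited to 2 and a child limited to 10, the third fork in the child fails because the parent's limit is reached first.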

config CGROUP_DEVICE
	bool "Device controller for cgroups"
	help
	  Provides a cgroup implementing whitelists for devices which
	  a process in the cgroup can mknod or open.

config CPUSETS
	bool "Cpuset support"
	help
	  This option will let you create and manage CPUSETs which
	  allow dynamically partitioning a system into sets of CPUs and
	  Memory Nodes and assigning tasks to run only within those sets.
	  This is primarily useful on large SMP or NUMA systems.

	  Say N if unsure.

config PROC_PID_CPUSET
	bool "Include legacy /proc//cpuset file"
	depends on CPUSETS
	default y

config CGROUP_CPUACCT
	bool "Simple CPU accounting cgroup subsystem"
	help
	  Provides a simple Resource Controller for monitoring the
	  total CPU consumed by the tasks in a cgroup.

config PAGE_COUNTER
	bool

config MEMCG
	bool "Memory Resource Controller for Control Groups"
	select PAGE_COUNTER
	select EVENTFD
	help
	  Provides a memory resource controller that manages both anonymous
	  memory and page cache. (See Documentation/cgroups/memory.txt)

config MEMCG_SWAP
	bool "Memory Resource Controller Swap Extension"
	depends on MEMCG && SWAP
	help
	  Add swap management feature to memory resource controller. When you
	  enable this, you can limit mem+swap usage per cgroup. In other words,
	  when you disable this, the memory resource controller does not
	  account swap usage, and a process can exhaust all of the swap. This
	  extension is useful when you want to avoid swap exhaustion, but it
	  adds overhead and consumes memory for remembering information. Be
	  careful about enabling this, especially on 32-bit or small-memory
	  systems. When the memory resource controller is disabled by boot
	  option, this will be automatically disabled and there will be no
	  overhead from this. Even when you set this config=y, if the boot
	  option "swapaccount=0" is set, swap will not be accounted. Currently,
	  swap_cgroup uses 2 bytes per entry; with a 4096-byte swap page size,
	  that is 512KB per 1GB of swap.

config MEMCG_SWAP_ENABLED
	bool "Memory Resource Controller Swap Extension enabled by default"
	depends on MEMCG_SWAP
	default y
	help
	  Memory Resource Controller Swap Extension comes with its price in
	  a bigger memory consumption. General purpose distribution kernels
	  which want to enable the feature but keep it disabled by default
	  and let the user enable it by the swapaccount=1 boot command line
	  parameter should have this option unselected.
	  Those who want to have the feature enabled by default should
	  select this option (if, for some reason, they need to disable it,
	  then swapaccount=0 does the trick).

config MEMCG_KMEM
	bool "Memory Resource Controller Kernel Memory accounting"
	depends on MEMCG
	depends on SLUB || SLAB
	help
	  The Kernel Memory extension for Memory Resource Controller can limit
	  the amount of memory used by kernel objects in the system. Those are
	  fundamentally different from the entities handled by the standard
	  Memory Controller, which are page-based, and can be swapped. Users of
	  the kmem extension can use it to guarantee that no group of processes
	  will ever exhaust kernel resources alone.
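The swap_cgroup overhead figure quoted for MEMCG_SWAP follows from one 2-byte record per swap page. A quick C check of that arithmetic (the helper name is illustrative):

```c
#include <assert.h>

/* Memory used by swap_cgroup records: 2 bytes per swap page. */
static unsigned long long swap_cgroup_bytes(unsigned long long swap_bytes,
                                            unsigned long page_size)
{
    assert(page_size > 0);
    unsigned long long entries = swap_bytes / page_size;
    return entries * 2;
}
```

For 1 GB of swap with 4096-byte pages this is 262144 entries, or 512 KB.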

config CGROUP_HUGETLB
	bool "HugeTLB Resource Controller for Control Groups"
	depends on HUGETLB_PAGE
	select PAGE_COUNTER
	default n
	help
	  Provides a cgroup Resource Controller for HugeTLB pages.
	  When you enable this, you can put a per cgroup limit on HugeTLB usage.
	  The limit is enforced during page fault. Since HugeTLB doesn't
	  support page reclaim, enforcing the limit at page fault time implies
	  that the application will get a SIGBUS signal if it tries to access
	  HugeTLB pages beyond its limit. This requires the application to know
	  beforehand how many HugeTLB pages it will require for its use. The
	  control group is tracked in the third page's lru pointer. This means
	  that the controller cannot be used with huge pages made up of fewer
	  than 3 pages.

config CGROUP_PERF
	bool "Enable perf_event per-cpu per-container group (cgroup) monitoring"
	depends on PERF_EVENTS && CGROUPS
	help
	  This option extends the per-cpu mode to restrict monitoring to
	  threads which belong to the cgroup specified and run on the
	  designated cpu.

	  Say N if unsure.

menuconfig CGROUP_SCHED
	bool "Group CPU scheduler"
	default n
	help
	  This feature lets the CPU scheduler recognize task groups and
	  control CPU bandwidth allocation to such task groups. It uses
	  cgroups to group tasks.

if CGROUP_SCHED
config FAIR_GROUP_SCHED
	bool "Group scheduling for SCHED_OTHER"
	depends on CGROUP_SCHED
	default CGROUP_SCHED

config CFS_BANDWIDTH
	bool "CPU bandwidth provisioning for FAIR_GROUP_SCHED"
	depends on FAIR_GROUP_SCHED
	default n
	help
	  This option allows users to define CPU bandwidth rates (limits) for
	  tasks running within the fair group scheduler.  Groups with no limit
	  set are considered to be unconstrained and will run with no
	  restriction.
	  See tip/Documentation/scheduler/sched-bwc.txt for more information.

config RT_GROUP_SCHED
	bool "Group scheduling for SCHED_RR/FIFO"
	depends on CGROUP_SCHED
	default n
	help
	  This feature lets you explicitly allocate real CPU bandwidth
	  to task groups. If enabled, it will also make it impossible to
	  schedule realtime tasks for non-root users until you allocate
	  realtime bandwidth for them.
	  See Documentation/scheduler/sched-rt-group.txt for more information.

endif #CGROUP_SCHED

config BLK_CGROUP
	bool "Block IO controller"
	depends on BLOCK
	default n
	---help---
	  Generic block IO controller cgroup interface. This is the common
	  cgroup interface which should be used by various IO controlling
	  policies.

	  Currently, CFQ IO scheduler uses it to recognize task groups and
	  control disk bandwidth allocation (proportional time slice allocation)
	  to such task groups. It is also used by bio throttling logic in
	  block layer to implement upper limit in IO rates on a device.

	  This option only enables generic Block IO controller infrastructure.
	  One needs to also enable actual IO controlling logic/policy. For
	  enabling proportional weight division of disk bandwidth in CFQ, set
	  CONFIG_CFQ_GROUP_IOSCHED=y; for enabling throttling policy, set
	  CONFIG_BLK_DEV_THROTTLING=y.

	  See Documentation/cgroups/blkio-controller.txt for more information.

config DEBUG_BLK_CGROUP
	bool "Enable Block IO controller debugging"
	depends on BLK_CGROUP
	default n
	---help---
	  Enable some debugging help. Currently it exports additional stat
	  files in a cgroup which can be useful for debugging.

config CGROUP_WRITEBACK
	bool
	depends on MEMCG && BLK_CGROUP
	default y

endif # CGROUPS

config CHECKPOINT_RESTORE
	bool "Checkpoint/restore support" if EXPERT
	select PROC_CHILDREN
	default n
	help
	  Enables additional kernel features for the sake of checkpoint/restore.
	  In particular it adds auxiliary prctl codes to set up process text,
	  data and heap segment sizes, and a few additional /proc filesystem
	  entries.

	  If unsure, say N here.

menuconfig NAMESPACES
	bool "Namespaces support" if EXPERT
	depends on MULTIUSER
	default !EXPERT
	help
	  Provides the way to make tasks work with different objects using
	  the same id. For example same IPC id may refer to different objects
	  or same user id or pid may refer to different tasks when used in
	  different namespaces.

if NAMESPACES

config UTS_NS
	bool "UTS namespace"
	default y
	help
	  In this namespace tasks see different info provided with the
	  uname() system call.

config IPC_NS
	bool "IPC namespace"
	depends on (SYSVIPC || POSIX_MQUEUE)
	default y
	help
	  In this namespace tasks work with IPC ids which correspond to
	  different IPC objects in different namespaces.

config USER_NS
	bool "User namespace"
	default n
	help
	  This allows containers, i.e. vservers, to use user namespaces
	  to provide different user info for different servers.

	  When user namespaces are enabled in the kernel it is
	  recommended that the MEMCG and MEMCG_KMEM options also be
	  enabled and that user-space use the memory control groups to
	  limit the amount of memory unprivileged users can use.

	  If unsure, say N.

config PID_NS
	bool "PID Namespaces"
	default y
	help
	  Support process id namespaces.  This allows having multiple
	  processes with the same pid as long as they are in different
	  pid namespaces.  This is a building block of containers.

config NET_NS
	bool "Network namespace"
	depends on NET
	default y
	help
	  Allow user space to create what appear to be multiple instances
	  of the network stack.

endif # NAMESPACES

config SCHED_AUTOGROUP
	bool "Automatic process group scheduling"
	select CGROUPS
	select CGROUP_SCHED
	select FAIR_GROUP_SCHED
	help
	  This option optimizes the scheduler for common desktop workloads by
	  automatically creating and populating task groups.  This separation
	  of workloads isolates aggressive CPU burners (like build jobs) from
	  desktop applications.  Task group autogeneration is currently based
	  upon task session.

config SYSFS_DEPRECATED
	bool "Enable deprecated sysfs features to support old userspace tools"
	depends on SYSFS
	default n
	help
	  This option adds code that switches the layout of the "block" class
	  devices, to not show up in /sys/class/block/, but only in
	  /sys/block/.

	  This switch is only active when the sysfs.deprecated=1 boot option is
	  passed or the SYSFS_DEPRECATED_V2 option is set.

	  This option allows new kernels to run on old distributions and tools,
	  which might get confused by /sys/class/block/. Since 2007/2008 all
	  major distributions and tools handle this just fine.

	  Recent distributions and userspace tools after 2009/2010 depend on
	  the existence of /sys/class/block/, and will not work with this
	  option enabled.

	  You might need to say Y here only if you are using a new kernel on
	  an old distribution.

config SYSFS_DEPRECATED_V2
	bool "Enable deprecated sysfs features by default"
	default n
	depends on SYSFS
	depends on SYSFS_DEPRECATED
	help
	  Enable deprecated sysfs by default.

	  See the CONFIG_SYSFS_DEPRECATED option for more details about this
	  option.

	  You might need to say Y here only if you are using a new kernel on
	  an old distribution. Even then, odds are you would not need it
	  enabled; you can always pass the boot option if absolutely necessary.

config RELAY
	bool "Kernel->user space relay support (formerly relayfs)"
	help
	  This option enables support for the relay interface in certain
	  file systems (such as debugfs).
	  It is designed to provide an efficient mechanism for tools and
	  facilities to relay large amounts of data from kernel space to
	  user space.

	  If unsure, say N.

config BLK_DEV_INITRD
	bool "Initial RAM filesystem and RAM disk (initramfs/initrd) support"
	depends on BROKEN || !FRV
	help
	  The initial RAM filesystem is a ramfs which is loaded by the
	  boot loader (loadlin or lilo) and that is mounted as root
	  before the normal boot procedure. It is typically used to
	  load modules needed to mount the "real" root file system,
	  etc. See file:Documentation/initrd.txt for details.

	  If RAM disk support (BLK_DEV_RAM) is also included, this
	  also enables initial RAM disk (initrd) support and adds
	  15 Kbytes (more on some other architectures) to the kernel size.

	  If unsure say Y.

if BLK_DEV_INITRD

source "usr/Kconfig"

endif

config CC_OPTIMIZE_FOR_SIZE
	bool "Optimize for size"
	help
	  Enabling this option will pass "-Os" instead of "-O2" to
	  your compiler resulting in a smaller kernel.

	  If unsure, say N.

config SYSCTL
	bool

config ANON_INODES
	bool

config HAVE_UID16
	bool

config SYSCTL_EXCEPTION_TRACE
	bool
	help
	  Enable support for /proc/sys/debug/exception-trace.

config SYSCTL_ARCH_UNALIGN_NO_WARN
	bool
	help
	  Enable support for /proc/sys/kernel/ignore-unaligned-usertrap.
	  Allows arch to define/use @no_unaligned_warning to possibly warn
	  about unaligned access emulation going on under the hood.

config SYSCTL_ARCH_UNALIGN_ALLOW
	bool
	help
	  Enable support for /proc/sys/kernel/unaligned-trap.
	  Allows arches to define/use @unaligned_enabled to runtime toggle
	  the unaligned access emulation.
	  See arch/parisc/kernel/unaligned.c for reference.

config HAVE_PCSPKR_PLATFORM
	bool

# interpreter that classic socket filters depend on
config BPF
	bool

menuconfig EXPERT
	bool "Configure standard kernel features (expert users)"
	# Unhide debug options, to make the on-by-default options visible
	select DEBUG_KERNEL
	help
	  This option allows certain base kernel options and settings
	  to be disabled or tweaked. This is for specialized
	  environments which can tolerate a "non-standard" kernel.
	  Only use this if you really know what you are doing.

config UID16
	bool "Enable 16-bit UID system calls" if EXPERT
	depends on HAVE_UID16 && MULTIUSER
	default y
	help
	  This enables the legacy 16-bit UID syscall wrappers.
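Legacy 16-bit UID system calls cannot represent every 32-bit UID. A sketch of the usual narrowing rule, modeled on the kernel's high2lowuid() helper, where any value that does not fit is reported as an overflow UID; 65534 is the customary default (tunable through /proc/sys/kernel/overflowuid), and the names below are illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define OVERFLOW_UID16 65534u   /* assumed overflow UID for this sketch */

/* Narrow a 32-bit UID for a 16-bit UID system call. */
static uint16_t uid_to_uid16(uint32_t uid)
{
    return (uid & ~UINT32_C(0xFFFF)) ? (uint16_t)OVERFLOW_UID16
                                     : (uint16_t)uid;
}
```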

config MULTIUSER
	bool "Multiple users, groups and capabilities support" if EXPERT
	default y
	help
	  This option enables support for non-root users, groups and
	  capabilities.

	  If you say N here, all processes will run with UID 0, GID 0, and all
	  possible capabilities.  Saying N here also compiles out support for
	  system calls related to UIDs, GIDs, and capabilities, such as setuid,
	  setgid, and capset.

	  If unsure, say Y here.

config SGETMASK_SYSCALL
	bool "sgetmask/ssetmask syscalls support" if EXPERT
	def_bool PARISC || MN10300 || BLACKFIN || M68K || PPC || MIPS || X86 || SPARC || CRIS || MICROBLAZE || SUPERH
	---help---
	  sys_sgetmask and sys_ssetmask are obsolete system calls
	  no longer supported in libc but still enabled by default in some
	  architectures.

	  If unsure, leave the default option here.

config SYSFS_SYSCALL
	bool "Sysfs syscall support" if EXPERT
	default y
	---help---
	  sys_sysfs is an obsolete system call no longer supported in libc.
	  Note that disabling this option is more secure but might break
	  compatibility with some systems.

	  If unsure say Y here.

config SYSCTL_SYSCALL
	bool "Sysctl syscall support" if EXPERT
	depends on PROC_SYSCTL
	default n
	select SYSCTL
	---help---
	  sys_sysctl uses binary paths that have been found challenging
	  to properly maintain and use.  The interface in /proc/sys
	  using paths with ascii names is now the primary path to this
	  information.

	  Almost nothing uses the binary sysctl interface, so if you are
	  trying to save some space it is probably safe to disable this,
	  making your kernel marginally smaller.

	  If unsure say N here.
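The /proc/sys interface mentioned above maps dotted sysctl names directly onto paths, so `kernel.hostname` lives at /proc/sys/kernel/hostname. A small C sketch of that mapping; `sysctl_path()` is a hypothetical helper, and it assumes no path component itself contains a dot (some network interface names do):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a dotted sysctl name to its /proc/sys path.
 * Returns 0, or -1 if buf is too small. */
static int sysctl_path(const char *name, char *buf, size_t len)
{
    static const char prefix[] = "/proc/sys/";

    int n = snprintf(buf, len, "%s%s", prefix, name);
    if (n < 0 || (size_t)n >= len)
        return -1;

    /* Turn the dots after the prefix into directory separators. */
    for (char *p = buf + strlen(prefix); *p; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}
```

The resulting path can then be opened and read like any other text file.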

config KALLSYMS
	bool "Load all symbols for debugging/ksymoops" if EXPERT
	default y
	help
	  Say Y here to let the kernel print out symbolic crash information and
	  symbolic stack backtraces. This increases the size of the kernel
	  somewhat, as all symbols have to be loaded into the kernel image.

config KALLSYMS_ALL
	bool "Include all symbols in kallsyms"
	depends on DEBUG_KERNEL && KALLSYMS
	help
	  Normally kallsyms only contains the symbols of functions for nicer
	  OOPS messages and backtraces (i.e., symbols from the text and inittext
	  sections). This is sufficient for most cases. Only in very rare cases
	  (e.g., when a debugger is used) are all symbols required (e.g., names
	  of variables from the data sections, etc).

	  This option makes sure that all symbols are loaded into the kernel
	  image (i.e., symbols from all sections) at the cost of increased
	  kernel size (depending on the kernel configuration, it may be 300KiB
	  or something like this).

	  Say N unless you really need all symbols.

config PRINTK
	default y
	bool "Enable support for printk" if EXPERT
	select IRQ_WORK
	help
	  This option enables normal printk support. Removing it
	  eliminates most of the message strings from the kernel image
	  and makes the kernel more or less silent. As this makes it
	  very difficult to diagnose system problems, saying N here is
	  strongly discouraged.

config BUG
	bool "BUG() support" if EXPERT
	default y
	help
	  Disabling this option eliminates support for BUG and WARN, reducing
	  the size of your kernel image and potentially quietly ignoring
	  numerous fatal conditions. You should only consider disabling this
	  option for embedded systems with no facilities for reporting errors.
	  Just say Y.

config ELF_CORE
	depends on COREDUMP
	default y
	bool "Enable ELF core dumps" if EXPERT
	help
	  Enable support for generating core dumps. Disabling saves about 4k.

config PCSPKR_PLATFORM
	bool "Enable PC-Speaker support" if EXPERT
	depends on HAVE_PCSPKR_PLATFORM
	select I8253_LOCK
	default y
	help
	  This option allows disabling the internal PC-Speaker
	  support, saving some memory.

config BASE_FULL
	default y
	bool "Enable full-sized data structures for core" if EXPERT
	help
	  Disabling this option reduces the size of miscellaneous core
	  kernel data structures. This saves memory on small machines,
	  but may reduce performance.

config FUTEX
	bool "Enable futex support" if EXPERT
	default y
	select RT_MUTEXES
	help
	  Disabling this option will cause the kernel to be built without
	  support for "fast userspace mutexes".  The resulting kernel may not
	  run glibc-based applications correctly.

config HAVE_FUTEX_CMPXCHG
	bool
	depends on FUTEX
	help
	  Architectures should select this if futex_atomic_cmpxchg_inatomic()
	  is implemented and always working. This removes a couple of runtime
	  checks.
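Futexes keep the uncontended case entirely in user space; the kernel is entered only to sleep until a 32-bit word changes or to wake sleepers. A minimal sketch of the two calls a futex-based lock is built on, using syscall(2) because glibc does not provide a futex() wrapper; these two functions are only the kernel-facing half, not a complete mutex:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep while *uaddr still equals expected.  Returns 0 when woken, or
 * -1 with errno set (EAGAIN means *uaddr had already changed). */
static long futex_wait(uint32_t *uaddr, uint32_t expected)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE, expected,
                   NULL, NULL, 0);
}

/* Wake up to n threads sleeping on uaddr; returns how many woke. */
static long futex_wake(uint32_t *uaddr, int n)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_PRIVATE, n, NULL, NULL, 0);
}
```

A lock built on these would change the word with an atomic compare-and-swap in user space and call `futex_wait()` or `futex_wake()` only when contention is detected.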

config EPOLL
	bool "Enable eventpoll support" if EXPERT
	default y
	select ANON_INODES
	help
	  Disabling this option will cause the kernel to be built without
	  support for the epoll family of system calls.
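The epoll family works by registering file descriptors with an epoll instance and then waiting for readiness events on all of them at once. A compact C sketch for a single descriptor; `wait_readable()` is illustrative and creates a fresh epoll instance per call only to stay short:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    int ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(ep);
        return -1;
    }

    struct epoll_event out;
    int n = epoll_wait(ep, &out, 1, timeout_ms);
    close(ep);
    if (n < 0)
        return -1;
    return (n == 1 && (out.events & EPOLLIN)) ? 1 : 0;
}
```

Real event loops create the epoll instance once, add many descriptors with epoll_ctl(), and call epoll_wait() repeatedly.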

config SIGNALFD
	bool "Enable signalfd() system call" if EXPERT
	select ANON_INODES
	default y
	help
	  Enable the signalfd() system call that allows receiving signals
	  on a file descriptor.

	  If unsure, say Y.
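With signalfd(), a signal is consumed by read() instead of by an asynchronous handler. A minimal C sketch (the helper names are hypothetical); note that the signal must be blocked first, otherwise it is handled according to its normal disposition instead of being queued for the descriptor:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block sig and return a signalfd that reports it, or -1 on error. */
static int make_signalfd(int sig)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, sig);
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Read one pending signal from the descriptor; returns its number,
 * or -1 on error.  Blocks if no signal is pending. */
static int read_signal(int sfd)
{
    struct signalfd_siginfo si;

    if (read(sfd, &si, sizeof si) != (ssize_t)sizeof si)
        return -1;
    return (int)si.ssi_signo;
}
```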

config TIMERFD
	bool "Enable timerfd() system call" if EXPERT
	select ANON_INODES
	default y
	help
	  Enable the timerfd() system call that allows receiving timer
	  events on a file descriptor.

	  If unsure, say Y.
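A timerfd delivers timer expirations as an 8-byte counter read from the descriptor, so timers can be waited on alongside other file descriptors. A minimal one-shot example; `sleep_with_timerfd()` is a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Arm a one-shot timer for ms milliseconds and block until it fires.
 * Returns the expiration count read from the descriptor (1 for a
 * one-shot timer), or 0 on error. */
static uint64_t sleep_with_timerfd(long ms)
{
    assert(ms > 0);   /* a zero it_value would disarm the timer */

    int fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
    if (fd < 0)
        return 0;

    struct itimerspec its = {
        .it_value = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L },
        .it_interval = { 0, 0 },   /* no reload: one-shot */
    };
    uint64_t expirations = 0;
    if (timerfd_settime(fd, 0, &its, NULL) < 0 ||
        read(fd, &expirations, sizeof expirations) != (ssize_t)sizeof expirations)
        expirations = 0;
    close(fd);
    return expirations;
}
```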

config EVENTFD
	bool "Enable eventfd() system call" if EXPERT
	select ANON_INODES
	default y
	help
	  Enable the eventfd() system call that allows receiving both
	  kernel notifications (e.g. KAIO) and userspace notifications.

	  If unsure, say Y.
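An eventfd holds a 64-bit counter: each write() adds to it and, in the default (non-semaphore) mode, a read() returns the accumulated value and resets it to zero. A short C sketch; the helper names are illustrative:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Writer side: add n to the counter. Returns 0 or -1. */
static int efd_notify(int efd, uint64_t n)
{
    return write(efd, &n, sizeof n) == (ssize_t)sizeof n ? 0 : -1;
}

/* Reader side: fetch and reset the counter, i.e. the sum of all
 * notifications since the last read.  Returns 0 on error. */
static uint64_t efd_collect(int efd)
{
    uint64_t v = 0;

    if (read(efd, &v, sizeof v) != (ssize_t)sizeof v)
        return 0;
    return v;
}
```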

# syscall, maps, verifier
config BPF_SYSCALL
	bool "Enable bpf() system call"
	select ANON_INODES
	select BPF
	default n
	help
	  Enable the bpf() system call that allows manipulating eBPF
	  programs and maps via file descriptors.

config SHMEM
	bool "Use full shmem filesystem" if EXPERT
	default y
	depends on MMU
	help
	  The shmem is an internal filesystem used to manage shared memory.
	  It is backed by swap and manages resource limits. It is also exported
	  to userspace as tmpfs if TMPFS is enabled. Disabling this
	  option replaces shmem and tmpfs with the much simpler ramfs code,
	  which may be appropriate on small systems without swap.

config AIO
	bool "Enable AIO support" if EXPERT
	default y
	help
	  This option enables POSIX asynchronous I/O which may be used
	  by some high performance threaded applications. Disabling
	  this option saves about 7k.

config ADVISE_SYSCALLS
	bool "Enable madvise/fadvise syscalls" if EXPERT
	default y
	help
	  This option enables the madvise and fadvise syscalls, used by
	  applications to advise the kernel about their future memory or file
	  usage, improving performance. If building an embedded system where no
	  applications use these syscalls, you can disable this option to save
	  space.
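As an example of such advice, madvise(MADV_DONTNEED) tells the kernel that a range of memory is no longer needed. For a private anonymous mapping on Linux, the pages are released and later reads see zero-filled memory. A minimal sketch (`discard_pages()` is a hypothetical name):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/* Advise the kernel that [addr, addr + len) is no longer needed.
 * addr must be page-aligned.  Returns 0, or -1 with errno set. */
static int discard_pages(void *addr, size_t len)
{
    return madvise(addr, len, MADV_DONTNEED);
}
```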

config USERFAULTFD
	bool "Enable userfaultfd() system call"
	select ANON_INODES
	depends on MMU
	help
	  Enable the userfaultfd() system call that allows intercepting and
	  handling page faults in userland.

config PCI_QUIRKS
	default y
	bool "Enable PCI quirk workarounds" if EXPERT
	depends on PCI
	help
	  This enables workarounds for various PCI chipset bugs/quirks.
	  Disable this only if your target machine is unaffected by PCI
	  quirks.

config MEMBARRIER
	bool "Enable membarrier() system call" if EXPERT
	default y
	help
	  Enable the membarrier() system call that allows issuing memory
	  barriers across all running threads, which can be used to distribute
	  the cost of user-space memory barriers asymmetrically by transforming
	  pairs of memory barriers into pairs consisting of membarrier() and a
	  compiler barrier.

	  If unsure, say Y.
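The asymmetric pairing described above puts a cheap compiler barrier on the frequently executed side and a membarrier() call on the rare side. A minimal C sketch of the two halves, assuming the uapi header <linux/membarrier.h> is available; glibc may not provide a wrapper, so the sketch calls the system call directly, and the helper names are hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Direct system call; named sys_membarrier to avoid clashing with any
 * libc declaration. */
static int sys_membarrier(int cmd, int flags)
{
    return (int)syscall(__NR_membarrier, cmd, flags);
}

/* Frequent side: stops the compiler from moving memory accesses
 * across this point but emits no CPU fence (GCC/Clang syntax). */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Rare side: on success, all threads on the system have gone through
 * a point equivalent to a full memory barrier, which is what lets the
 * frequent side rely on compiler_barrier() alone.
 * Returns 0, or -1 with errno set. */
static int heavy_barrier(void)
{
    int supported = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

    if (supported < 0)
        return -1;                  /* e.g. ENOSYS if not built in */
    if (!(supported & MEMBARRIER_CMD_SHARED)) {
        errno = ENOSYS;
        return -1;
    }
    return sys_membarrier(MEMBARRIER_CMD_SHARED, 0);
}
```

MEMBARRIER_CMD_QUERY returns a bitmask of supported commands, so a real program would check for MEMBARRIER_CMD_SHARED once at startup and fall back to ordinary fences on both sides if it is missing.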

config EMBEDDED
	bool "Embedded system"
	option allnoconfig_y
	select EXPERT
	help
	  This option should be enabled if compiling the kernel for
	  an embedded system so certain expert options are available
	  for configuration.

config HAVE_PERF_EVENTS
	bool
	help
	  See tools/perf/design.txt for details.

config PERF_USE_VMALLOC
	bool
	help
	  See tools/perf/design.txt for details.

menu "Kernel Performance Events And Counters"

config PERF_EVENTS
	bool "Kernel performance events and counters"
	default y if PROFILING
	depends on HAVE_PERF_EVENTS
	select ANON_INODES
	select IRQ_WORK
	select SRCU
	help
	  Enable kernel support for various performance events provided
	  by software and hardware.

	  Software events are supported either built-in or via the
	  use of generic tracepoints.

	  Most modern CPUs support performance events via performance
	  counter registers. These registers count the number of certain
	  types of hw events: such as instructions executed, cache misses
	  suffered, or branches mis-predicted - without slowing down the
	  kernel or applications. These registers can also trigger interrupts
	  when a threshold number of events have passed - and can thus be
	  used to profile the code that runs on that CPU.

	  The Linux Performance Event subsystem provides an abstraction of
	  these software and hardware event capabilities, available via a
	  system call and used by the "perf" utility in tools/perf/. It
	  provides per task and per CPU counters, and it provides event
	  capabilities on top of those.

	  Say Y if unsure.
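The system call behind the perf utility is perf_event_open(2). A minimal C sketch that counts task-clock time (a software event, in nanoseconds) for the calling thread; glibc provides no wrapper, so it uses syscall(2), and the helper names are hypothetical. Depending on /proc/sys/kernel/perf_event_paranoid and container policy, opening the counter may fail:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open and start a task-clock counter for the calling thread on any
 * CPU.  Returns the counter fd, or -1 with errno set. */
static int task_clock_start(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_TASK_CLOCK;
    attr.disabled = 1;          /* created stopped; enabled below */
    attr.exclude_kernel = 1;    /* avoid needing kernel-profiling rights */
    attr.exclude_hv = 1;

    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    return fd;
}

/* Stop the counter, close it, and return the nanoseconds counted,
 * or -1 if the value could not be read. */
static long long task_clock_stop(int fd)
{
    uint64_t ns = 0;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    ssize_t r = read(fd, &ns, sizeof ns);
    close(fd);
    return r == (ssize_t)sizeof ns ? (long long)ns : -1;
}
```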

config DEBUG_PERF_USE_VMALLOC
	default n
	bool "Debug: use vmalloc to back perf mmap() buffers"
	depends on PERF_EVENTS && DEBUG_KERNEL && !PPC
	select PERF_USE_VMALLOC
	help
	  Use vmalloc memory to back perf mmap() buffers.

	  Mostly useful for debugging the vmalloc code on platforms
	  that don't require it.

	  Say N if unsure.

endmenu

config VM_EVENT_COUNTERS
	default y
	bool "Enable VM event counters for /proc/vmstat" if EXPERT
	help
	  VM event counters are needed for event counts to be shown.
	  This option allows the disabling of the VM event
	  counters on EXPERT systems.  /proc/vmstat will only show
	  page counts if VM event counters are disabled.

config SLUB_DEBUG
	default y
	bool "Enable SLUB debugging support" if EXPERT
	depends on SLUB && SYSFS
	help
	  SLUB has extensive debug support features. Disabling these can
	  result in significant savings in code size. This also disables
	  SLUB sysfs support. /sys/slab will not exist and there will be
	  no support for cache validation etc.

config COMPAT_BRK
	bool "Disable heap randomization"
	default y
	help
	  Randomizing heap placement makes heap exploits harder, but it
	  also breaks ancient binaries (including anything libc5 based).
	  This option changes the bootup default to heap randomization
	  disabled, and can be overridden at runtime by setting
	  /proc/sys/kernel/randomize_va_space to 2.

	  On non-ancient distros (post-2000 ones) N is usually a safe choice.

choice prompt "Choose SLAB allocator" default SLUB help This option allows to select a slab allocator.

config SLAB bool "SLAB" help The regular slab allocator that is established and known to work well in all environments. It organizes cache hot objects in per cpu and per node queues.

config SLUB bool "SLUB (Unqueued Allocator)" help SLUB is a slab allocator that minimizes cache line usage instead of managing queues of cached objects (SLAB approach). Per cpu caching is realized using slabs of objects instead of queues of objects. SLUB can use memory efficiently and has enhanced diagnostics. SLUB is the default choice for a slab allocator.

config SLOB
	depends on EXPERT
	bool "SLOB (Simple Allocator)"
	help
	  SLOB replaces the stock allocator with a drastically simpler
	  allocator. SLOB is generally more space efficient but
	  does not perform as well on large systems.

endchoice

config SLUB_CPU_PARTIAL
	default y
	depends on SLUB && SMP
	bool "SLUB per cpu partial cache"
	help
	  Per cpu partial caches accelerate object allocation and freeing
	  that is local to a processor at the price of more indeterminism
	  in the latency of the free. On overflow these caches will be
	  cleared, which requires taking locks that may cause latency
	  spikes. Typically one would choose N for a realtime system.

config MMAP_ALLOW_UNINITIALIZED
	bool "Allow mmapped anonymous memory to be uninitialized"
	depends on EXPERT && !MMU
	default n
	help
	  Normally, and according to the Linux spec, anonymous memory obtained
	  from mmap() has its contents cleared before it is passed to
	  userspace.  Enabling this config option allows you to request that
	  mmap() skip that if it is given a MAP_UNINITIALIZED flag, thus
	  providing a huge performance boost.  If this option is not enabled,
	  then the flag will be ignored.

	  This is taken advantage of by uClibc's malloc(), and also by
	  ELF-FDPIC binfmt's brk and stack allocator.

	  Because of the obvious security issues, this option should only be
	  enabled on embedded devices where you control what is run in
	  userspace.  Since that isn't generally a problem on no-MMU systems,
	  it is normally safe to say Y here.

	  See Documentation/nommu-mmap.txt for more information.

config SYSTEM_DATA_VERIFICATION
	def_bool n
	select SYSTEM_TRUSTED_KEYRING
	select KEYS
	select CRYPTO
	select ASYMMETRIC_KEY_TYPE
	select ASYMMETRIC_PUBLIC_KEY_SUBTYPE
	select PUBLIC_KEY_ALGO_RSA
	select ASN1
	select OID_REGISTRY
	select X509_CERTIFICATE_PARSER
	select PKCS7_MESSAGE_PARSER
	help
	  Provide PKCS#7 message verification using the contents of the system
	  trusted keyring to provide public keys.  This then can be used for
	  module verification, kexec image verification and firmware blob
	  verification.

config PROFILING
	bool "Profiling support"
	help
	  Say Y here to enable the extended profiling support mechanisms used
	  by profilers such as OProfile.

#
# Place an empty function call at each tracepoint site. Can be
# dynamically changed for a probe function.
#
config TRACEPOINTS
	bool

source "arch/Kconfig"

endmenu # General setup

config HAVE_GENERIC_DMA_COHERENT
	bool
	default n

config SLABINFO
	bool
	depends on PROC_FS
	depends on SLAB || SLUB_DEBUG
	default y

config RT_MUTEXES
	bool

config BASE_SMALL
	int
	default 0 if BASE_FULL
	default 1 if !BASE_FULL
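
#
# Illustrative sketch, not part of the option logic: BASE_SMALL is the
# integer inverse of the BASE_FULL boolean, so a generated .config with
#
#   CONFIG_BASE_FULL=y               yields  CONFIG_BASE_SMALL=0
#   # CONFIG_BASE_FULL is not set    yields  CONFIG_BASE_SMALL=1
#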

menuconfig MODULES
	bool "Enable loadable module support"
	option modules
	help
	  Kernel modules are small pieces of compiled code which can
	  be inserted in the running kernel, rather than being
	  permanently built into the kernel.  You use the "modprobe"
	  tool to add (and sometimes remove) them.  If you say Y here,
	  many parts of the kernel can be built as modules (by
	  answering M instead of Y where indicated): this is most
	  useful for infrequently used options which are not required
	  for booting.  For more information, see the man pages for
	  modprobe, lsmod, modinfo, insmod and rmmod.

	  If you say Y here, you will need to run "make
	  modules_install" to put the modules under /lib/modules/
	  where modprobe can find them (you may need to be root to do
	  this).

	  If unsure, say Y.

if MODULES

config MODULE_FORCE_LOAD
	bool "Forced module loading"
	default n
	help
	  Allow loading of modules without version information (i.e. modprobe
	  --force).  Forced module loading sets the 'F' (forced) taint flag and
	  is usually a really bad idea.

config MODULE_UNLOAD
	bool "Module unloading"
	help
	  Without this option you will not be able to unload any
	  modules (note that some modules may not be unloadable
	  anyway), but your kernel will be smaller, faster and simpler.
	  If unsure, say Y.

config MODULE_FORCE_UNLOAD
	bool "Forced module unloading"
	depends on MODULE_UNLOAD
	help
	  This option allows you to force a module to unload, even if the
	  kernel believes it is unsafe: the kernel will remove the module
	  without waiting for anyone to stop using it (using the -f option to
	  rmmod).  This is mainly for kernel developers and desperate users.
	  If unsure, say N.

config MODVERSIONS
	bool "Module versioning support"
	help
	  Usually, you have to use modules compiled with your kernel.
	  Saying Y here makes it sometimes possible to use modules
	  compiled for different kernels, by adding enough information
	  to the modules to (hopefully) spot any changes which would
	  make them incompatible with the kernel you are running.  If
	  unsure, say N.

config MODULE_SRCVERSION_ALL
	bool "Source checksum for all modules"
	help
	  Modules which contain a MODULE_VERSION get an extra "srcversion"
	  field inserted into their modinfo section, which contains a
	  sum of the source files which made it.  This helps maintainers
	  see exactly which source was used to build a module (since
	  others sometimes change the module source without updating
	  the version).  With this option, such a "srcversion" field
	  will be created for all modules.  If unsure, say N.

config MODULE_SIG
	bool "Module signature verification"
	depends on MODULES
	select SYSTEM_DATA_VERIFICATION
	help
	  Check modules for valid signatures upon load: the signature
	  is simply appended to the module. For more information see
	  Documentation/module-signing.txt.

	  Note that this option adds the OpenSSL development packages as a
	  kernel build dependency so that the signing tool can use its crypto
	  library.

	  !!!WARNING!!!  If you enable this option, you MUST make sure that the
	  module DOES NOT get stripped after being signed.  This includes the
	  debuginfo strip done by some packagers (such as rpmbuild) and
	  inclusion into an initramfs that wants the module size reduced.

config MODULE_SIG_FORCE
	bool "Require modules to be validly signed"
	depends on MODULE_SIG
	help
	  Reject unsigned modules or signed modules for which we don't have a
	  key.  Without this, such modules will simply taint the kernel.

config MODULE_SIG_ALL
	bool "Automatically sign all modules"
	default y
	depends on MODULE_SIG
	help
	  Sign all modules during make modules_install. Without this option,
	  modules must be signed manually, using the scripts/sign-file tool.

comment "Do not forget to sign required modules with scripts/sign-file"
	depends on MODULE_SIG_FORCE && !MODULE_SIG_ALL

choice
	prompt "Which hash algorithm should modules be signed with?"
	depends on MODULE_SIG
	help
	  This determines which sort of hashing algorithm will be used during
	  signature generation.  This algorithm _must_ be built into the kernel
	  directly so that signature verification can take place.  It is not
	  possible to load a signed module containing the algorithm to check
	  the signature on that module.

config MODULE_SIG_SHA1
	bool "Sign modules with SHA-1"
	select CRYPTO_SHA1

config MODULE_SIG_SHA224
	bool "Sign modules with SHA-224"
	select CRYPTO_SHA256

config MODULE_SIG_SHA256
	bool "Sign modules with SHA-256"
	select CRYPTO_SHA256

config MODULE_SIG_SHA384
	bool "Sign modules with SHA-384"
	select CRYPTO_SHA512

config MODULE_SIG_SHA512
	bool "Sign modules with SHA-512"
	select CRYPTO_SHA512

endchoice

config MODULE_SIG_HASH
	string
	depends on MODULE_SIG
	default "sha1" if MODULE_SIG_SHA1
	default "sha224" if MODULE_SIG_SHA224
	default "sha256" if MODULE_SIG_SHA256
	default "sha384" if MODULE_SIG_SHA384
	default "sha512" if MODULE_SIG_SHA512
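
#
# Illustrative sketch, not part of the option logic: assuming SHA-256 is
# picked in the hash choice above, the generated .config would contain
#
#   CONFIG_MODULE_SIG_SHA256=y
#   CONFIG_MODULE_SIG_HASH="sha256"
#
# i.e. MODULE_SIG_HASH only carries the chosen algorithm's name as a string.
#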

config MODULE_COMPRESS
	bool "Compress modules on installation"
	depends on MODULES
	help
	  Compresses kernel modules when 'make modules_install' is run; gzip or
	  xz depending on "Compression algorithm" below.

	  module-init-tools MAY support gzip, and kmod MAY support gzip and xz.

	  Out-of-tree kernel modules installed using Kbuild will also be
	  compressed upon installation.

	  Note: for modules inside an initrd or initramfs, it's more efficient
	  to compress the whole initrd or initramfs instead.

	  Note: This is fully compatible with signed modules.

	  If in doubt, say N.

choice
	prompt "Compression algorithm"
	depends on MODULE_COMPRESS
	default MODULE_COMPRESS_GZIP
	help
	  This determines which sort of compression will be used during
	  'make modules_install'.

	  GZIP (default) and XZ are supported.

config MODULE_COMPRESS_GZIP
	bool "GZIP"

config MODULE_COMPRESS_XZ
	bool "XZ"

endchoice

endif # MODULES

config MODULES_TREE_LOOKUP
	def_bool y
	depends on PERF_EVENTS || TRACING

config INIT_ALL_POSSIBLE
	bool
	help
	  Back when each arch used to define their own cpu_online_mask and
	  cpu_possible_mask, some of them chose to initialize cpu_possible_mask
	  with all 1s, and others with all 0s.  When they were centralised,
	  it was better to provide this option than to break all the archs
	  and have several arch maintainers pursuing me down dark alleys.

config STOP_MACHINE
	bool
	default y
	depends on (SMP && MODULE_UNLOAD) || HOTPLUG_CPU
	help
	  Need stop_machine() primitive.

source "block/Kconfig"

config PREEMPT_NOTIFIERS
	bool

config PADATA
	depends on SMP
	bool

#
# Can be selected by architectures with broken toolchains
# that get confused by correct const<->read_only section
# mappings
#
config BROKEN_RODATA
	bool
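
#
# Illustrative sketch only (FOO is a hypothetical architecture symbol):
# an architecture with such a toolchain would opt in from its own arch
# Kconfig entry using select, e.g.
#
#   config FOO
#   	select BROKEN_RODATA
#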

config ASN1
	tristate
	help
	  Build a simple ASN.1 grammar compiler that produces a bytecode output
	  that can be interpreted by the ASN.1 stream decoder and used to
	  inform it as to what tags are to be expected in a stream and what
	  functions to call on what tags.

source "kernel/Kconfig.locks"

