mirror of git://git.yoctoproject.org/linux-yocto.git
synced 2025-10-22 23:13:01 +02:00

61bf0009a7

net_dim() is currently passed a struct dim_sample argument by value.
struct dim_sample is 24 bytes. Since this is greater than 16 bytes, x86-64
passes it on the stack. All callers have already initialized dim_sample
on the stack, so passing it by value requires pushing a duplicated copy
to the stack. Either writing to the stack and immediately reading it, or
perhaps dereferencing addresses relative to the stack pointer in a chain
of push instructions, seems to perform quite poorly.

In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
94% of which is attributed to the first push instruction to copy
dim_sample on the stack for the call to net_dim():

// Call ktime_get()
  0.26 |4ead2:   call   4ead7 <mlx5e_handle_rx_dim+0x47>
// Pass the address of struct dim in %rdi
       |4ead7:   lea    0x3d0(%rbx),%rdi
// Set dim_sample.pkt_ctr
       |4eade:   mov    %r13d,0x8(%rsp)
// Set dim_sample.byte_ctr
       |4eae3:   mov    %r12d,0xc(%rsp)
// Set dim_sample.event_ctr
  0.15 |4eae8:   mov    %bp,0x10(%rsp)
// Duplicate dim_sample on the stack
 94.16 |4eaed:   push   0x10(%rsp)
  2.79 |4eaf1:   push   0x10(%rsp)
  0.07 |4eaf5:   push   %rax
// Call net_dim()
  0.21 |4eaf6:   call   4eafb <mlx5e_handle_rx_dim+0x6b>

To allow the caller to reuse the struct dim_sample already on the stack,
pass the struct dim_sample by reference to net_dim().

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Link: https://patch.msgid.link/20241031002326.3426181-2-csander@purestorage.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

219 lines, 9.4 KiB, ReStructuredText

======================================================
Net DIM - Generic Network Dynamic Interrupt Moderation
======================================================

:Author: Tal Gilboa <talgi@mellanox.com>

.. contents:: :depth: 2

Assumptions
===========

This document assumes the reader has basic knowledge of network drivers
and of interrupt moderation in general.


Introduction
============

Dynamic Interrupt Moderation (DIM), in networking, refers to changing the
interrupt moderation configuration of a channel in order to optimize packet
processing. The mechanism includes an algorithm that decides if and how to
change the moderation parameters of a channel, usually by analysing runtime
data sampled from the system. Net DIM is such a mechanism. In each iteration of
the algorithm, it analyses a given data sample, compares it to the previous
sample and, if required, changes some of the interrupt moderation
configuration fields. The data sample is composed of the data bandwidth, the
number of packets and the number of events. The time between samples is also
measured. Net DIM compares the current and the previous data and returns an
adjusted interrupt moderation configuration object. In some cases, the
algorithm might decide not to change anything. The configuration fields are
the minimum duration (in microseconds) allowed between events and the maximum
number of packets per event. The Net DIM algorithm prioritizes increasing
bandwidth over reducing the interrupt rate.


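The rate calculation behind a data sample can be sketched in userspace C. This
is a minimal illustration under stated assumptions, not the kernel's
implementation: ``struct sketch_sample``, ``struct sketch_rates`` and
``sketch_calc_rates()`` are hypothetical names, timestamps are assumed to be in
microseconds, and the counters are assumed to be free-running values that may
wrap around. It turns two consecutive samples into packet, byte and event rates
per millisecond, the kind of data the comparison step works on.

.. code-block:: c

  #include <assert.h>
  #include <stdint.h>

  /* Hypothetical userspace mirror of a DIM data sample. */
  struct sketch_sample {
      uint64_t time_us;   /* when the sample was taken, in microseconds */
      uint32_t pkt_ctr;   /* free-running packet counter */
      uint32_t byte_ctr;  /* free-running byte counter */
      uint16_t event_ctr; /* free-running interrupt (event) counter */
  };

  /* Per-millisecond rates derived from two consecutive samples. */
  struct sketch_rates {
      uint32_t ppms; /* packets per millisecond */
      uint32_t bpms; /* bytes per millisecond */
      uint32_t epms; /* events per millisecond */
  };

  /* Round-up integer division; d must be non-zero. */
  static uint64_t div_round_up(uint64_t n, uint64_t d)
  {
      return (n + d - 1) / d;
  }

  /*
   * Compute rates between two samples. Unsigned subtraction keeps the
   * counter deltas correct across a single wrap-around.
   * Returns 0 on success, -1 if no time has elapsed.
   */
  static int sketch_calc_rates(const struct sketch_sample *prev,
                               const struct sketch_sample *curr,
                               struct sketch_rates *out)
  {
      uint64_t delta_us;
      uint32_t npkts = curr->pkt_ctr - prev->pkt_ctr;
      uint32_t nbytes = curr->byte_ctr - prev->byte_ctr;
      uint16_t nevents = (uint16_t)(curr->event_ctr - prev->event_ctr);

      assert(curr->time_us >= prev->time_us);
      delta_us = curr->time_us - prev->time_us;
      if (delta_us == 0)
          return -1;

      out->ppms = (uint32_t)div_round_up((uint64_t)npkts * 1000, delta_us);
      out->bpms = (uint32_t)div_round_up((uint64_t)nbytes * 1000, delta_us);
      out->epms = (uint32_t)div_round_up((uint64_t)nevents * 1000, delta_us);
      return 0;
  }

For example, 200 packets over 2000 microseconds gives 100 packets per
millisecond. Relying on unsigned subtraction means a counter that wraps once
between two samples still yields the right delta.
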
Net DIM Algorithm
=================

Each iteration of the Net DIM algorithm follows these steps:

#. Calculates a new data sample.
#. Compares it to the previous sample.
#. Makes a decision: suggests interrupt moderation configuration fields.
#. Schedules a work function, which applies the suggested configuration.

The first two steps are straightforward: both the new and the previous data are
supplied by the driver registered to Net DIM. The previous data is the new data
supplied to the previous iteration. The comparison step checks the difference
between the new and previous data and decides on the result of the last step.
A step is classified as "better" if bandwidth increases and as "worse" if
bandwidth decreases. If there is no change in bandwidth, the packet rate is
compared in a similar fashion: an increase is "better" and a decrease is
"worse". If there is no change in the packet rate either, the interrupt rate is
compared. Here the algorithm tries to optimize for a lower interrupt rate, so an
increase in the interrupt rate is considered "worse" and a decrease is
considered "better". Step #2 has an optimization for avoiding false results: it
only considers a difference between samples as valid if it is greater than a
certain percentage. Also, since Net DIM does not measure anything by itself, it
assumes the data provided by the driver is valid.

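The comparison step can be sketched as follows. The priority order (bandwidth,
then packet rate, then interrupt rate, where fewer interrupts is better) follows
the description above. The 10% significance threshold and the handling of a
zero previous rate are assumptions of this sketch, and all names are
hypothetical rather than taken from ``include/linux/dim.h``.

.. code-block:: c

  #include <assert.h>
  #include <stdint.h>

  enum sketch_cmp { CMP_WORSE = -1, CMP_SAME = 0, CMP_BETTER = 1 };

  struct sketch_rates {
      uint32_t ppms; /* packets per millisecond */
      uint32_t bpms; /* bytes per millisecond */
      uint32_t epms; /* events (interrupts) per millisecond */
  };

  /* Assumed threshold: differences of 10% or less are treated as noise. */
  #define SIGNIFICANT_PCT 10

  /* True if val differs from ref by more than SIGNIFICANT_PCT percent. */
  static int significant_diff(uint32_t val, uint32_t ref)
  {
      uint64_t diff = val > ref ? val - ref : ref - val;

      assert(ref != 0);
      return diff * 100 > (uint64_t)ref * SIGNIFICANT_PCT;
  }

  /*
   * Compare current rates with the previous ones, in priority order:
   * bandwidth, then packet rate, then interrupt rate (lower is better).
   */
  static enum sketch_cmp sketch_compare(const struct sketch_rates *curr,
                                        const struct sketch_rates *prev)
  {
      if (prev->bpms == 0)
          return curr->bpms ? CMP_BETTER : CMP_SAME;
      if (significant_diff(curr->bpms, prev->bpms))
          return curr->bpms > prev->bpms ? CMP_BETTER : CMP_WORSE;

      if (prev->ppms == 0)
          return curr->ppms ? CMP_BETTER : CMP_SAME;
      if (significant_diff(curr->ppms, prev->ppms))
          return curr->ppms > prev->ppms ? CMP_BETTER : CMP_WORSE;

      if (prev->epms == 0)
          return CMP_SAME;
      if (significant_diff(curr->epms, prev->epms))
          return curr->epms < prev->epms ? CMP_BETTER : CMP_WORSE;

      return CMP_SAME;
  }

With the assumed threshold, a bandwidth change from 1000 to 1050 bytes per
millisecond is treated as noise, so the comparison falls through to the packet
rate and then the interrupt rate. Comparing via multiplication rather than
division keeps the check in integer arithmetic.
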
Step #3 decides on the suggested configuration based on the result from step #2
and the internal state of the algorithm. The states reflect the "direction" of
the algorithm: is it going left (reducing moderation), right (increasing
moderation) or standing still? Another optimization is that if a decision to
stay still is made multiple times, the interval between iterations of the
algorithm increases in order to reduce calculation overhead. Also, after
"parking" on the leftmost or rightmost decision, the algorithm may decide to
verify this decision by taking a step in the other direction. This is done in
order to avoid getting stuck in a "deep sleep" scenario. Once a decision is
made, an interrupt moderation configuration is selected from the predefined
profiles.

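The direction-based decision of step #3 can be illustrated with a simplified
state machine. This is a sketch under assumptions, not the kernel's algorithm:
the states, the five-profile table ordered from least to most moderation, the
``TIRED_LIMIT`` probe rule and all names are hypothetical, and the kernel's
actual states and transitions differ in detail. A "better" result keeps moving
in the same direction, a "worse" result reverses, a "same" result parks, and a
tuner that stays parked long enough probes one step (rightwards if parked at
index 0, leftwards otherwise).

.. code-block:: c

  #include <assert.h>

  /* Hypothetical names for illustration; not the kernel's enums. */
  enum sketch_cmp { CMP_WORSE = -1, CMP_SAME = 0, CMP_BETTER = 1 };
  enum sketch_state { GOING_LEFT, GOING_RIGHT, PARKED };

  #define NUM_PROFILES 5 /* assumed: ordered from least to most moderation */
  #define TIRED_LIMIT  3 /* assumed: "stay still" decisions before probing */

  struct sketch_tuner {
      int ix;                  /* index of the current profile */
      enum sketch_state state;
      int parked_iters;        /* consecutive "stay still" decisions */
  };

  static void park(struct sketch_tuner *t)
  {
      t->state = PARKED;
      t->parked_iters = 0;
  }

  /* Take one step in the current direction; park if already at an edge. */
  static void take_step(struct sketch_tuner *t)
  {
      int next = t->ix + (t->state == GOING_RIGHT ? 1 : -1);

      if (next < 0 || next >= NUM_PROFILES) {
          park(t);
          return;
      }
      t->ix = next;
  }

  static void sketch_decide(struct sketch_tuner *t, enum sketch_cmp res)
  {
      assert(t->ix >= 0 && t->ix < NUM_PROFILES);

      if (t->state != PARKED) {
          if (res == CMP_SAME) {
              park(t); /* no measurable change: stop moving */
              return;
          }
          if (res == CMP_WORSE) /* the last step hurt: reverse direction */
              t->state = (t->state == GOING_RIGHT) ? GOING_LEFT : GOING_RIGHT;
          take_step(t);
          return;
      }

      /* Parked: stay still while nothing changes, until "tired". */
      if (res == CMP_SAME && ++t->parked_iters < TIRED_LIMIT)
          return;

      /* Probe: step right if parked at index 0, otherwise step left. */
      t->state = (t->ix == 0) ? GOING_RIGHT : GOING_LEFT;
      t->parked_iters = 0;
      take_step(t);
  }

In a fuller implementation, ``parked_iters`` could also drive the longer
interval between iterations described above; this sketch leaves that to the
caller.
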
The last step is to notify the registered driver that it should apply the
suggested configuration. This is done by scheduling a work function, defined by
the Net DIM API and provided by the registered driver.

As you can see, Net DIM itself does not actively interact with the system. It
would have trouble making the correct decisions if the wrong data were supplied
to it, and it would be useless if the work function did not apply the suggested
configuration. This does, however, allow the registered driver some room for
manoeuvre, as it may provide partial data or ignore the algorithm's suggestion
under some conditions.


Registering a Network Device to DIM
===================================

The Net DIM API exposes the main function net_dim(). This function is the entry
point to the Net DIM algorithm and has to be called every time the driver would
like to check whether it should change interrupt moderation parameters. The
driver should provide two data structures, both passed to net_dim() by pointer:
:c:type:`struct dim <dim>` and :c:type:`struct dim_sample <dim_sample>`.
:c:type:`struct dim <dim>` describes the state of DIM for a specific object
(RX queue, TX queue, other queues, etc.). This includes the currently selected
profile, previous data samples, the callback function provided by the driver
and more. :c:type:`struct dim_sample <dim_sample>` describes a data sample,
which will be compared to the data sample stored in :c:type:`struct dim <dim>`
in order to decide on the algorithm's next step. The sample should include
bytes, packets and interrupts, measured by the driver.

In order to use Net DIM from a networking driver, the driver needs to call the
main net_dim() function. The recommended method is to call net_dim() on each
interrupt. Since Net DIM has built-in moderation and might decide to skip
iterations under certain conditions, there is no need to moderate the net_dim()
calls as well. As mentioned above, the driver needs to provide an object of type
:c:type:`struct dim <dim>` to the net_dim() function call. Each entity using
Net DIM should preferably hold a :c:type:`struct dim <dim>` as part of its data
structure and use it as the main Net DIM API object.
The :c:type:`struct dim_sample <dim_sample>` should hold the latest
bytes, packets and interrupts count. There is no need to perform any
calculations; just include the raw data.

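The built-in moderation that makes per-interrupt calls cheap can be sketched as
an event-count gate. This is an illustrative assumption, not the kernel's code:
the ``struct sketch_gate`` type and ``sketch_on_interrupt()`` are hypothetical,
and the 64-event window size is an assumed value. Each call only compares
counters, and a full evaluation would run once per window.

.. code-block:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Assumed window size: events to accumulate before evaluating. */
  #define SKETCH_NEVENTS 64

  struct sketch_gate {
      uint16_t start_event_ctr; /* event counter at the start of the window */
  };

  /*
   * Called on every interrupt with the driver's free-running 16-bit event
   * counter. Returns true only when a full window has accumulated, in which
   * case a new window starts; otherwise the call is a cheap no-op.
   */
  static bool sketch_on_interrupt(struct sketch_gate *g, uint16_t event_ctr)
  {
      /* Casting back to 16 bits keeps the difference correct across wrap-around. */
      uint16_t nevents = (uint16_t)(event_ctr - g->start_event_ctr);

      if (nevents < SKETCH_NEVENTS)
          return false; /* skip this iteration */

      g->start_event_ctr = event_ctr; /* start the next window */
      return true;
  }

Because the gate itself decides when to evaluate, the driver can call it
unconditionally from its interrupt handler, in line with the recommendation
above.
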
The net_dim() call itself does not return anything. Instead, Net DIM relies on
the driver to provide a callback function, which is called when the algorithm
decides to make a change in the interrupt moderation parameters. This callback
is scheduled and run in a separate thread in order not to add overhead to the
data flow. After the work is done, the Net DIM algorithm needs to be set to the
proper state in order to move to the next iteration.


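The hand-off between the sampling path and the driver callback can be sketched
as a small state machine. The state names are hypothetical, loosely modeled on
the ``DIM_START_MEASURE`` value used in the example that follows, and
``decision_ready`` stands in for the whole measure, compare and decide pipeline;
this is not the kernel's ``net_dim()``. While a change is being applied, new
samples are ignored, and the callback re-arms measurement when it finishes.

.. code-block:: c

  #include <stdbool.h>

  /* Hypothetical states, loosely modeled on the flow described above. */
  enum sketch_dim_state {
      SK_START_MEASURE,
      SK_MEASURE_IN_PROGRESS,
      SK_APPLY_IN_PROGRESS,
  };

  struct sketch_dim {
      enum sketch_dim_state state;
      int work_scheduled; /* counts queued callbacks in this sketch */
  };

  /*
   * Per-sample entry point. decision_ready stands in for the whole
   * measure, compare and decide pipeline choosing a new profile.
   */
  static void sketch_net_dim(struct sketch_dim *d, bool decision_ready)
  {
      switch (d->state) {
      case SK_START_MEASURE:
          /* Record the first sample of a new measurement window. */
          d->state = SK_MEASURE_IN_PROGRESS;
          break;
      case SK_MEASURE_IN_PROGRESS:
          if (decision_ready) {
              d->state = SK_APPLY_IN_PROGRESS;
              d->work_scheduled++; /* the driver callback would be queued here */
          }
          break; /* otherwise keep measuring */
      case SK_APPLY_IN_PROGRESS:
          /* The callback has not finished yet: ignore the sample. */
          break;
      }
  }

  /* Driver callback: apply the new profile, then re-arm measurement. */
  static void sketch_work(struct sketch_dim *d)
  {
      /* ... program the new moderation values into the device ... */
      d->state = SK_START_MEASURE;
  }

If the callback never resets the state, every later sample is ignored and
moderation stops adapting, which is why the work function in the example ends
by setting ``dim->state``.
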
Example
=======

The following code demonstrates how to register a driver to Net DIM. The
example is not complete, but it should make the outline of the usage clear.

.. code-block:: c

  #include <linux/dim.h>

  /* Callback for Net DIM to schedule on a decision to change moderation */
  void my_driver_do_dim_work(struct work_struct *work)
  {
	/* Get struct dim from struct work_struct */
	struct dim *dim = container_of(work, struct dim,
				       work);
	/* Do interrupt moderation related stuff */
	...

	/* Signal that Net DIM work is done and it should move to the next iteration */
	dim->state = DIM_START_MEASURE;
  }

  /* My driver's interrupt handler */
  int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...)
  {
	...
	/* A struct to hold current measured data */
	struct dim_sample dim_sample;
	...
	/* Initialize the data sample struct with current data */
	dim_update_sample(my_entity->events,
		          my_entity->packets,
		          my_entity->bytes,
		          &dim_sample);
	/* Call Net DIM; the sample is passed by pointer */
	net_dim(&my_entity->dim, &dim_sample);
	...
  }

  /* My entity's initialization function (my_entity was already allocated) */
  int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...)
  {
	...
	/* Initialize struct work_struct with my driver's callback function */
	INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work);
	...
  }


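The callback in the example only receives a ``struct work_struct`` pointer, and
``container_of()`` walks back out to the enclosing ``struct dim``. The same
technique can recover the driver's own entity when ``struct dim`` is embedded
in it, as recommended above. The following userspace sketch reimplements the
pointer arithmetic with ``offsetof()``. ``demo_container_of()``,
``struct demo_work``, ``struct demo_dim`` and the fields of
``struct my_driver_entity`` are hypothetical stand-ins, and the kernel's
``container_of()`` additionally performs type checking.

.. code-block:: c

  #include <stddef.h>

  /* Userspace stand-in for the kernel's container_of() (no type checks). */
  #define demo_container_of(ptr, type, member) \
      ((type *)((char *)(ptr) - offsetof(type, member)))

  /* Hypothetical stand-ins for struct work_struct and struct dim. */
  struct demo_work { int pending; };
  struct demo_dim { int profile_ix; struct demo_work work; };

  /* Hypothetical driver entity that embeds its DIM state by value. */
  struct my_driver_entity {
      int queue_id;
      struct demo_dim dim;
  };

  /* The callback only gets the work pointer; walk back out twice. */
  static struct my_driver_entity *entity_from_work(struct demo_work *w)
  {
      struct demo_dim *dim = demo_container_of(w, struct demo_dim, work);

      return demo_container_of(dim, struct my_driver_entity, dim);
  }

This only works because the structures are embedded by value; a pointer member
would not let the callback locate its container.
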
Tuning DIM
==========

Net DIM serves a range of network devices and can deliver significant
performance benefits. However, it has been observed that some preset DIM
configurations do not match the varying specifications of network devices, and
this profile mismatch has been identified as a factor behind the suboptimal
performance of some DIM-enabled network devices.

To address this issue, Net DIM introduces a per-device control to modify and
access a device's ``rx-profile`` and ``tx-profile`` parameters.
Assume that the target network device is named ethx, and that ethx only
declares support for setting the RX profile and supports modification of the
``usec`` and ``pkts`` fields (see the data structure
:c:type:`struct dim_cq_moder <dim_cq_moder>`).

Assuming every field of the current RX DIM profile is 64, you can use ethtool
to modify it as follows::

    $ ethtool -C ethx rx-profile 1,1,n_2,2,n_3,n,n_n,4,n_n,n,n

``n`` means "do not modify this field", and ``_`` separates the structure
elements of the profile array. Within each element, the comma-separated values
are the ``usec``, ``pkts`` and ``comps`` fields, in that order.

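The profile string syntax can be illustrated with a small userspace parser.
This is a hypothetical sketch: ``sk_apply_profile()`` and the ``SK_*`` names
are invented here, and it is not how ethtool or the kernel actually parse the
argument. It assumes exactly five entries of three fields each (``usec``,
``pkts``, ``comps``), as in the command above, treats ``n`` as "keep the
current value", and rejects any attempt to write a field the device does not
support.

.. code-block:: c

  #include <limits.h>
  #include <stdlib.h>
  #include <string.h>

  #define SK_NUM_PROFILES 5 /* assumed, matching the example above */
  #define SK_NUM_FIELDS   3 /* usec, pkts, comps */

  /* Bitmask of fields the device allows to be modified. */
  #define SK_FIELD_USEC  (1u << 0)
  #define SK_FIELD_PKTS  (1u << 1)
  #define SK_FIELD_COMPS (1u << 2)

  /*
   * Apply a string such as "1,1,n_2,2,n_3,n,n_n,4,n_n,n,n" to prof.
   * Returns 0 on success, -1 on malformed input or an unsupported field.
   * prof is only updated if the whole string is valid.
   */
  static int sk_apply_profile(unsigned int prof[SK_NUM_PROFILES][SK_NUM_FIELDS],
                              const char *s, unsigned int supported)
  {
      unsigned int tmp[SK_NUM_PROFILES][SK_NUM_FIELDS];
      int entry, field;

      memcpy(tmp, prof, sizeof(tmp));
      for (entry = 0; entry < SK_NUM_PROFILES; entry++) {
          for (field = 0; field < SK_NUM_FIELDS; field++) {
              /* ',' ends a field, '_' ends an entry, NUL ends the string. */
              char want = (field < SK_NUM_FIELDS - 1) ? ',' :
                          (entry < SK_NUM_PROFILES - 1) ? '_' : '\0';
              const char *end;

              if (*s == 'n') {
                  end = s + 1; /* keep the current value */
              } else {
                  char *num_end;
                  unsigned long v;

                  if (*s < '0' || *s > '9')
                      return -1;
                  if (!(supported & (1u << field)))
                      return -1; /* writing an n/a field is an error */
                  v = strtoul(s, &num_end, 10);
                  if (v > UINT_MAX)
                      return -1;
                  tmp[entry][field] = (unsigned int)v;
                  end = num_end;
              }
              if (*end != want)
                  return -1;
              s = (want == '\0') ? end : end + 1;
          }
      }
      memcpy(prof, tmp, sizeof(tmp));
      return 0;
  }

Applied to a profile whose fields are all 64, with only ``usec`` and ``pkts``
supported, the string from the command above yields the ``usec``/``pkts``
pairs 1/1, 2/2, 3/64, 64/4 and 64/64, matching the ``ethtool -c`` output in
this section.
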
The current profiles can be queried using::

    $ ethtool -c ethx
    ...
    rx-profile:
    {.usec =   1, .pkts =   1, .comps = n/a,},
    {.usec =   2, .pkts =   2, .comps = n/a,},
    {.usec =   3, .pkts =  64, .comps = n/a,},
    {.usec =  64, .pkts =   4, .comps = n/a,},
    {.usec =  64, .pkts =  64, .comps = n/a,}
    tx-profile:   n/a

If the network device does not support specific fields of DIM profiles,
those fields are displayed as ``n/a``. Attempting to modify an ``n/a`` field
results in an error message.


Dynamic Interrupt Moderation (DIM) library API
==============================================

.. kernel-doc:: include/linux/dim.h
    :internal: