When skb can not be allocated, update ethtool statisitics
rather than rx_dropped which is intended for netif_receive.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The values were not computed correctly. There are no significant
visible impact, though.
The intended size of RX buffer is 16 MB, and the default slot size is 1728.
So, NETVSC_DEFAULT_RX should be 16*1024*1024 / 1728 = 9709.
The intended size of TX buffer is 1 MB, and the slot size is 6144.
So, NETVSC_DEFAULT_TX should be 1024*1024 / 6144 = 170.
The patch puts the formula directly into the macro, and moves them to
hyperv_net.h, together with related macros.
Fixes: 5023a6db73 ("netvsc: increase default receive buffer size")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The max should be 31 MB on host with NVSP version > 2.
On legacy hosts (NVSP version <=2) only 15 MB receive buffer is allowed,
otherwise the buffer request will be rejected by the host, resulting
vNIC not coming up.
The NVSP version is only available after negotiation. So, we add the
limit checking for legacy hosts in netvsc_init_buf().
Fixes: 5023a6db73 ("netvsc: increase default receive buffer size")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every packet sent checks the available ring space. The calculation
can be sped up by using reciprocal divide which is multiplication.
Since ring_size can only be configured by module parameter, so it doesn't
have to be passed around everywhere. Also it should be unsigned
since it is number of pages.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rndis_filter_device_add() is called both from netvsc_probe() when we
initially create the device and from set channels/mtu/ringparam
routines where we basically remove the device and add it back.
hw_features is reset in rndis_filter_device_add() and filled with
host data. However, we lose all additional flags which are set outside
of the driver, e.g. register_netdevice() adds NETIF_F_SOFT_FEATURES and
many others.
Unfortunately, calls to rndis_{query_hwcaps(), _set_offload_params()}
calls cannot be avoided on every RNDIS reset: host expects us to set
required features explicitly. Moreover, in theory hardware capabilities
can change and we need to reflect the change in hw_features.
Reset net->hw_features bits according to host data in
rndis_netdev_set_hwcaps(), clear corresponding feature bits
from net->features in case some features went missing (will never happen
in real life I guess but let's be consistent).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename this variable because it is the Receive indirection
table.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This simplifies the logic and make it easier to add more
options.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report the numbers of events for stop_queue and wake_queue in
ethtool stats.
Example:
ethtool -S eth0
NIC statistics:
...
stop_queue: 7
wake_queue: 7
...
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If MTU is changed the host would reject the send buffer change.
This problem is result of recent change to allow changing send
buffer size.
Every time we change the MTU, we store the previous net_device section
count before destroying the buffer, but we don’t store the previous
section size. When we reinitialize the buffer, its size is calculated
by multiplying the previous count and previous size. Since we
continuously increase the MTU, the host returns us a decreasing count
value while the section size is reinitialized to 1728 bytes every
time.
This eventually leads to a condition where the calculated buf_size is
so small that the host rejects it.
Fixes: 8b5327975a ("netvsc: allow controlling send/recv buffer size")
Signed-off-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a virtual device is added dynamically (via host console), then
the vmbus sends an offer message for the primary channel. The processing
of this message for networking causes the network device to then
initialize the sub channels.
The problem is that setting up the sub channels needs to wait until
the subsequent subchannel offers have been processed. These offers
come in on the same ring buffer and work queue as where the primary
offer is being processed; leading to a deadlock.
This did not happen in older kernels, because the sub channel waiting
logic was broken (it wasn't really waiting).
The solution is to do the sub channel setup in its own work queue
context that is scheduled by the primary channel setup; and then
happens later.
Fixes: 732e49850c ("netvsc: fix race on sub channel creation")
Reported-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the parameter, num_queue in
rndis_filter_set_rss_param(), which is no longer in use.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch add the functions to switch UDP hash level between
L3 and L4 by ethtool command. UDP over IPv4 and v6 can be set
differently. The default hash level is L4. We currently only
allow switching TX hash level from within the guests.
On Azure, fragmented UDP packets have high loss rate with L4
hashing. Using L3 hashing is recommended in this case.
For example, for UDP over IPv4 on eth0:
To include UDP port numbers in hasing:
ethtool -N eth0 rx-flow-hash udp4 sdfn
To exclude UDP port numbers in hasing:
ethtool -N eth0 rx-flow-hash udp4 sd
To show UDP hash level:
ethtool -n eth0 rx-flow-hash udp4
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ethtool statistics for case where send chimmeny buffer is
exhausted and driver has to fall back to doing scatter/gather
send. Also, add statistic for case where ring buffer is full and
receive completions are delayed.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Control the size of the buffer areas via ethtool ring settings.
They aren't really traditional hardware rings, but host API breaks
receive and send buffer into chunks. The final size of the chunks are
controlled by the host.
The default value of send and receive buffer area for host DMA
is much larger than it needs to be. Experimentation shows that
4M receive and 1M send is sufficient.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When VF device is discovered, delay bring it automatically up in
order to allow userspace to some simple changes (like renaming).
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP offload conflict is dealt with by simply taking what is
in net-next where we have removed all of the UFO handling code
entirely.
The TCP conflict was a case of local variables in a function
being removed from both net and net-next.
In netvsc we had an assignment right next to where a missing
set of u64 stats sync object inits were added.
Signed-off-by: David S. Miller <davem@davemloft.net>
Go back to switching datapath directly in the notifier callback.
Otherwise datapath might not get switched on unregister.
No need for calling the NOTIFY_PEERS notifier since that is only for
a gratitious ARP/ND packet; but that is not required with Hyper-V
because both VF and synthetic NIC have the same MAC address.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes: 0c195567a8 ("netvsc: transparent VF management")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing sub channel code did not wait for all the sub-channels
to completely initialize. This could lead to race causing crash
in napi_netif_del() from bad list. The existing code would send
an init message, then wait only for the initial response that
the init message was received. It thought it was waiting for
sub channels but really the init response did the wakeup.
The new code keeps track of the number of open channels and
waits until that many are open.
Other issues here were:
* host might return less sub-channels than was requested.
* the new init status is not valid until after init was completed.
Fixes: b3e6b82a00 ("hv_netvsc: Wait for sub-channels to be processed during probe")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements transparent fail over from synthetic NIC to
SR-IOV virtual function NIC in Hyper-V environment. It is a better
alternative to using bonding as is done now. Instead, the receive and
transmit fail over is done internally inside the driver.
Using bonding driver has lots of issues because it depends on the
script being run early enough in the boot process and with sufficient
information to make the association. This patch moves all that
functionality into the kernel.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimize how receive completion ring are managed.
* Allocate only as many slots as needed for all buffers from host
* Allocate before setting up sub channel for better error detection
* Don't need to keep copy of initial receive section message
* Precompute the watermark for when receive flushing is needed
* Replace division with conditional test
* Replace atomic per-device variable with per-channel check.
* Handle corner case where receive completion send
fails if ring buffer to host is full.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The internal API was passing struct hv_page_buffer **
when only simple struct hv_page_buffer * was necessary
for passing an array.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes a bunch of fixups for issues reported by
lockdep.
* ethtool routines can assume RTNL
* send is done with RCU lock (and BH disable)
* avoid refetching internal device struct (netvsc)
instead pass it as a parameter.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This value has been calculated in rndis_device_attach since 4.11.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rndis functions are used when changing device state.
Therefore the references from network device to internal state
are protected by RTNL mutex.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep back pointer in the per-channel data structure to
avoid any possible RCU related issues when napi poll is
called but netvsc_device is in RCU limbo.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netvsc_device structure should be accessed by rcu_dereference
in the send path. Change arguments to netvsc_send() to make
this easier to do correctly.
Remove no longer needed hv_device_to_netvsc_device.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rndis_filter_device_add function is called both in
probe context and RTNL context,and creates the netvsc_device
inner structure. It is easier to get the RTNL lock annotation
correct if it returns the object directly, rather than implicitly
by updating network device private data.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use device detach/attach to ensure that no packets are handed
to device during state changes. Call rndis_filter_open/close
directly as part of later VF related changes.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the VF NIC is opened, the synthetic NIC's carrier state is set to
off. This tells the host to transitions data path to the VF device. But
if startup script or user manipulates the admin state of the netvsc
device directly for example:
# ifconfig eth0 down
# ifconfig eth0 up
Then the carrier state of the synthetic NIC would be on, even though the
data path was still over the VF NIC. This patch sets the carrier state
of synthetic NIC with consideration of the related VF state.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We simply use rndis_device->link_state in the netdev_dbg. The variable,
link_state from struct netvsc_device_info, is not used anywhere else.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The element netvsc_device:extension is always a pointer to RNDIS
information.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The work queue and handling of network filter parameters should
be in rndis_device. This gets rid of warning from RCU checks,
eliminates a race and cleans up code.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My change (introduced in 4.11) to use find_first_clear_bit
incorrectly assumed that the size argument was words, not bits.
The effect was only a small limited number of the available send
sections were being actually used. This can cause performance loss
with some workloads.
Since map_words is now used only during initialization, it can
be on stack instead of in per-device data.
Fixes: b58a185801 ("netvsc: simplify get next send section")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the outgoing skb has a RX queue mapping available, we use the queue
number directly, other than put it through Send Indirection Table.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than a lock and variable, use a refcount_t to keep track
of the number of sub channels. Don't need to wait for subchannels
on device removal since wait was already done in device_add.
Also fix the error handling; don't wait forever in case of
an error on request to create sub channels.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is cleaner to use RCU protected pointer (nvdev_ctx->nvdev)
to indicate device is in removed state, rather than having a separate
boolean flag. By using the pointer the context can be checked
by static checkers and dynamic lockdep.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netvsc driver has an internal structure (netvsc_device) which
is created when device is opened and released when device is closed.
And also opened/released when MTU or number of channels change.
Since this is referenced in the receive and transmit path, it is
safer to use RCU to protect/prevent use after free problems.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using NAPI, the single stream performance declined signifcantly
because the poll routine was updating host after every burst
of packets. This excess signalling caused host throttling.
This fix restores the old behavior. Host is only signalled
after the ring has been emptied.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/broadcom/genet/bcmgenet.c
net/core/sock.c
Conflicts were overlapping changes in bcmgenet and the
lockdep handling of sockets.
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the send indirection table from the inner device (netvsc)
to the network device context.
It is possible that netvsc_device is not present (remove in progress).
This solves potential use after free issues when packet is being
created during MTU change, shutdown, or queue count changes.
Fixes: d8e18ee0fa ("netvsc: enhance transmit select_queue")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use NAPI (softirq), to handle receive packets and send completions.
Previously this was handled by tasklet.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report packets and bytes transferred through a vmbus channel via ethtool.
This supersedes need for per-cpu statistics.
Example:
$ ethtool -S eth0
NIC statistics:
...
tx_queue_0_packets: 3523179
tx_queue_0_bytes: 505370920
rx_queue_0_packets: 41430490
rx_queue_0_bytes: 62714661254
tx_queue_1_packets: 0
tx_queue_1_bytes: 0
rx_queue_1_packets: 0
rx_queue_1_bytes: 0
...
Reviewed-by: Long Li <longli@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most drivers do not increment transmit statistics until after the
transmit is completed. This will also be necessary for BQL support.
Slight additional complexity because the netvsc driver aggregates
multiple packets into one transmit.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since now keep track of per-queue outstanding sends, we can avoid
one atomic update by removing no longer needed per-device atomic.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All caller's already have pointer to netvsc_device so pass it.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the caller's/callee's know that the format of the device_add
parameter is a netvsc_device_info struct.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do manual optimizations of receive path:
- remove checks for impossible conditions (but keep checks
for bad data from host)
- pass argument down, rather than having callee recompute what
is already known
- remove indirection about receive buffer datalength
- remove dependence on VLAN_TAG_PRESENCE
- use _hot/_cold and likely/unlikely
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Put all the per-channel state together in one data struct.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow setting receive indirection table. Also uses the system standard
for initialization.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows for number of channels to be managed in a manner similar
to existing hardware drivers. It also removes the restriction of
maximum 8 channels and allows as many as the host will allow.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some cases it is useful to be able to change RSS key value.
For example, replacing RSS key with a symmetric hash.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Redo how Hyper-V network driver negotiates offload features. Query the
host to determine offload settings, and use the result.
Also:
* disable IPv4 header checksum offload (not used by Linux)
* enable TSO only if host supports
* enable UDP checksum offload if supported
* don't advertise support for checksumming of non-IP protocols
* adjust GSO maximum segment size
* enable HIGHDMA
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ring buffer mapping now handles the wraparound case
inside get_next_pkt_raw. Therefore it is not necessary to have an
additional special receive staging buffer.
See commit 1562edaed8c164ca5199 ("Drivers: hv: ring_buffer: count on
wrap around mappings")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hyperv_net:
- set min/max_mtu, per Haiyang, after rndis_filter_device_add
virtio_net:
- set min/max_mtu
- remove virtnet_change_mtu
vmxnet3:
- set min/max_mtu
xen-netback:
- min_mtu = 0, max_mtu = 65517
xen-netfront:
- min_mtu = 0, max_mtu = 65535
unisys/visor:
- clean up defines a little to not clash with network core or add
redundat definitions
CC: netdev@vger.kernel.org
CC: virtualization@lists.linux-foundation.org
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: David Kershner <david.kershner@unisys.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Typo's and spelling errors. Also remove old comment from staging era.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Useful for debugging issues with multicast and SR-IOV to keep track
of number of received multicast packets.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since VF reference is now protected by RCU, no longer need the VF usage
counter and can use device flags to see whether to inject or not.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vf_netdev pointer in the netvsc device context can simply be protected
by RCU because network device destruction is already RCU synchronized.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Printing console messages is not helpful when system is out of memory;
and can be disastrous with netconsole. Instead keep statistics
of these anomalous conditions.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing code uses busy retry when unable to send out receive
completions due to full ring buffer. It also gives up retrying after limit
is reached, and causes receive buffer slots not being recycled.
This patch implements batching of receive completions. It also prevents
dropping receive completions due to full ring buffer.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor overlapping changes for both merge conflicts.
Resolution work done by Stephen Rothwell was used
as a reference.
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a deadlock scenario:
- netvsc_vf_up() schedules netvsc_notify_peers() work and quits.
- netvsc_vf_down() runs before netvsc_notify_peers() gets executed. As it
is being executed from netdev notifier chain we hold rtnl lock when we
get here.
- we enter while (atomic_read(&net_device_ctx->vf_use_cnt) != 0) loop and
wait till netvsc_notify_peers() drops vf_use_cnt.
- netvsc_notify_peers() starts on some other CPU but netdev_notify_peers()
will hang on rtnl_lock().
- deadlock!
Instead of introducing additional synchronization I suggest we drop
gwrk.dwrk completely and call NETDEV_NOTIFY_PEERS directly. As we're
acting under rtnl lock this is legitimate.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct netvsc_device is not suitable for storing VF information as this
structure is being destroyed on MTU change / set channel operation (see
rndis_filter_device_remove()). Move all VF related stuff to struct
net_device_context which is persistent.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Hyper-V host 2016 and later, VMs gets an event message of the physical
link speed when vSwitch is changed. This patch handles this message, so
the updated link speed can be reported by ethtool.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We unpack 'struct net_device' in netvsc_set_mac_addr() to get to
'struct hv_device' pointer which we use in rndis_filter_set_device_mac()
to get back to 'struct net_device'.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both rndis_filter_open()/rndis_filter_close() use struct hv_device to
reach to struct netvsc_device only and all callers have it already.
While on it, rename net_device to nvdev in rndis_filter_open() as
net_device is misleading.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make it easier to get 'struct netvsc_device' from 'struct net_device' and
'struct hv_device' by introducing inline helpers.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify netvsvc pointer graph by getting rid of the redundant ndev
pointer. We can always get a pointer to struct net_device from somewhere
else.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have the following structures keeping netvsc adapter state:
- struct net_device
- struct net_device_context
- struct netvsc_device
- struct rndis_device
- struct hv_device
and there are pointers/dependencies between them:
- struct net_device_context is contained in struct net_device
- struct hv_device has driver_data pointer which points to
'struct net_device' OR 'struct netvsc_device' depending on driver's
state (!).
- struct net_device_context has a pointer to 'struct hv_device'.
- struct netvsc_device has pointers to 'struct hv_device' and
'struct net_device_context'.
- struct rndis_device has a pointer to 'struct netvsc_device'.
Different functions get different structures as parameters and use these
pointers for traveling. The problem is (in addition to keeping in mind
this complex graph) that some of these structures (struct netvsc_device
and struct rndis_device) are being removed and re-created on mtu change
(as we implement it as re-creation of hyper-v device) so our travel using
these pointers is dangerous.
Simplify this to a the following:
- add struct netvsc_device pointer to struct net_device_context (which is
a part of struct net_device and thus never disappears)
- remove struct hv_device and struct net_device_context pointers from
struct netvsc_device
- replace pointer to 'struct netvsc_device' with pointer to
'struct net_device'.
- always keep 'struct net_device' in hv_device driver_data.
We'll end up with the following 'circular' structure:
net_device:
[net_device_context] -> netvsc_device -> rndis_device -> net_device
-> hv_device -> net_device
On MTU change we'll be removing the 'netvsc_device -> rndis_device'
branch and re-creating it making the synchronization easier.
There is one additional redundant pointer left, it is struct net_device
link in struct netvsc_device, it is going to be removed in a separate
commit.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct netvsc_device is destroyed on mtu change so keeping the
protection flag there is not a good idea. Move it to struct
net_device_context which is preserved.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support VF drivers on Hyper-V. On Hyper-V, each VF instance presented to
the guest has an associated synthetic interface that shares the MAC address
with the VF instance. Typically these are bonded together to support
live migration. By default, the host delivers all the incoming packets
on the synthetic interface. Once the VF is up, we need to explicitly switch
the data path on the host to divert traffic onto the VF interface. Even after
switching the data path, broadcast and multicast packets are always delivered
on the synthetic interface and these will have to be injected back onto the
VF interface (if VF is up).
This patch implements the necessary support in netvsc to support Linux
VF drivers.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VRSS_CHANNEL_MAX is the max number of channels supported by Hyper-V
hosts. We use it for the related array sizes instead of using NR_CPUS,
which may be set to several thousands.
This patch reduces possible memory allocation failures.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to set and retrieve speed and duplex of the
hv_netvsc device via ethtool.
Example:
$ ethtool eth0
Settings for eth0:
...
Speed: Unknown!
Duplex: Unknown! (255)
...
$ ethtool -s eth0 speed 1000 duplex full
$ ethtool eth0
Settings for eth0:
...
Speed: 1000Mb/s
Duplex: Full
...
This is based on patches by Roopa Prabhu and Nikolay Aleksandrov.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since eliminating send_completion_tid from struct hv_netvsc_packet, we
haven't add proper book keeping for the skb of the batched packet. This
patch fixes this issue and allows the previous skb is properly freed.
Otherwise, a panic may happen.
Thanks to Simon Xiao <sixiao@microsoft.com> for bisecting and analysis.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate vlan_tci from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate status from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate xmit_more from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate completion_func from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate is_data_pkt from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate send_completion_tid from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate page_buf from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet scheduler guarantees there won't be multiple senders for the same
queue and as we use q_idx for multi_send_data the spinlock is redundant.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rndis header is 116 bytes big and can be placed in the default
head room that will be available in the skb. Since the netvsc packet
is less than 48 bytes, we can use the skb control buffer
for the netvsc packet. With these changes we don't need to
ask for additional head room.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate send_completion_ctx from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate send_completion from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminatte the data field from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate rndis_msg pointer from hv_netvsc_packet structure.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate the channel field in hv_netvsc_packet structure.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rearrange the elements of struct hv_negtvsc_packet for optimal layout -
eliminate unnecessary padding.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of reducing the size of the hv_netvsc_packet, resize some of the
variables based on their usage.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several issues in hv_netvsc driver with regards to link status
change handling:
- RNDIS_STATUS_NETWORK_CHANGE results in calling userspace helper doing
'/etc/init.d/network restart' and this is inappropriate and broken for
many reasons.
- link_watch infrastructure only sends one notification per second and
in case of e.g. paired disconnect/connect events we get only one
notification with last status. This makes it impossible to handle such
situations in userspace.
Redo link status changes handling in the following way:
- Create a list of reconfig events in network device context.
- On a reconfig event add it to the list of events and schedule
netvsc_link_change().
- In netvsc_link_change() ensure 2-second delay between link status
changes.
- Handle RNDIS_STATUS_NETWORK_CHANGE as a paired disconnect/connect event.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uses device_info->num_chn to pass user provided number of vRSS
queues (from ethtool --set-channels) to rndis_filter_device_add. If
nonzero and less than the maximum, set net_device->num_chn to the given
value; else default to prior algorithm.
Always initialize struct device_info to 0, otherwise not all its fields
are guaranteed to be 0, which is necessary when checking if num_chn has
been purposefully set.
Signed-off-by: Andrew Schwartzmeyer <andschwa@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>