Commit Graph

1137 Commits

Author SHA1 Message Date
Daisuke Matsuda
10af303192 RDMA/rxe: Fix spinlock recursion deadlock on requester
The following deadlock is observed:

 Call Trace:
  <IRQ>
  _raw_spin_lock_bh+0x29/0x30
  check_type_state.constprop.0+0x4e/0xc0 [rdma_rxe]
  rxe_rcv+0x173/0x3d0 [rdma_rxe]
  rxe_udp_encap_recv+0x69/0xd0 [rdma_rxe]
  ? __pfx_rxe_udp_encap_recv+0x10/0x10 [rdma_rxe]
  udp_queue_rcv_one_skb+0x258/0x520
  udp_unicast_rcv_skb+0x75/0x90
  __udp4_lib_rcv+0x364/0x5c0
  ip_protocol_deliver_rcu+0xa7/0x160
  ip_local_deliver_finish+0x73/0xa0
  ip_sublist_rcv_finish+0x80/0x90
  ip_sublist_rcv+0x191/0x220
  ip_list_rcv+0x132/0x160
  __netif_receive_skb_list_core+0x297/0x2c0
  netif_receive_skb_list_internal+0x1c5/0x300
  napi_complete_done+0x6f/0x1b0
  virtnet_poll+0x1f4/0x2d0 [virtio_net]
  __napi_poll+0x2c/0x1b0
  net_rx_action+0x293/0x350
  ? __napi_schedule+0x79/0x90
  __do_softirq+0xcb/0x2ab
  __irq_exit_rcu+0xb9/0xf0
  common_interrupt+0x80/0xa0
  </IRQ>
  <TASK>
  asm_common_interrupt+0x22/0x40
  RIP: 0010:_raw_spin_lock+0x17/0x30
  rxe_requester+0xe4/0x8f0 [rdma_rxe]
  ? xas_load+0x9/0xa0
  ? xa_load+0x70/0xb0
  do_task+0x64/0x1f0 [rdma_rxe]
  rxe_post_send+0x54/0x110 [rdma_rxe]
  ib_uverbs_post_send+0x5f8/0x680 [ib_uverbs]
  ? netif_receive_skb_list_internal+0x1e3/0x300
  ib_uverbs_write+0x3c8/0x500 [ib_uverbs]
  vfs_write+0xc5/0x3b0
  ksys_write+0xab/0xe0
  ? syscall_trace_enter.constprop.0+0x126/0x1a0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
  </TASK>

The deadlock is easily reproducible with perftest. Fix it by disabling
softirq when acquiring the lock in process context.

Fixes: f605f26ea1 ("RDMA/rxe: Protect QP state with qp->state_lock")
Link: https://lore.kernel.org/r/20230418090642.1849358-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-21 12:33:00 -03:00
Linus Torvalds
e1f2750edc x86: remove 'zerorest' argument from __copy_user_nocache()
Every caller passes in zero, meaning they don't want any partial copy to
zero the remainder of the destination buffer.

Which is just as well, because the implementation of that function
didn't actually even look at that argument, and wasn't even aware it
existed, although some misleading comments did mention it still.

The 'zerorest' thing is a historical artifact of how "copy_from_user()"
worked, in that it would zero the rest of the kernel buffer that it
copied into.

That zeroing still exists, but it's long since been moved to generic
code, and the raw architecture-specific code doesn't do it.  See
_copy_from_user() in lib/usercopy.c for this all.

However, while __copy_user_nocache() shares some history and superficial
other similarities with copy_from_user(), it is in many ways also very
different.

In particular, while the code makes it *look* similar to the generic
user copy functions that can copy both to and from user space, and take
faults on both reads and writes as a result, __copy_user_nocache() does
no such thing at all.

__copy_user_nocache() always copies to kernel space, and will never take
a page fault on the destination.  What *can* happen, though, is that the
non-temporal stores take a machine check because one of the use cases is
for writing to stable memory, and any memory errors would then take
synchronous faults.

So __copy_user_nocache() does look a lot like copy_from_user(), but has
faulting behavior that is more akin to our old copy_in_user() (which no
longer exists, but copied from user space to user space and could fault
on both source and destination).

And it very much does not have the "zero the end of the destination
buffer", since a problem with the destination buffer is very possibly
the very source of the partial copy.

So this whole thing was just a confusing historical artifact from having
shared some code with a completely different function with completely
different use cases.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-19 19:09:52 -07:00
Bob Pearson
f605f26ea1 RDMA/rxe: Protect QP state with qp->state_lock
Currently the rxe driver makes little effort to make the changes to qp
state (which includes qp->attr.qp_state, qp->attr.sq_draining and
qp->valid) atomic between different client threads and IO threads. In
particular a common template is for an RDMA application to call
ib_modify_qp() to move a qp to ERR state and then wait until all the
packet and work queues have drained before calling ib_destroy_qp(). None
of these state changes are protected by locks to assure that the changes
are executed atomically and that memory barriers are included. This has
been observed to lead to incorrect behavior around qp cleanup.

This patch continues the work of the previous patches in this series and
adds locking code around qp state changes and lookups.

Link: https://lore.kernel.org/r/20230405042611.6467-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:34:04 -03:00
Bob Pearson
7b560b89a0 RDMA/rxe: Move code to check if drained to subroutine
Move two blocks of code in rxe_comp.c and rxe_req.c to subroutines that
check if draining is complete in the SQD state and, if so, generate a
SQ_DRAINED event.

Link: https://lore.kernel.org/r/20230405042611.6467-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Bob Pearson
98e891b5e4 RDMA/rxe: Remove qp->req.state
The rxe driver has four different QP state variables,
    qp->attr.qp_state,
    qp->req.state,
    qp->comp.state, and
    qp->resp.state.
All of these basically carry the same information.

This patch replaces uses of qp->req.state by qp->attr.qp_state and enum
rxe_qp_state.  This is the third of three patches which will remove all
but the qp->attr.qp_state variable. This will bring the driver closer to
the IBA description.

Link: https://lore.kernel.org/r/20230405042611.6467-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Bob Pearson
f55efc2ed2 RDMA/rxe: Remove qp->comp.state
The rxe driver has four different QP state variables,
    qp->attr.qp_state,
    qp->req.state,
    qp->comp.state, and
    qp->resp.state.
All of these basically carry the same information.

This patch replaces uses of qp->comp.state by qp->attr.qp_state.  This is
the second of three patches which will remove all but the
qp->attr.qp_state variable. This will bring the driver closer to the IBA
description.

Link: https://lore.kernel.org/r/20230405042611.6467-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Bob Pearson
a588429a66 RDMA/rxe: Remove qp->resp.state
The rxe driver has four different QP state variables,
    qp->attr.qp_state,
    qp->req.state,
    qp->comp.state, and
    qp->resp.state.
All of these basically carry the same information.

This patch replaces uses of qp->resp.state by qp->attr.qp_state.  This is
the first of three patches which will remove all but the qp->attr.qp_state
variable. This will bring the driver closer to the IBA description.

Link: https://lore.kernel.org/r/20230405042611.6467-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Jason Gunthorpe
8d7c7c0eeb RDMA: Add ib_virt_dma_to_page()
Make it clearer what is going on by adding a function to go back from the
"virtual" dma_addr to a kva and another to a struct page. This is used in the
ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib).

Call them instead of a naked casting and  virt_to_page() when working with dma_addr
values encoded by the various ib_map functions.

This also fixes the virt_to_page() casting problem Linus Walleij has been
chasing.

Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-16 11:08:07 +03:00
Zhu Yanjun
b2b1ddc457 RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"
In the function rxe_create_qp(), rxe_qp_from_init() is called to
initialize qp, internally things like rxe_init_task are not setup until
rxe_qp_init_req().

If an error occurred before this point then the unwind will call
rxe_cleanup() and eventually to rxe_qp_do_cleanup()/rxe_cleanup_task()
which will oops when trying to access the uninitialized spinlock.

If rxe_init_task is not executed, rxe_cleanup_task will not be called.

Reported-by: syzbot+cfcc1a3c85be15a40cba@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=fd85757b74b3eb59f904138486f755f71e090df8
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Fixes: 2d4b21e0a2 ("IB/rxe: Prevent from completer to operate on non valid QP")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20230413101115.1366068-1-yanjun.zhu@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-16 10:51:33 +03:00
Bob Pearson
67a00d29c3 RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c
In a previous patch TASKLET_STATE_SCHED was used as a mask but it is a bit
position instead. Add the missing shift.

Link: https://lore.kernel.org/r/20230329193308.7489-1-rpearsonhpe@gmail.com
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/linux-rdma/8a054b78-6d50-4bc6-8d8a-83f85fbdb82f@kili.mountain/
Fixes: d946716325 ("RDMA/rxe: Rewrite rxe_task.c")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-12 13:11:51 -03:00
Tetsuo Handa
266e9b3475 RDMA/siw: Remove namespace check from siw_netdev_event()
syzbot is reporting that siw_netdev_event(NETDEV_UNREGISTER) cannot destroy
siw_device created after unshare(CLONE_NEWNET) due to net namespace check.
It seems that this check was by error there and should be removed.

Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Fixes: bdcf26bf9b ("rdma/siw: network and RDMA core interface")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/a44e9ac5-44e2-d575-9e30-02483cc7ffd1@I-love.SAKURA.ne.jp
Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-03 21:24:21 +03:00
Leon Romanovsky
b6ba68555d RDMA/rxe: Clean kzalloc failure paths
There is no need to print any debug messages after failure to
allocate memory, because kernel will print OOM dumps anyway.

Together with removal of these messages, remove useless goto jumps.

Fixes: 5bf944f241 ("RDMA/rxe: Add error messages")
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/all/ea43486f-43dd-4054-b1d5-3a0d202be621@kili.mountain
Link: https://lore.kernel.org/r/d3cedf723b84e73e8062a67b7489d33802bafba2.1680113597.git.leon@kernel.org
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-03-30 09:54:32 +03:00
Bob Pearson
78b26a3353 RDMA/rxe: Remove tasklet call from rxe_cq.c
Remove the tasklet call in rxe_cq.c and also the is_dying in the
cq struct. There is no reason for the rxe driver to defer the call
to the cq completion handler by scheduling a tasklet. rxe_cq_post()
is not called in a hard irq context.

The rxe driver currently is incorrect because the tasklet call is
made without protecting the cq pointer with a reference from having
the underlying memory freed before the deferred routine is called.
Executing the comp_handler inline fixes this problem.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Link: https://lore.kernel.org/r/20230327215643.10410-1-rpearsonhpe@gmail.com
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 14:25:11 +03:00
Bob Pearson
d946716325 RDMA/rxe: Rewrite rxe_task.c
This patch is a major rewrite of the tasklet routines in rxe_task.c.  The
main motivation for this is the realization that the code violates the
safety of the qp pointer by correct reference counting.  When a tasklet is
scheduled from a verbs API the calling thread has a valid reference to the
qp and schedules the tasklet to run at a later time carrying a pointer to
the qp. Once the calling code returns however the qp can be destroyed at
any time. In order to correct this a reference to the qp must be taken
when the task is scheduled and held until it finishes running. This is
complicated by the tasklet library not alwys running a task that is
scheduled depending on whether someone else has scheduled it.

This patch moves the logic for deciding whether to run or schedule a task
outside of do_task() and guarantees that there is only one copy of the
task scheduled or running at a time.

Secondly the separate flags controlling teardown and draining of the task
are included in the task state machine and all references to the state are
protected by spinlocks to avoid consistency and memory barrier issues.

Link: https://lore.kernel.org/r/20230304174533.11296-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
f455a1bc97 RDMA/rxe: Make tasks schedule each other
Replace rxe_run_task() by rxe_sched_task() when tasks call each other.
These are not performance critical and mainly involve error paths but they
run the risk of causing deadlocks.

Link: https://lore.kernel.org/r/20230304174533.11296-8-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
960ebe97e5 RDMA/rxe: Remove __rxe_do_task()
The subroutine __rxe_do_task is not thread safe and it has no way to
guarantee that the tasks, which are designed with the assumption that they
are non-reentrant, are not reentered. All of its uses are non-performance
critical.

This patch replaces calls to __rxe_do_task with calls to
rxe_sched_task. It also removes irrelevant or unneeded if tests.

Instead of calling the task machinery a single call to the tasklet
function (rxe_requester, etc.) is sufficient to draing the queues if task
execution has been disabled or stopped.

Together these changes allow the removal of __rxe_do_task.

Link: https://lore.kernel.org/r/20230304174533.11296-7-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
a246aa2e8a RDMA/rxe: Remove qp reference counting in tasks
Currently each of the three tasklets requester, completer and responder in
the rxe driver take and release a reference to the qp argument at the
beginning and end of the subroutines. The caller passing in the qp
argument should be responsible for holding a reference to qp so these are
not required. Further doing so breaks the qp cleanup code in
rxe_qp_do_cleanup which calls these routines after all the references have
been dropped so they cannot drain the packet and work request queues as
intended.

In fact if these routines are deferred by calling tasklet_schedule there
is no guarantee that the calling code does have a qp reference.  That is a
bug in rxe_task.c which will be fixed later in this series.

Link: https://lore.kernel.org/r/20230304174533.11296-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
fbdeb828a2 RDMA/rxe: Cleanup error state handling in rxe_comp.c
Cleanup the handling of qp in the error state, reset state and during
rxe_qp_do_cleanup. Make the same as rxe_resp.c

Link: https://lore.kernel.org/r/20230304174533.11296-5-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
49dc9c1f0c RDMA/rxe: Cleanup reset state handling in rxe_resp.c
Cleanup the handling of qp in the error state, reset state and during
rxe_qp_do_cleanup. The error state does about the same thing as the others
but has code spread all over.

This patch combines them in a cleaner way.

Link: https://lore.kernel.org/r/20230304174533.11296-4-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
3946fc2a42 RDMA/rxe: Convert tasklet args to queue pairs
Originally is was thought that the tasklet machinery in rxe_task.c would
be used in other applications but that has not happened for years. This
patch replaces the 'void *arg' by struct 'rxe_qp *qp' in the parameters to
the tasklet calls. This change will have no affect on performance but may
make the code a little clearer.

Link: https://lore.kernel.org/r/20230304174533.11296-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:14:38 -03:00
Bob Pearson
5bf944f241 RDMA/rxe: Add error messages
This patch adds error and debug messages so that every interaction
with rdma-core through a verbs API call or a completion error return
will generate at least one error message backed up by debug messages
with more detail.

With dynamic debugging one can follow up after seeing an error message
by turning on the appropriate debug messages.

Link: https://lore.kernel.org/r/20230303221623.8053-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:49 -03:00
Bob Pearson
9ac01f434a RDMA/rxe: Extend dbg log messages to err and info
Extend the dbg log messages (e.g. rxe_dbg_xxx) to include
err and info types. rxe.c is modified to use these new log
messages as examples.

Link: https://lore.kernel.org/r/20230303221623.8053-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:49 -03:00
Bob Pearson
a9fb328721 RDMA/rxe: Change rxe_dbg to rxe_dbg_dev
Replace the name rxe_dbg with rxe_dbg_dev which better matches
the remaining rxe_dbg_xxx macros for debug messages with a
rxe device parameter. Reuse the name rxe_dbg for debug messages
which do not have a rxe device parameter.

Link: https://lore.kernel.org/r/20230303221623.8053-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:49 -03:00
Bob Pearson
9168d125ea RDMA/rxe: Replace exists by rxe in rxe.c
'exists' looks like a boolean. This patch replaces it by the
normal name used for the rxe device, 'rxe', which should be a
little less confusing. The second rxe_dbg() message is
incorrect since rxe is known to be NULL and this will cause a
seg fault if this message were ever sent. Replace it by pr_debug
for the moment.

Fixes: c6aba5ea00 ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c")
Link: https://lore.kernel.org/r/20230303221623.8053-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:48 -03:00
Natalia Petrova
b73a0b80c6 RDMA/rdmavt: Delete unnecessary NULL check
There is no need to check 'rdi->qp_dev' for NULL. The field 'qp_dev'
is created in rvt_register_device() which will fail if the 'qp_dev'
allocation fails in rvt_driver_qp_init(). Overwise this pointer
doesn't changed and passed to rvt_qp_exit() by the next step.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 0acb0cc7ec ("IB/rdmavt: Initialize and teardown of qpn table")
Signed-off-by: Natalia Petrova <n.petrova@fintech.ru>
Link: https://lore.kernel.org/r/20230303124408.16685-1-n.petrova@fintech.ru
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-14 11:57:01 +02:00
Kees Cook
f2f6e1661d IB/rdmavt: Fix target union member for rvt_post_one_wr()
The "cplen" result used by the memcpy() into struct rvt_swqe "wqe" may
be sized to 80 for struct rvt_ud_wr (which is member "ud_wr", not "wr"
which is only 40 bytes in size). Change the destination union member so
the compiler can use the correct bounds check.

struct rvt_swqe {
        union {
                struct ib_send_wr wr;   /* don't use wr.sg_list */
                struct rvt_ud_wr ud_wr;
		...
	};
	...
};

Silences false positive memcpy() run-time warning:

  memcpy: detected field-spanning write (size 80) of single field "&wqe->wr" at drivers/infiniband/sw/rdmavt/qp.c:2043 (size 40)

Reported-by: Zhang Yi <yizhan@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216561
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230218185701.never.779-kees@kernel.org
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-13 13:56:31 +02:00
Daniil Dulov
271bfcfb83 RDMA/siw: Fix potential page_array out of range access
When seg is equal to MAX_ARRAY, the loop should break, otherwise
it will result in out of range access.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: b9be6f18cf ("rdma/siw: transmit path")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Link: https://lore.kernel.org/r/20230227091751.589612-1-d.dulov@aladdin.ru
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-13 13:54:49 +02:00
Linus Torvalds
8cbd92339d v6.3 RDMA pull request
Small cycle this time:
 
 - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana
 
 - inline CQE support for hns
 
 - Have mlx5 display device error codes
 
 - Pinned DMABUF support for irdma
 
 - Continued rxe cleanups, particularly converting the MRs to use xarray
 
 - Improvements to what can be cached in the mlx5 mkey cache
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/gPmgAKCRCFwuHvBreF
 YW5IAP4xOAiTif4f87vD1twRU/ebq4VEX0r+C2NX5x5fwlCJrAEA7RLV8uG9Uii2
 ez0BuWNxfajuvFHntnZ1E+7UDP0S8gk=
 =CgUH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Quite a small cycle this time, even with the rc8. I suppose everyone
  went to sleep over xmas.

   - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw,
     mana

   - inline CQE support for hns

   - Have mlx5 display device error codes

   - Pinned DMABUF support for irdma

   - Continued rxe cleanups, particularly converting the MRs to use
     xarray

   - Improvements to what can be cached in the mlx5 mkey cache"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits)
  IB/mlx5: Extend debug control for CC parameters
  IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors
  IB/hfi1: Fix math bugs in hfi1_can_pin_pages()
  RDMA/irdma: Add support for dmabuf pin memory regions
  RDMA/mlx5: Use query_special_contexts for mkeys
  net/mlx5e: Use query_special_contexts for mkeys
  net/mlx5: Change define name for 0x100 lkey value
  net/mlx5: Expose bits for querying special mkeys
  RDMA/rxe: Fix missing memory barriers in rxe_queue.h
  RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet
  RDMA/rxe: Remove rxe_alloc()
  RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size
  Subject: RDMA/rxe: Handle zero length rdma
  iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry()
  RDMA/mlx5: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Remove unused 'work' member from struct ib_umem
  RDMA/irdma: Cap MSIX used to online CPUs + 1
  RDMA/mlx5: Check reg_create() create for errors
  RDMA/restrack: Correct spelling
  RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish()
  ...
2023-02-24 15:11:03 -08:00
Bob Pearson
a77a52385e RDMA/rxe: Fix missing memory barriers in rxe_queue.h
An earlier patch which introduced smp_load_acquire/smp_store_release
into rxe_queue.h incorrectly assumed that surrounding spin-locks in
rxe_verbs.c around queue updates for kernel ulps was sufficient to
protect the passing of data through the queues between the ulp and
the rxe tasklets. But this was incorrect. The typical sequence was

	ulp				rxe requester tasklet
	------------------------	---------------------
	spin_lock_irqsave()		wqe = queue_head(queue)
	if (!queue_full(q)) {		if (!wqe)
		spin_unlock_irqrestore		return;
		return -ENOMEM
	}				<process wqe>
	wqe = queue_producer_addr(q)
	<fill in wqe>			queue_advance_consumer(queue)
	queue_advance_producer(q)
	spin_unlock_irqrestore()

queue_head() calls queue_empty() which calls smp_load_acquire()
For user space apps queue_advance_producer() calls smp_store_release()
so that there is a memory barrier between the producer and the
consumer but for kernel ulps queue_advance_produce() just incremented
the producer index because the lock function is a release function.
But to work the barrier has to come between filling in the wqe and
updating the producer index. This patch adds the missing barriers.
It also changes the enum names for the ulp queue types to
	QUEUE_TYPE_FROM/TO_ULP instead of QUEUE_TYPE_TO/FROM_DRIVER
which is very ambiguous. This bug is suspected as the cause of very
rare lockups in a very high scale storage application. It is a bug
in any case and should be corrected.

Fixes: 0a67c46d2e ("RDMA/rxe: Protect user space index loads/stores")
Link: https://lore.kernel.org/r/20230214071053.5395-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16 12:07:05 -04:00
Bob Pearson
72a0362744 RDMA/rxe: Remove rxe_alloc()
Currently all the object types in the rxe driver are allocated in
rdma-core except for MRs. By moving tha kzalloc() call outside of
the pool code the rxe_alloc() subroutine can be eliminated and code
checking for MR as a special case can be removed.

This patch moves the kzalloc() and kfree_rcu() calls into the mr
registration and destruction verbs. It removes that code from
rxe_pool.c including the rxe_alloc() subroutine which is no longer
used.

Link: https://lore.kernel.org/r/20230213225551.12437-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16 11:30:11 -04:00
Bob Pearson
5ff31dfcd6 Subject: RDMA/rxe: Handle zero length rdma
Currently the rxe driver does not handle all cases of zero length rdma
operations correctly. The client does not have to provide an rkey for zero
length RDMA read or write operations so the rkey provided may be invalid
and should not be used to lookup an mr.

This patch corrects the driver to ignore the provided rkey if the reth
length is zero for read or write operations and make sure to set the mr to
NULL. In read_reply() if length is zero rxe_recheck_mr() is not
called. Warnings are added in the routines in rxe_mr.c to catch NULL MRs
when the length is non-zero.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230202044240.6304-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16 11:13:52 -04:00
Bernard Metzler
65a8fc30fb RDMA/siw: Fix user page pinning accounting
To avoid racing with other user memory reservations, immediately
account full amount of pages to be pinned.

Fixes: 2251334dca ("rdma/siw: application buffer management")
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20230202101000.402990-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-06 14:46:50 +02:00
Jakub Kicinski
b568d3072a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/intel/ice/ice_main.c
  418e53401e ("ice: move devlink port creation/deletion")
  643ef23bd9 ("ice: Introduce local var for readability")
https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/
https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/

drivers/net/ethernet/engleder/tsnep_main.c
  3d53aaef43 ("tsnep: Fix TX queue stop/wake for multiple queues")
  25faa6a4c5 ("tsnep: Replace TX spin_lock with __netif_tx_lock")
https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/

net/netfilter/nf_conntrack_proto_sctp.c
  13bd9b31a9 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
  a44b765148 ("netfilter: conntrack: unify established states for SCTP paths")
  f71cb8f45d ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/
https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27 22:56:18 -08:00
Bob Pearson
592627ccbd RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray
Replace struct rxe-phys_buf and struct rxe_map by struct xarray
in rxe_verbs.h. This allows using rcu locking on reads for
the memory maps stored in each mr.

This is based off of a sketch of a patch from Jason Gunthorpe in the
link below. Some changes were needed to make this work. It applies
cleanly to the current for-next and passes the pyverbs, perftest
and the same blktests test cases which run today.

Link: https://lore.kernel.org/r/20230119235936.19728-7-rpearsonhpe@gmail.com
Link: https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@nvidia.com/
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27 12:14:14 -04:00
Bob Pearson
325a7eb851 RDMA/rxe: Cleanup page variables in rxe_mr.c
Cleanup usage of mr->page_shift and mr->page_mask and introduce
an extractor for mr->ibmr.page_size. Normal usage in the kernel
has page_mask masking out offset in page rather than masking out
the page number. The rxe driver had reversed that which was confusing.
Implicitly there can be a per mr page_size which was not uniformly
supported.

Link: https://lore.kernel.org/r/20230119235936.19728-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:46 -04:00
Bob Pearson
d8bdb0ebca RDMA-rxe: Isolate mr code from atomic_write_reply()
Isolate mr specific code from atomic_write_reply() in rxe_resp.c into
a subroutine rxe_mr_do_atomic_write() in rxe_mr.c.
Check length for atomic write operation.
Make iova_to_vaddr() static.

Link: https://lore.kernel.org/r/20230119235936.19728-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:45 -04:00
Bob Pearson
f04d5b3d91 RDMA-rxe: Isolate mr code from atomic_reply()
Isolate mr specific code from atomic_reply() in rxe_resp.c into
a subroutine rxe_mr_do_atomic_op() in rxe_mr.c.
Minor cleanups to rxe_check_range() and iova_to_vaddr().
Move enum resp_state to rxe.h

Link: https://lore.kernel.org/r/20230119235936.19728-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:45 -04:00
Bob Pearson
db4729a525 RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.c
Move rxe_map_mr_sg() to rxe_mr.c where it makes a little more sense.

Link: https://lore.kernel.org/r/20230119235936.19728-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:45 -04:00
Bob Pearson
ade58da2a7 RDMA/rxe: Cleanup mr_check_range
Remove blank lines and replace EFAULT by EINVAL when an invalid
mr type is used.

Link: https://lore.kernel.org/r/20230119235936.19728-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:44 -04:00
Peilin Ye
40e0b09081 net/sock: Introduce trace_sk_data_ready()
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready()
callback implementations.  For example:

<...>
  iperf-609  [002] .....  70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
  iperf-609  [002] .....  70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
<...>

Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23 11:26:50 +00:00
Daisuke Matsuda
1aefe5c177 RDMA/rxe: Prevent faulty rkey generation
If you create MRs more than 0x10000 times after loading the module,
responder starts to reply NAKs for RDMA/Atomic operations because of rkey
violation detected in check_rkey(). The root cause is that rkeys are
incremented each time a new MR is created and the value overflows into the
range reserved for MWs.

This commit also increases the value of RXE_MAX_MW that has been limited
unlike other parameters.

Fixes: 0994a1bcd5 ("RDMA/rxe: Bump up default maximum values used via uverbs")
Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-09 10:48:16 -04:00
Daisuke Matsuda
3a73746b26 RDMA/rxe: Fix inaccurate constants in rxe_type_info
ibv_query_device() has reported incorrect device attributes, which are
actually not used by the device. Make the constants correspond with the
attributes shown to users.

Fixes: 3ccffe8abf ("RDMA/rxe: Move max_elem into rxe_type_info")
Fixes: 3225717f6d ("RDMA/rxe: Replace red-black trees by xarrays")
Link: https://lore.kernel.org/r/20221220080848.253785-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-09 10:48:16 -04:00
Linus Torvalds
ed56954cf5 v6.2 merge window 2nd pull request
Fix two build warnings on 32 bit platforms
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY50UewAKCRCFwuHvBreF
 YazuAQCHY/yOiziPBqNvI9jSb/MVGh1oaZQZRTkt5fOvLSGt8gD/Rv30/h6EcKfG
 S5A6eQ3qrVzEhp7P8mST2Q/MkGmQ+AQ=
 =Qw7m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Fix two build warnings on 32 bit platforms

  It seems the linux-next CI and 0-day bot are not testing enough 32 bit
  configurations, as soon as you merged the rdma pull request there were
  two instant reports of warnings on these sytems that I would have
  thought should have been covered by time in linux-next

  Anyhow, here are the fixes so people don't hit problems with -Werror"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/siw: Fix pointer cast warning
  RDMA/rxe: Fix compile warnings on 32-bit
2022-12-17 08:23:42 -06:00
Arnd Bergmann
5244ca8867 RDMA/siw: Fix pointer cast warning
The previous build fix left a remaining issue in configurations with
64-bit dma_addr_t on 32-bit architectures:

drivers/infiniband/sw/siw/siw_qp_tx.c: In function 'siw_get_pblpage':
drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
   32 |                 return virt_to_page((void *)paddr);
      |                                     ^

Use the same double cast here that the driver uses elsewhere to convert
between dma_addr_t and void*.

Fixes: 0d1b756acf ("RDMA/siw: Pass a pointer to virt_to_page()")
Link: https://lore.kernel.org/r/20221215170347.2612403-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-16 16:07:38 -04:00
Jason Gunthorpe
5fc24e6022 RDMA/rxe: Fix compile warnings on 32-bit
Move the conditional code into a function, with two varients so it is
harder to make these kinds of mistakes.

 drivers/infiniband/sw/rxe/rxe_resp.c: In function 'atomic_write_reply':
 drivers/infiniband/sw/rxe/rxe_resp.c:794:13: error: unused variable 'payload' [-Werror=unused-variable]
   794 |         int payload = payload_size(pkt);
       |             ^~~~~~~
 drivers/infiniband/sw/rxe/rxe_resp.c:793:24: error: unused variable 'mr' [-Werror=unused-variable]
   793 |         struct rxe_mr *mr = qp->resp.mr;
       |                        ^~
 drivers/infiniband/sw/rxe/rxe_resp.c:791:19: error: unused variable 'dst' [-Werror=unused-variable]
   791 |         u64 src, *dst;
       |                   ^~~
 drivers/infiniband/sw/rxe/rxe_resp.c:791:13: error: unused variable 'src' [-Werror=unused-variable]
   791 |         u64 src, *dst;

Fixes: 034e285f8b ("RDMA/rxe: Make responder support atomic write on RC service")
Link: https://lore.kernel.org/linux-rdma/Y5s+EVE7eLWQqOwv@nvidia.com/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-15 11:32:06 -04:00
Linus Torvalds
ab425febda v6.2 merge window pull request
Usual size of updates, a new driver a most of the bulk focusing on rxe:
 
 - Usual typos, style, and language updates
 
 - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma, mlx4, srp
 
 - Lots of RXE updates
   * Improve reply error handling for bad MR operations
   * Code tidying
   * Debug printing uses common loggers
   * Remove half implemented RD related stuff
   * Support IBA's recently defined Atomic Write and Flush operations
 
 - erdma support for atomic operations
 
 - New driver "mana" for Ethernet HW available in Azure VMs. This driver
   only supports DPDK
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5eIggAKCRCFwuHvBreF
 YeX7AP9+l5Y9J48OmK7y/YgADNo9g05agXp3E8EuUDmBU+PREgEAigdWaJVf2oea
 IctVja0ApLW5W+wsFt8Qh+V4PMiYTAM=
 =Q5V+
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Usual size of updates, a new driver, and most of the bulk focusing on
  rxe:

   - Usual typos, style, and language updates

   - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma,
     mlx4, srp

   - Lots of RXE updates:
      * Improve reply error handling for bad MR operations
      * Code tidying
      * Debug printing uses common loggers
      * Remove half implemented RD related stuff
      * Support IBA's recently defined Atomic Write and Flush operations

   - erdma support for atomic operations

   - New driver 'mana' for Ethernet HW available in Azure VMs. This
     driver only supports DPDK"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (122 commits)
  IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces
  RDMA: Add missed netdev_put() for the netdevice_tracker
  RDMA/rxe: Enable RDMA FLUSH capability for rxe device
  RDMA/cm: Make QP FLUSHABLE for supported device
  RDMA/rxe: Implement flush completion
  RDMA/rxe: Implement flush execution in responder side
  RDMA/rxe: Implement RC RDMA FLUSH service in requester side
  RDMA/rxe: Extend rxe packet format to support flush
  RDMA/rxe: Allow registering persistent flag for pmem MR only
  RDMA/rxe: Extend rxe user ABI to support flush
  RDMA: Extend RDMA kernel verbs ABI to support flush
  RDMA: Extend RDMA user ABI to support flush
  RDMA/rxe: Fix incorrect responder length checking
  RDMA/rxe: Fix oops with zero length reads
  RDMA/mlx5: Remove not-used IB_FLOW_SPEC_IB define
  RDMA/hns: Fix XRC caps on HIP08
  RDMA/hns: Fix error code of CMD
  RDMA/hns: Fix page size cap from firmware
  RDMA/hns: Fix PBL page MTR find
  RDMA/hns: Fix AH attr queried by query_qp
  ...
2022-12-14 09:27:13 -08:00
Li Zhijian
124011e6e9 RDMA/rxe: Enable RDMA FLUSH capability for rxe device
Now we are ready to enable RDMA FLUSH capability for RXE.
It can support Global Visibility and Persistence placement types.

Link: https://lore.kernel.org/r/20221206130201.30986-11-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
70aad902ce RDMA/rxe: Implement flush completion
Per IBA SPEC, FLUSH will ack in rdma read response with 0 length.

Use IB_WC_FLUSH (aka IB_UVERBS_WC_FLUSH) code to tell userspace a FLUSH
completion.

Link: https://lore.kernel.org/r/20221206130201.30986-9-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
ea1bb00ee9 RDMA/rxe: Implement flush execution in responder side
Only the requested placement types that also registered in the destination
memory region are acceptable.
Otherwise, responder will also reply NAK "Remote Access Error" if it
found a placement type violation.

We will persist data via arch_wb_cache_pmem(), which could be
architecture specific.

This commit also adds 2 helpers to update qp.resp from the incoming packet.

Link: https://lore.kernel.org/r/20221206130201.30986-8-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
fa1fd682ad RDMA/rxe: Implement RC RDMA FLUSH service in requester side
Implement FLUSH request operation in the requester.

Link: https://lore.kernel.org/r/20221206130201.30986-7-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
02e9a31c89 RDMA/rxe: Extend rxe packet format to support flush
Extend rxe opcode tables, headers, helper and constants to support
flush operations.

Refer to the IBA A19.4.1 for more FETH definition details

Link: https://lore.kernel.org/r/20221206130201.30986-6-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
02ea0a5115 RDMA/rxe: Allow registering persistent flag for pmem MR only
Memory region could  support at most 2 flush access flags:
IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL

But we only allow user to register persistent flush flags to the pmem MR
where it has the ability of persisting data across power cycles.

So registering a persistent access flag to a non-pmem MR will be rejected.

Link: https://lore.kernel.org/r/20221206130201.30986-5-lizhijian@fujitsu.com
CC: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Bob Pearson
689c5421bf RDMA/rxe: Fix incorrect responder length checking
The code in rxe_resp.c at check_length() is incorrect as it compares
pkt->opcode an 8 bit value against various mask bits which are all higher
than 256 so nothing is ever reported.

This patch rewrites this to compare against pkt->mask which is
correct. However this now exposes another error. For UD send packets the
value of the pmtu cannot be determined from qp->mtu.  All that is required
here is to later check if the payload fits into the posted receive buffer
in that case.

Fixes: 837a55847e ("RDMA/rxe: Implement packet length validation on responder")
Link: https://lore.kernel.org/r/20221208210945.28607-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:26:27 -04:00
Daisuke Matsuda
3282a549cf RDMA/rxe: Fix oops with zero length reads
The commit 686d348476 ("RDMA/rxe: Remove unnecessary mr testing") causes
a kernel crash. If responder get a zero-byte RDMA Read request,
qp->resp.mr is not set in check_rkey() (see IBA C9-88). The mr is NULL in
this case, and a NULL pointer dereference occurs as shown below.

 BUG: kernel NULL pointer dereference, address: 0000000000000010
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 0 P4D 0
 Oops: 0002 [#1] PREEMPT SMP PTI
 CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe]
 Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c
 RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246
 RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000
 RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000
 RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f
 R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000
 FS:  00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0
 Call Trace:
  <IRQ>
  read_reply+0xda/0x310 [rdma_rxe]
  rxe_responder+0x82d/0xe50 [rdma_rxe]
  do_task+0x84/0x170 [rdma_rxe]
  tasklet_action_common.constprop.0+0xa7/0x120
  __do_softirq+0xcb/0x2ac
  do_softirq+0x63/0x90
  </IRQ>

Support a NULL mr during read_reply()

Fixes: 686d348476 ("RDMA/rxe: Remove unnecessary mr testing")
Fixes: b5f9a01fae ("RDMA/rxe: Fix mr leak in RESPST_ERR_RNR")
Link: https://lore.kernel.org/r/20221209045926.531689-1-matsuda-daisuke@fujitsu.com
Link: https://lore.kernel.org/r/20221202145713.13152-1-lizhijian@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 15:57:51 -04:00
Jason Gunthorpe
d69e8c63fc Linux 6.1-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmONI6weHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG9xgH/jqXGuMoO1ikfmGb
 7oY0W/f69G9V/e0DxFLvnIjhFgCUzdnNsmD4jQJA4x6QsxwLWuvpI282Ez+bHV5T
 U4RPsxJZIIMsXE2lKM9BRgeLzDdCt0aK4Pj+3x2x7NZC5cWFSQ8PyQJkCwg+0PQo
 u8Ly+GO8c4RUMf4/rrAZQq16qZUqGDaGm1EJhtSoa+KiR81LmUUmbDIK9Mr53rmQ
 wou+95XhibwMWr17WgXA28bTgYqn9UGr67V3qvTH2LC7GW8BCoKvn+3wh6TVhlWj
 dsWplXgcOP0/OHvSC5Sb1Uibk5Gx3DlIzYa6OfNZQuZ5xmQqm9kXjW8lmYpWFHy/
 38/5HWc=
 =EuoA
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-rc8' into rdma.git for-next

For dependencies in following patches

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 15:52:17 -04:00
Xiao Yang
4cd9f1d320 RDMA/rxe: Enable atomic write capability for rxe device
The capability shows that rxe device supports atomic write operation.

Link: https://lore.kernel.org/r/1669905568-62-4-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01 19:51:10 -04:00
Xiao Yang
3aec427bb1 RDMA/rxe: Implement atomic write completion
Generate an atomic write completion when the atomic write request
has been finished.

Link: https://lore.kernel.org/r/1669905568-62-3-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01 19:51:10 -04:00
Xiao Yang
034e285f8b RDMA/rxe: Make responder support atomic write on RC service
Make responder process an atomic write request and send a read response
on RC service.

Link: https://lore.kernel.org/r/1669905568-62-2-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01 19:51:09 -04:00
Xiao Yang
abb633cf28 RDMA/rxe: Make requester support atomic write on RC service
Make requester process and send an atomic write request on RC service.

Link: https://lore.kernel.org/r/1669905568-62-1-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01 19:51:09 -04:00
Xiao Yang
5c7af6c793 RDMA/rxe: Extend rxe packet format to support atomic write
Extend rxe_wr_opcode_info[] and rxe_opcode[] for new atomic write opcode.

Link: https://lore.kernel.org/r/1669905432-14-5-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01 19:51:09 -04:00
David Hildenbrand
129e636fe9 RDMA/siw: remove FOLL_FORCE usage
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).

Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.

Link: https://lkml.kernel.org/r/20221116102659.70287-13-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-30 15:58:59 -08:00
Zhang Xiaoxu
f67376d801 RDMA/rxe: Fix NULL-ptr-deref in rxe_qp_do_cleanup() when socket create failed
There is a null-ptr-deref when mount.cifs over rdma:

  BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
  Read of size 8 at addr 0000000000000018 by task mount.cifs/3046

  CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   kasan_report+0xad/0x130
   rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
   execute_in_process_context+0x25/0x90
   __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
   rxe_create_qp+0x16a/0x180 [rdma_rxe]
   create_qp.part.0+0x27d/0x340
   ib_create_qp_kernel+0x73/0x160
   rdma_create_qp+0x100/0x230
   _smbd_get_connection+0x752/0x20f0
   smbd_get_connection+0x21/0x40
   cifs_get_tcp_session+0x8ef/0xda0
   mount_get_conns+0x60/0x750
   cifs_mount+0x103/0xd00
   cifs_smb3_do_mount+0x1dd/0xcb0
   smb3_get_tree+0x1d5/0x300
   vfs_get_tree+0x41/0xf0
   path_mount+0x9b3/0xdd0
   __x64_sys_mount+0x190/0x1d0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

The root cause of the issue is the socket create failed in
rxe_qp_init_req().

So move the reset rxe_qp_do_cleanup() after the NULL ptr check.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20221122151437.1057671-1-zhangxiaoxu5@huawei.com
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-22 15:55:54 -04:00
Jason Gunthorpe
cb6562c380 RDMA/rxe: Do not NULL deref on debugging failure path
Correct the mistake, mr is obviously NULL in this code path.

Fixes: 2778b72b1d ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c")
Link: https://lore.kernel.org/r/Y3eeJW0AdyJYhYyQ@kili
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-22 15:53:02 -04:00
Li Zhijian
7d984dac8f RDMA/rxe: Fix mr->map double free
rxe_mr_cleanup() which tries to free mr->map again will be called when
rxe_mr_init_user() fails:

   CPU: 0 PID: 4917 Comm: rdma_flush_serv Kdump: loaded Not tainted 6.1.0-rc1-roce-flush+ #25
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
   Call Trace:
    <TASK>
    dump_stack_lvl+0x45/0x5d
    panic+0x19e/0x349
    end_report.part.0+0x54/0x7c
    kasan_report.cold+0xa/0xf
    rxe_mr_cleanup+0x9d/0xf0 [rdma_rxe]
    __rxe_cleanup+0x10a/0x1e0 [rdma_rxe]
    rxe_reg_user_mr+0xb7/0xd0 [rdma_rxe]
    ib_uverbs_reg_mr+0x26a/0x480 [ib_uverbs]
    ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x1a2/0x250 [ib_uverbs]
    ib_uverbs_cmd_verbs+0x1397/0x15a0 [ib_uverbs]

This issue was firstly exposed since commit b18c7da63f ("RDMA/rxe: Fix
memory leak in error path code") and then we fixed it in commit
8ff5f5d9d8 ("RDMA/rxe: Prevent double freeing rxe_map_set()") but this
fix was reverted together at last by commit 1e75550648 (Revert
"RDMA/rxe: Create duplicate mapping tables for FMRs")

Simply let rxe_mr_cleanup() always handle freeing the mr->map once it is
successfully allocated.

Fixes: 1e75550648 ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"")
Link: https://lore.kernel.org/r/1667099073-2-1-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18 20:15:51 -04:00
Zhu Yanjun
8e1a76493b RDMA/rxe: Remove reliable datagram support
The rdma_rxe driver does not actually support the reliable datagram
transport but contains a variable with RD opcodes in driver code.  And
this variable is never used. So remove it.

Link: https://lore.kernel.org/r/20221112023537.432912-1-yanjun.zhu@intel.com
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18 19:57:46 -04:00
Bernard Metzler
60da2d11fc RDMA/siw: Set defined status for work completion with undefined status
A malicious user may write undefined values into memory mapped completion
queue elements status or opcode. Undefined status or opcode values will
result in out-of-bounds access to an array mapping siw internal
representation of opcode and status to RDMA core representation when
reaping CQ elements. While siw detects those undefined values, it did not
correctly set completion status to a defined value, thus defeating the
whole purpose of the check.

This bug leads to the following Smatch static checker warning:

	drivers/infiniband/sw/siw/siw_cq.c:96 siw_reap_cqe()
	error: buffer overflow 'map_cqe_status' 10 <= 21

Fixes: bdf1da5df9 ("RDMA/siw: Fix immediate work request flush to completion queue")
Link: https://lore.kernel.org/r/20221115170747.1263298-1-bmt@zurich.ibm.com
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-15 16:47:00 -04:00
Bob Pearson
5de087250f RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mmap.c
Replace calls to pr_xxx() in rxe_mmap.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-17-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:07 -04:00
Bob Pearson
813728043b RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_icrc.c
Replace calls to pr_xxx() in rxe_icrc.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-16-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:06 -04:00
Bob Pearson
c6aba5ea00 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c
Replace calls to pr_xxx() in rxe.c with rxe_dbg_xxx().
Calls with a rxe device not yet in scope are left as is.

Link: https://lore.kernel.org/r/20221103171013.20659-15-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:06 -04:00
Bob Pearson
fc50597934 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_task.c
Replace calls to pr_xxx() in rxe_task.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-14-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:06 -04:00
Bob Pearson
25fd735a4c RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_av.c
Replace calls to pr_xxx() in rxe_av.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-13-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
14e501fdb0 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_verbs.c
Replace calls to pr_xxx() in rxe_verbs.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-12-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
0e6090024b RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_srq.c
Replace calls to pr_xxx() in rxe_srq.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-11-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
74ddf7233c RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_resp.c
Replace calls to pr_xxx() in rxe_resp.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-10-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
0edfb15e30 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_req.c
Replace calls to pr_xxx() in rxe_req.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:04 -04:00
Bob Pearson
6af70060d2 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_qp.c
Replace calls to pr_xxx() in rxe_qp.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:04 -04:00
Bob Pearson
34549e88e0 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_net.c
Replace (some) calls to pr_xxx() in rxe_net.c with rxe_dbg_xxx().
Calls with a rxe device not yet in scope are left as is.

Link: https://lore.kernel.org/r/20221103171013.20659-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:04 -04:00
Bob Pearson
e8a87efdf8 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mw.c
Replace calls to pr_xxx() int rxe_mw.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:03 -04:00
Bob Pearson
2778b72b1d RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c
Replace calls to pr_xxx() in rxe_mr.c by rxe_dbg_mr().

Link: https://lore.kernel.org/r/20221103171013.20659-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:03 -04:00
Bob Pearson
52920f537a RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_cq.c
Replace calls to pr_xxx() in rxe_cq.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:03 -04:00
Bob Pearson
27c4c520bd RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_comp.c
Replace calls to pr_xxx() in rxe_comp.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:02 -04:00
Bob Pearson
4554bac48a RDMA/rxe: Add ibdev_dbg macros for rxe
Add macros borrowed from siw to call dynamic debug macro ibdev_dbg.

Link: https://lore.kernel.org/r/20221103171013.20659-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:02 -04:00
Daisuke Matsuda
837a55847e RDMA/rxe: Implement packet length validation on responder
The function check_length() is supposed to check the length of inbound
packets on responder, but it actually has been a stub since the driver was
born. Let it check the payload length and the DMA length.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20221107055338.357184-1-matsuda-daisuke@fujitsu.com
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-09 19:54:57 +02:00
Bernard Metzler
bdf1da5df9 RDMA/siw: Fix immediate work request flush to completion queue
Correctly set send queue element opcode during immediate work request
flushing in post sendqueue operation, if the QP is in ERROR state.
An undefined ocode value results in out-of-bounds access to an array
for mapping the opcode between siw internal and RDMA core representation
in work completion generation. It resulted in a KASAN BUG report
of type 'global-out-of-bounds' during NFSoRDMA testing.

This patch further fixes a potential case of a malicious user which may
write undefined values for completion queue elements status or opcode,
if the CQ is memory mapped to user land. It avoids the same out-of-bounds
access to arrays for status and opcode mapping as described above.

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Fixes: b0fff7317b ("rdma/siw: completion queue methods")
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20221107145057.895747-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-09 15:26:49 +02:00
Yunsheng Lin
692373d186 RDMA/rxe: cleanup some error handling in rxe_verbs.c
Instead of 'goto and return', just return directly to
simplify the error handling, and avoid some unnecessary
return value check.

Link: https://lore.kernel.org/r/20221028075053.3990467-1-xuhaoyue1@hisilicon.com
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 15:11:44 -03:00
Xiao Yang
b071850ef6 RDMA/rxe: Remove the duplicate assignment of mr->map_shift
mr->map_shift is set to ilog2(RXE_BUF_PER_MAP) in both rxe_mr_init() and
rxe_mr_alloc() so remove the duplicate one in rxe_mr_init().

Link: https://lore.kernel.org/r/1666855893-145-1-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 15:08:41 -03:00
Li Zhijian
875ab4a8d9 RDMA/rxe: Make sure requested access is a subset of {mr,mw}->access
We should reject the requests with access flags that is not registered by
MR/MW. For example, lookup_mr() should return NULL when requested access
is 0x03 and mr->access is 0x01.

Link: https://lore.kernel.org/r/20220927055337.22630-2-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 14:39:47 -03:00
Bob Pearson
63a18baef2 RDMA/rxe: Rename task->state_lock to task->lock
Rename task-state_lock to task->lock

Link: https://lore.kernel.org/r/20221021200118.2163-7-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:16 -03:00
Bob Pearson
dcef28528c RDMA/rxe: Make rxe_do_task static
The subroutine rxe_do_task() is only called in rxe_task.c. This patch
makes it static and renames it do_task().

Link: https://lore.kernel.org/r/20221021200118.2163-6-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:15 -03:00
Bob Pearson
dccb23f6c3 RDMA/rxe: Split rxe_run_task() into two subroutines
Split rxe_run_task(task, sched) into rxe_run_task(task) and
rxe_sched_task(task).

Link: https://lore.kernel.org/r/20221021200118.2163-5-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:15 -03:00
Bob Pearson
de669ae8af RDMA/rxe: Removed unused name from rxe_task struct
The name field in struct rxe_task is never used. This patch removes it.

Link: https://lore.kernel.org/r/20221021200118.2163-4-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:15 -03:00
Bob Pearson
98a54f1706 RDMA/rxe: Remove init of task locks from rxe_qp.c
The calls to spin_lock_init() for the tasklet spinlocks in
rxe_qp_init_misc() are redundant since they are intiialized in
rxe_init_task().  This patch removes them.

Link: https://lore.kernel.org/r/20221021200118.2163-3-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:15 -03:00
Bob Pearson
05e88ebb9e RDMA/rxe: Remove redundant header files
Remove unneeded include files.

Link: https://lore.kernel.org/r/20221021200118.2163-2-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:15 -03:00
Li Zhijian
b5f9a01fae RDMA/rxe: Fix mr leak in RESPST_ERR_RNR
rxe_recheck_mr() will increase mr's ref_cnt, so we should call rxe_put(mr)
to drop mr's ref_cnt in RESPST_ERR_RNR to avoid below warning:

  WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe]
...
  Call Trace:
   rxe_dereg_mr+0x4c/0x60 [rdma_rxe]
   ib_dereg_mr_user+0xa8/0x200 [ib_core]
   ib_mr_pool_destroy+0x77/0xb0 [ib_core]
   nvme_rdma_destroy_queue_ib+0x89/0x240 [nvme_rdma]
   nvme_rdma_free_queue+0x40/0x50 [nvme_rdma]
   nvme_rdma_teardown_io_queues.part.0+0xc3/0x120 [nvme_rdma]
   nvme_rdma_error_recovery_work+0x4d/0xf0 [nvme_rdma]
   process_one_work+0x582/0xa40
   ? pwq_dec_nr_in_flight+0x100/0x100
   ? rwlock_bug.part.0+0x60/0x60
   worker_thread+0x2a9/0x700
   ? process_one_work+0xa40/0xa40
   kthread+0x168/0x1a0
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x22/0x30

Link: https://lore.kernel.org/r/20221024052049.20577-1-lizhijian@fujitsu.com
Fixes: 8a1a0be894 ("RDMA/rxe: Replace mr by rkey in responder resources")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 08:59:29 +03:00
Li Zhijian
686d348476 RDMA/rxe: Remove unnecessary mr testing
Before the testing, we already passed it to rxe_mr_copy() where mr could
be dereferenced. so this checking is not needed.

The only way that mr is NULL is when it reaches below line 780 with
 'qp->resp.mr = NULL', which is not possible in Bob's explanation[1].

 778         if (res->state == rdatm_res_state_new) {
 779                 if (!res->replay) {
 780                         mr = qp->resp.mr;
 781                         qp->resp.mr = NULL;
 782                 } else {

[1] https://lore.kernel.org/lkml/30ff25c4-ce66-eac4-eaa2-64c0db203a19@gmail.com/

Link: https://lore.kernel.org/r/1666582315-2-1-git-send-email-lizhijian@fujitsu.com
CC: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 08:56:32 +03:00
Daisuke Matsuda
5ac814e02e RDMA/rxe: Handle remote errors in the midst of a Read reply sequence
Requesting nodes do not handle a reported error correctly if it is
generated in the middle of multi-packet Read responses, and the node tries
to resend the request endlessly. Let completer terminate the connection in
that case.

Link: https://lore.kernel.org/r/20221013014724.3786212-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 08:56:32 +03:00
Daisuke Matsuda
5ebc548f4f RDMA/rxe: Make responder handle RDMA Read failures
Currently, responder can reply packets with invalid payloads if it fails
to copy messages to the packets. Add an error handling in read_reply() to
inform a requesting node of the failure.

Link: https://lore.kernel.org/r/20221013014724.3786212-1-matsuda-daisuke@fujitsu.com
Suggested-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 08:56:32 +03:00
yangx.jy@fujitsu.com
71d2363991 RDMA/rxe: Remove the member 'type' of struct rxe_mr
The member 'type' is included in both struct rxe_mr and struct ib_mr
so remove the duplicate one of struct rxe_mr.

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Link: https://lore.kernel.org/r/20221021134513.17730-1-yangx.jy@fujitsu.com
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24 14:50:14 +03:00
Jason Gunthorpe
33331a728c Linux 6.0
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmM5/fMeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG+Y0H/jG0HQjUHfIOGjPQ
 T33eaF5POoZKzZmsTNITbtya7B3/OTZZRGdSq9B8t5tElyPnZrdIfaXds17mCI8y
 5DJUEQGdv6fqRttYLGKf6rk1kzABhhaTS8n9BgDFEsmvdqwG4AV6dLQr3HL09gTV
 Lu+Is86RPwmpgH0Z9u7DyFCF8yLjefyu0vl6eFm/SXjCE8gQM/LZQSi9mv5/YDxa
 MVKlsZKIkQm6P8a6r8wbGedKYBped4+4gYedsf/IcS0lvKdLIs/P7YgR5wKklSbM
 tqECvBOmKq1Fwj/oxH+fx0KLX/ZD1xxQwoQd+a9DOSo+BuPBt6KGojYT9gQRyFJp
 R7gyUCo=
 =2UOD
 -----END PGP SIGNATURE-----

Merge tag 'v6.0' into rdma.git for-next

Trvial merge conflicts against rdma.git for-rc resolved matching
linux-next:
            drivers/infiniband/hw/hns/hns_roce_hw_v2.c
            drivers/infiniband/hw/hns/hns_roce_main.c

https://lore.kernel.org/r/20220929124005.105149-1-broonie@kernel.org

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-06 19:48:45 -03:00
Daisuke Matsuda
8ad891ed43 RDMA/rxe: Remove error/warning messages from packet receiver path
Incoming packets to rxe are passed from UDP layer using an encapsulation
socket. If there are any clients reachable to a node, they can invoke the
encapsulation handler arbitrarily by sending malicious or irrelevant
packets. This can potentially cause a message overflow and a subsequent
slowdown on the node.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20220929080023.304242-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-09-29 12:57:56 +03:00
Xiu Jianfeng
78657a445c IB/rdmavt: Add __init/__exit annotations to module init/exit funcs
Add missing __init/__exit annotations to module init/exit funcs.

Fixes: 0194621b22 ("IB/rdmavt: Create module framework and handle driver registration")
Link: https://lore.kernel.org/r/20220924091457.52446-1-xiujianfeng@huawei.com
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27 10:15:25 -03:00
Bob Pearson
6c5e683925 RDMA/rxe: Remove redundant num_sge fields
In include/uapi/rdma/rdma_user_rxe.h there are redundant copies of num_sge
in the rxe_send_wr, rxe_recv_wqe, and rxe_dma_info. Only the ones in
rxe_dma_info are actually used by the rxe kernel driver.

The userspace would set these values, but the kernel never read them.

This change has no affect on the current ABI and new or old versions of
rdma-core operate correctly with new or old versions of the kernel rxe
driver.

Link: https://lore.kernel.org/r/20220913222716.18335-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27 10:15:24 -03:00
Bob Pearson
fda5d0cf8a RDMA/rxe: Fix resize_finish() in rxe_queue.c
Currently in resize_finish() in rxe_queue.c there is a loop which copies
the entries in the original queue into a newly allocated queue.  The
termination logic for this loop is incorrect. The call to
queue_next_index() updates cons but has no effect on whether the queue is
empty. So if the queue starts out empty nothing is copied but if it is not
then the loop will run forever. This patch changes the loop to compare the
value of cons to the original producer index.

Fixes: ae6e843fe0 ("RDMA/rxe: Add memory barriers to kernel queues")
Link: https://lore.kernel.org/r/20220825221446.6512-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27 10:15:24 -03:00
Bob Pearson
58651bbb30 RDMA/rxe: Set pd early in mr alloc routines
Move setting of pd in mr objects ahead of any possible errors so that it
will always be set in rxe_mr_cleanup() to avoid seg faults when
rxe_put(mr_pd(mr)) is called.

Fixes: cf40367961 ("RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()")
Link: https://lore.kernel.org/r/20220805183153.32007-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27 10:15:24 -03:00
Li Zhijian
f994ae0a14 RDMA/rxe: Add send_common_ack() helper
Most code in send_ack() and send_atomic_ack() are duplicate, move them to
a new helper send_common_ack().

In newer IBA spec, some opcodes require acknowledge with a zero-length
read response, with this new helper, we can easily implement it later.

Link: https://lore.kernel.org/r/1659335010-2-1-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-26 14:14:25 -03:00
Daisuke Matsuda
954afc5a8f RDMA/rxe: Use members of generic struct in rxe_mr
rxe_mr and ib_mr have interchangeable members. Remove device specific
members and use ones in the generic struct. Both 'iova' and 'length' are
filled in ib_uverbs or ib_core layer after MR registration.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20220921080844.1616883-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22 12:46:39 +03:00
Bernard Metzler
a3c278807a RDMA/siw: Fix QP destroy to wait for all references dropped.
Delay QP destroy completion until all siw references to QP are
dropped. The calling RDMA core will free QP structure after
successful return from siw_qp_destroy() call, so siw must not
hold any remaining reference to the QP upon return.
A use-after-free was encountered in xfstest generic/460, while
testing NFSoRDMA. Here, after a TCP connection drop by peer,
the triggered siw_cm_work_handler got delayed until after
QP destroy call, referencing a QP which has already freed.

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20220920082503.224189-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-20 21:23:52 +03:00
Bernard Metzler
754209850d RDMA/siw: Always consume all skbuf data in sk_data_ready() upcall.
For header and trailer/padding processing, siw did not consume new
skb data until minimum amount present to fill current header or trailer
structure, including potential payload padding. Not consuming any
data during upcall may cause a receive stall, since tcp_read_sock()
is not upcalling again if no new data arrive.
A NFSoRDMA client got stuck at RDMA Write reception of unaligned
payload, if the current skb did contain only the expected 3 padding
bytes, but not the 4 bytes CRC trailer. Expecting 4 more bytes already
arrived in another skb, and not consuming those 3 bytes in the current
upcall left the Write incomplete, waiting for the CRC forever.

Fixes: 8b6a361b8c ("rdma/siw: receive path")
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Tested-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20220920081202.223629-1-bmt@zurich.ibm.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-20 21:21:18 +03:00
Li Zhijian
415a04844a RDMA/rxe: convert pr_warn to pr_debug
They could be triggered by user APIs with invalid parameters.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://lore.kernel.org/r/1662518901-2-2-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-08 11:03:15 +03:00
Li Zhijian
e2edba67fc RDMA/rxe: use %u to print u32 variables
struct ib_qp_cap {
        u32     max_send_wr;
        u32     max_recv_wr;
        u32     max_send_sge;
        u32     max_recv_sge;
        u32     max_inline_data;
...

To avoid getting a negative value from dmesg:
[410580.579965] rdma_rxe: invalid send sge = 65535 > 32
[410580.583818] rdma_rxe: invalid send wr = -1 > 1048576
[410582.771323] rdma_rxe: invalid recv sge = 65535 > 32
[410582.775310] rdma_rxe: invalid recv wr = -1 > 1048576

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://lore.kernel.org/r/1662518901-2-1-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-08 11:03:15 +03:00
Linus Walleij
0d1b756acf RDMA/siw: Pass a pointer to virt_to_page()
Functions that work on a pointer to virtual memory such as
virt_to_pfn() and users of that function such as
virt_to_page() are supposed to pass a pointer to virtual
memory, ideally a (void *) or other pointer. However since
many architectures implement virt_to_pfn() as a macro,
this function becomes polymorphic and accepts both a
(unsigned long) and a (void *).

If we instead implement a proper virt_to_pfn(void *addr)
function the following happens (occurred on arch/arm):

drivers/infiniband/sw/siw/siw_qp_tx.c:32:23: warning: incompatible
  integer to pointer conversion passing 'dma_addr_t' (aka 'unsigned int')
  to parameter of type 'const void *' [-Wint-conversion]
drivers/infiniband/sw/siw/siw_qp_tx.c:32:37: warning: passing argument
  1 of 'virt_to_pfn' makes pointer from integer without a cast
  [-Wint-conversion]
drivers/infiniband/sw/siw/siw_qp_tx.c:538:36: warning: incompatible
  integer to pointer conversion passing 'unsigned long long'
  to parameter of type 'const void *' [-Wint-conversion]

Fix this with an explicit cast. In one case where the SIW
SGE uses an unaligned u64 we need a double cast modifying the
virtual address (va) to a platform-specific uintptr_t before
casting to a (void *).

Fixes: b9be6f18cf ("rdma/siw: transmit path")
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20220902215918.603761-1-linus.walleij@linaro.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-04 10:21:59 +03:00
Tom Talpey
fc5e1acf6a RDMA/siw: Add missing Kconfig selections
The SoftiWARP Kconfig is missing "select" for CRYPTO and CRYPTO_CRC32C.

In addition, it improperly "depends on" LIBCRC32C, this should be a
"select", similar to net/sctp and others. As a dependency, SIW fails
to appear in generic configurations.

Link: https://lore.kernel.org/r/d366bf02-3271-754f-fc68-1a84016d0e19@talpey.com
Signed-off-by: Tom Talpey <tom@talpey.com>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-01 10:12:01 +03:00
Daisuke Matsuda
2c02249fcb RDMA/rxe: Delete error messages triggered by incoming Read requests
An incoming Read request causes multiple Read responses. If a user MR to
copy data from is unavailable or responder cannot send a reply, then the
error messages can be printed for each response attempt, resulting in
message overflow.

Link: https://lore.kernel.org/r/20220829071218.1639065-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-31 09:57:09 +03:00
Zhu Yanjun
f07853582d RDMA/rxe: Remove the unused variable obj
The member variable obj in struct rxe_task is not needed.
So remove it to save memory.

Link: https://lore.kernel.org/r/20220822011615.805603-4-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-31 09:53:13 +03:00
Zhu Yanjun
548ce2e667 RDMA/rxe: Fix the error caused by qp->sk
When sock_create_kern in the function rxe_qp_init_req fails,
qp->sk is set to NULL.

Then the function rxe_create_qp will call rxe_qp_do_cleanup
to handle allocated resource.

Before handling qp->sk, this variable should be checked.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220822011615.805603-3-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-31 09:53:13 +03:00
Zhu Yanjun
a625ca30ef RDMA/rxe: Fix "kernel NULL pointer dereference" error
When rxe_queue_init in the function rxe_qp_init_req fails,
both qp->req.task.func and qp->req.task.arg are not initialized.

Because of creation of qp fails, the function rxe_create_qp will
call rxe_qp_do_cleanup to handle allocated resource.

Before calling __rxe_do_task, both qp->req.task.func and
qp->req.task.arg should be checked.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220822011615.805603-2-yanjun.zhu@linux.dev
Reported-by: syzbot+ab99dc4c6e961eed8b8e@syzkaller.appspotmail.com
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-31 09:53:12 +03:00
Daisuke Matsuda
d4ecb56e86 RDMA/rxe: Remove an unused member from struct rxe_mr
Commit 1e75550648 ("Revert "RDMA/rxe: Create duplicate mapping tables for
FMRs"") brought back the member 'va' to struct rxe_mr. However, it is
actually used by nobody and thus can be removed.

Fixes: 1e75550648 ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"")
Link: https://lore.kernel.org/r/20220829012335.1212697-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-29 09:44:07 +03:00
Zhu Yanjun
fd5382c580 RDMA/rxe: Fix error unwind in rxe_create_qp()
In the function rxe_create_qp(), rxe_qp_from_init() is called to
initialize qp, internally things like the spin locks are not setup until
rxe_qp_init_req().

If an error occures before this point then the unwind will call
rxe_cleanup() and eventually to rxe_qp_do_cleanup()/rxe_cleanup_task()
which will oops when trying to access the uninitialized spinlock.

Move the spinlock initializations earlier before any failures.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220731063621.298405-1-yanjun.zhu@linux.dev
Reported-by: syzbot+833061116fa28df97f3b@syzkaller.appspotmail.com
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-08-02 14:29:41 -03:00
Bob Pearson
62494ec7fb RDMA/rxe: Split qp state for requester and completer
Currently the requester can continue to process send wqes after an local
qp operation error is detected because the setting of the qp state to the
error state is deferred until later. This patch splits the qp state for
the completer and requester into two separate states and sets
qp->req.state = QP_STATE_ERROR as soon as the error is detected before
another wqe can be executed.

Link: https://lore.kernel.org/r/1658307368-1851-4-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-08-02 13:53:36 -03:00
Li Zhijian
ae720bdb70 RDMA/rxe: Generate error completion for error requester QP state
As per IBTA specification, all subsequent WQEs while QP is in error state
should be completed with a flush error.

Here we check QP_STATE_ERROR after req_next_wqe() so that rxe_completer()
has chance to be called where it will set CQ state to FLUSH ERROR and the
completion can associate with its WQE.

Link: https://lore.kernel.org/r/1658307368-1851-3-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-08-02 13:45:12 -03:00
Li Zhijian
dea4266f7b RDMA/rxe: Update wqe_index for each wqe error completion
Previously, if user space keeps sending abnormal wqe, queue.index will
keep increasing while qp->req.wqe_index doesn't. Once
qp->req.wqe_index==queue.index in next round, req_next_wqe() will treat
queue as empty. In such case, no new completion would be generated.

Update wqe_index for each wqe completion so that req_next_wqe() can get
next wqe properly.

Link: https://lore.kernel.org/r/1658307368-1851-2-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-08-02 13:41:36 -03:00
Li Zhijian
1e75550648 Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"
Below 2 commits will be reverted:
 commit 8ff5f5d9d8 ("RDMA/rxe: Prevent double freeing rxe_map_set()")
 commit 647bf13ce9 ("RDMA/rxe: Create duplicate mapping tables for FMRs")

The community has a few bug reports which pointed this commit at last.
Some proposals are raised up in the meantime but all of them have no
follow-up operation.

The previous commit led the map_set of FMR to be not available any more if
the MR is registered again after invalidating. Although the mentioned
patch try to fix a potential race in building/accessing the same table
for fast memory regions, it broke rtrs etc ULPs. Since the latter could
be worse, revert this patch.

With previous commit, it's observed that a same MR in rnbd server will
trigger below code path:
 -> rxe_mr_init_fast()
 |-> alloc map_set() # map_set is uninitialized
 |...-> rxe_map_mr_sg() # build the map_set
     |-> rxe_mr_set_page()
 |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means
                          # we can access host memory(such rxe_mr_copy)
 |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID
 |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE,
                          # but map_set was not built again
 |...-> rxe_mr_copy() # kernel crash due to access wild addresses
                      # that lookup from the map_set

The backtraces are not always identical.
[1st]----------
  RIP: 0010:lookup_iova+0x66/0xa0 [rdma_rxe]
  Code: 00 00 00 48 d3 ee 89 32 c3 4c 8b 18 49 8b 3b 48 8b 47 08 48 39 c6 72 38 48 29 c6 45 31 d2 b8 01 00 00 00 48 63 c8 48 c1 e1 04 <48> 8b 4c 0f 08 48 39 f1 77 21 83 c0 01 48 29 ce 3d 00 01 00 00 75
  RSP: 0018:ffffb7ff80063bf0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff9b9949d86800 RCX: 0000000000000000
  RDX: ffffb7ff80063c00 RSI: 0000000049f6b378 RDI: 002818da00000004
  RBP: 0000000000000120 R08: ffffb7ff80063c08 R09: ffffb7ff80063c04
  R10: 0000000000000002 R11: ffff9b9916f7eef8 R12: ffff9b99488a0038
  R13: ffff9b99488a0038 R14: ffff9b9914fb346a R15: ffff9b990ab27000
  FS:  0000000000000000(0000) GS:ffff9b997dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007efc33a98ed0 CR3: 0000000014f32004 CR4: 00000000001706f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   rxe_mr_copy.part.0+0x6f/0x140 [rdma_rxe]
   rxe_responder+0x12ee/0x1b60 [rdma_rxe]
   ? rxe_icrc_check+0x7e/0x100 [rdma_rxe]
   ? rxe_rcv+0x1d0/0x780 [rdma_rxe]
   ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe]
   rxe_do_task+0x67/0xb0 [rdma_rxe]
   rxe_xmit_packet+0xc7/0x210 [rdma_rxe]
   rxe_requester+0x680/0xee0 [rdma_rxe]
   ? update_load_avg+0x5f/0x690
   ? update_load_avg+0x5f/0x690
   ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client]

[2nd]----------
  RIP: 0010:rxe_mr_copy.part.0+0xa8/0x140 [rdma_rxe]
  Code: 00 00 49 c1 e7 04 48 8b 00 4c 8d 2c d0 48 8b 44 24 10 4d 03 7d 00 85 ed 7f 10 eb 6c 89 54 24 0c 49 83 c7 10 31 c0 85 ed 7e 5e <49> 8b 3f 8b 14 24 4c 89 f6 48 01 c7 85 d2 74 06 48 89 fe 4c 89 f7
  RSP: 0018:ffffae3580063bf8 EFLAGS: 00010202
  RAX: 0000000000018978 RBX: ffff9d7ef7a03600 RCX: 0000000000000008
  RDX: 000000000000007c RSI: 000000000000007c RDI: ffff9d7ef7a03600
  RBP: 0000000000000120 R08: ffffae3580063c08 R09: ffffae3580063c04
  R10: ffff9d7efece0038 R11: ffff9d7ec4b1db00 R12: ffff9d7efece0038
  R13: ffff9d7ef4098260 R14: ffff9d7f11e23c6a R15: 4c79500065708144
  FS:  0000000000000000(0000) GS:ffff9d7f3dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fce47276c60 CR3: 0000000003f66004 CR4: 00000000001706f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   rxe_responder+0x12ee/0x1b60 [rdma_rxe]
   ? rxe_icrc_check+0x7e/0x100 [rdma_rxe]
   ? rxe_rcv+0x1d0/0x780 [rdma_rxe]
   ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe]
   rxe_do_task+0x67/0xb0 [rdma_rxe]
   rxe_xmit_packet+0xc7/0x210 [rdma_rxe]
   rxe_requester+0x680/0xee0 [rdma_rxe]
   ? update_load_avg+0x5f/0x690
   ? update_load_avg+0x5f/0x690
   ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client]
   rxe_do_task+0x67/0xb0 [rdma_rxe]
   tasklet_action_common.constprop.0+0x92/0xc0
   __do_softirq+0xe1/0x2d8
   run_ksoftirqd+0x21/0x30
   smpboot_thread_fn+0x183/0x220
   ? sort_range+0x20/0x20
   kthread+0xe2/0x110
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x22/0x30

Link: https://lore.kernel.org/r/1658805386-2-1-git-send-email-lizhijian@fujitsu.com
Link: https://lore.kernel.org/all/20220210073655.42281-1-guoqing.jiang@linux.dev/T/
Link: https://www.spinics.net/lists/linux-rdma/msg110836.html
Link: https://lore.kernel.org/lkml/94a5ea93-b8bb-3a01-9497-e2021f29598a@linux.dev/t/
Tested-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-27 12:22:57 +03:00
Bob Pearson
c2ea08ca5e RDMA/rxe: Replace __rxe_do_task by rxe_run_task
In rxe_req.c replace calls to __rxe_do_task() by calls to rxe_run_task(..,
0). Using __rxe_do_task is an error because the completer tasklet is not
designed to be re-entrant and __rxe_do_task() should only be called when
it is clear that no one else could be calling the completer tasklet as is
the case in rxe_qp.c where this call is used in safe environments.

Link: https://lore.kernel.org/r/20220630190425.2251-10-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22 17:43:00 -03:00
Bob Pearson
eff6d998ca RDMA/rxe: Limit the number of calls to each tasklet
Limit the maximum number of calls to each tasklet from rxe_do_task()
before yielding the cpu. When the limit is reached reschedule the tasklet
and exit the calling loop. This patch prevents one tasklet from consuming
100% of a cpu core and causing a deadlock or soft lockup.

Link: https://lore.kernel.org/r/20220630190425.2251-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22 17:43:00 -03:00
Bob Pearson
8bb143c534 RDMA/rxe: Make the tasklet exits the same
Make changes to the three tasklets so that the exit logic from each is the
same. This makes the code easier to understand.

Link: https://lore.kernel.org/r/20220630190425.2251-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22 17:43:00 -03:00
Bob Pearson
445fd4f4fb RDMA/rxe: Fix rnr retry behavior
Currently the completer tasklet when retransmit timer or the rnr timer
fires the same flag (qp->req.need_retry) is set so that if either timer
fires it will attempt to perform a retry flow on the send queue.  This has
the effect of responding to an RNR NAK at the first retransmit timer event
which might not allow the requested rnr timeout.

This patch adds a new flag (qp->req.wait_for_rnr_timer) which, if set,
prevents a retry flow until the rnr nak timer fires.

This patch fixes rnr retry errors which can be observed by running the
pyverbs test_rdmacm_async_traffic_external_qp multiple times. With this
patch applied they do not occur.

Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
Link: https://lore.kernel.org/linux-rdma/2bafda9e-2bb6-186d-12a1-179e8f6a2678@talpey.com/
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220630190425.2251-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22 17:43:00 -03:00
Bob Pearson
930119a172 RDMA/rxe: Add rxe_is_fenced() subroutine
The code thc that decides whether to defer execution of a wqe in
rxe_requester.c is isolated into a subroutine rxe_is_fenced() and removed
from the call to req_next_wqe(). The condition whether a wqe should be
fenced is changed to comply with the IBA. Currently an operation is fenced
if the fence bit is set in the wqe flags and the last wqe has not
completed. For normal operations the IBA actually only requires that the
last read or atomic operation is complete.

Link: https://lore.kernel.org/r/20220630190425.2251-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22 17:43:00 -03:00
Md Haris Iqbal
174e7b1370 RDMA/rxe: For invalidate compare according to set keys in mr
The 'rkey' input can be an lkey or rkey, and in rxe the lkey or rkey have
the same value, including the variant bits.

So, if mr->rkey is set, compare the invalidate key with it, otherwise
compare with the mr->lkey.

Since we already did a lookup on the non-varient bits to get this far, the
check's only purpose is to confirm that the wqe has the correct variant
bits.

Fixes: 001345339f ("RDMA/rxe: Separate HW and SW l/rkeys")
Link: https://lore.kernel.org/r/20220707073006.328737-1-haris.phnx@gmail.com
Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22 17:35:53 -03:00
Bob Pearson
1603f89935 RDMA/rxe: Fix mw bind to allow any consumer key portion
The current implementation of rxe_check_bind_mw() in rxe_mw.c is incorrect
since it requires the new key portion provided by the mw consumer to be
different than the previous key portion. This is not required by the
IBA. Remove the test.

Link: https://lore.kernel.org/linux-rdma/fb4614e7-4cac-0dc7-3ef7-766dfd10e8f2@gmail.com/
Fixes: 32a577b4c3 ("Add support for bind MW work requests")
Link: https://lore.kernel.org/r/20220714204619.13396-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-21 11:35:31 -03:00
Zhang Jiaming
5abb71b47c RDMA/rxe: Fix spelling mistake in error print
There is a spelling mistake (writeable) in function rxe_check_bind_mw.
Fix it.

Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-21 09:59:29 +03:00
Xiao Yang
68691bad98 RDMA/rxe: Remove unused qp parameter
The qp parameter in free_rd_atomic_resource() has become
unused so remove it directly.

Fixes: 15ae1375ea ("RDMA/rxe: Fix qp reference counting for atomic ops")
Link: https://lore.kernel.org/all/20220708035547.6592-1-yangx.jy@fujitsu.com/
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-19 11:31:09 +03:00
lizhijian@fujitsu.com
03905ac285 RDMA/rxe: Remove unused mask parameter
This parameter had been deprecated since below commit:
1a7085b342 ("RDMA/rxe: Skip adjusting remote addr for write in retry operation")

Link: https://lore.kernel.org/r/20220715035340.1900168-1-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18 14:50:35 +03:00
Xiao Yang
548c56dd2e RDMA/rxe: Rename rxe_atomic_reply to atomic_reply
It's better to use the unified naming format.

Link: https://lore.kernel.org/r/20220705145212.12014-2-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18 14:36:18 +03:00
Xiao Yang
882736fb3b RDMA/rxe: Add common rxe_prepare_res()
It's redundant to prepare resources for Read and Atomic
requests by different functions. Replace them by a common
rxe_prepare_res() with different parameters. In addition,
the common rxe_prepare_res() can also be used by new Flush
and Atomic Write requests in the future.

Link: https://lore.kernel.org/r/20220705145212.12014-1-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18 14:36:11 +03:00
Zhu Yanjun
37da51efe6 RDMA/rxe: Fix BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup
The function rxe_create_qp calls rxe_qp_from_init. If some error
occurs, the error handler of function rxe_qp_from_init will set
both scq and rcq to NULL.

Then rxe_create_qp calls rxe_put to handle qp. In the end,
rxe_qp_do_cleanup is called by rxe_put. rxe_qp_do_cleanup directly
accesses scq and rcq before checking them. This will cause
null-ptr-deref error.

The call graph is as below:

rxe_create_qp {
  ...
  rxe_qp_from_init {
    ...
  err1:
    ...
    qp->rcq = NULL;  <---rcq is set to NULL
    qp->scq = NULL;  <---scq is set to NULL
    ...
  }

qp_init:
  rxe_put{
    ...
    rxe_qp_do_cleanup {
      ...
      atomic_dec(&qp->scq->num_wq); <--- scq is accessed
      ...
      atomic_dec(&qp->rcq->num_wq); <--- rcq is accessed
    }
}

Fixes: 4703b4f0d9 ("RDMA/rxe: Enforce IBA C11-17")
Link: https://lore.kernel.org/r/20220705225414.315478-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18 14:32:39 +03:00
Cheng Xu
3056fc6c32 RDMA/siw: Fix duplicated reported IW_CM_EVENT_CONNECT_REPLY event
If siw_recv_mpa_rr returns -EAGAIN, it means that the MPA reply hasn't
been received completely, and should not report IW_CM_EVENT_CONNECT_REPLY
in this case. This may trigger a call trace in iw_cm. A simple way to
trigger this:
 server: ib_send_lat
 client: ib_send_lat -R <server_ip>

The call trace looks like this:

 kernel BUG at drivers/infiniband/core/iwcm.c:894!
 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 <...>
 Workqueue: iw_cm_wq cm_work_handler [iw_cm]
 Call Trace:
  <TASK>
  cm_work_handler+0x1dd/0x370 [iw_cm]
  process_one_work+0x1e2/0x3b0
  worker_thread+0x49/0x2e0
  ? rescuer_thread+0x370/0x370
  kthread+0xe5/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

Fixes: 6c52fdc244 ("rdma/siw: connection management")
Link: https://lore.kernel.org/r/dae34b5fd5c2ea2bd9744812c1d2653a34a94c67.1657706960.git.chengyou@linux.alibaba.com
Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18 14:20:52 +03:00
Andrey Strachuk
aeea6cc067 RDMA: remove useless condition in siw_create_cq()
Comparison of 'cq' with NULL is useless since
'cq' is a result of container_of and cannot be NULL
in any reasonable scenario.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 303ae1cdfd ("rdma/siw: application interface")
Link: https://lore.kernel.org/r/20220711151251.17089-1-strochuk@ispras.ru
Signed-off-by: Andrey Strachuk <strochuk@ispras.ru>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18 12:27:28 +03:00
Zhang Jiaming
2635d2a8d4 IB: Fix spelling of 'writable'
There is a typo (writeable) in qib_file_ops.c, qib_sd7220.c's comments,
and in rxe_check_bind_mw()

Link: https://lore.kernel.org/r/20220701074812.12615-1-jiaming@nfschina.com
Link: https://lore.kernel.org/r/20220701080019.13329-1-jiaming@nfschina.com
Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-04 10:14:57 -03:00
Bob Pearson
96938258b1 RDMA/rxe: Remove unnecessary include statement
rxe_verbs.h includes the file <rdma/rdma_user_rxe.h>.  It should have been
<uapi/rdma/rdma_user_rxe.h>, however, it is not used and not required in
this file.

This patch removes the include statement.

Link: https://lore.kernel.org/r/20220630190425.2251-4-rpearsonhpe@gmail.com
Reported-by: Frank Zago <frank.zago@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-04 10:04:09 -03:00
Bob Pearson
f5d1f6d63c RDMA/rxe: Replace include statement
rxe_queue.h currently includes <uapi/rdma/rdma_user_rxe.h> for a
definition of struct rxe_queue_buf. But it is only used as a pointer so
the definition is not needed.

This patch replaces the include statement with the declaration

     struct rxe_queue_buf;

Link: https://lore.kernel.org/r/20220630190425.2251-5-rpearsonhpe@gmail.com
Reported-by: Frank Zago <frank.zago@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 20:45:00 -03:00
Bob Pearson
cae3fa541e RDMA/rxe: Convert pr_warn/err to pr_debug in pyverbs
The pyverbs test suite generates a few dmesg traces from intentional error
tests. This patch replaces those messages with pr_debug() calls which
improves the usefullness of the tests.

Link: https://lore.kernel.org/r/20220630190425.2251-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 20:45:00 -03:00
Bob Pearson
7cb33d1bc1 RDMA/rxe: Fix deadlock in rxe_do_local_ops()
When a local operation (invalidate mr, reg mr, bind mw) is finished there
will be no ack packet coming from a responder to cause the wqe to be
completed. This may happen anyway if a subsequent wqe performs
IO. Currently if the wqe is signalled the completer tasklet is scheduled
immediately but not otherwise.

This leads to a deadlock if the next wqe has the fence bit set in send
flags and the operation is not signalled. This patch removes the condition
that the wqe must be signalled in order to schedule the completer tasklet
which is the simplest fix for this deadlock and is fairly low cost. This
is the analog for local operations of always setting the ackreq bit in all
last or only request packets even if the operation is not signalled.

Link: https://lore.kernel.org/r/20220523223251.15350-1-rpearsonhpe@gmail.com
Reported-by: Jenny Hack <jhack@hpe.com>
Fixes: c1a411268a ("RDMA/rxe: Move local ops to subroutine")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 17:00:23 -03:00
Bob Pearson
dc18483881 RDMA/rxe: Merge normal and retry atomic flows
Make the execution of the atomic operation in rxe_atomic_reply()
conditional on res->replay and make duplicate_request() call into
rxe_atomic_reply() to merge the two flows. This is modeled on the behavior
of read reply. Delete the skb from the atomic responder resource since it
is no longer used. Adjust the reference counting of the qp in
send_atomic_ack() for this flow.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220606143836.3323-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 14:00:21 -03:00
Bob Pearson
8264411595 RDMA/rxe: Move atomic original value to res
Move the saved original value to the atomic responder resource.  This
replaces saving it in the qp. In preparation for merging the normal and
retry atomic responder flows.

Link: https://lore.kernel.org/r/20220606143836.3323-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 14:00:21 -03:00
Bob Pearson
220e842815 RDMA/rxe: Move atomic responder res to atomic_reply
Move the allocation of the atomic responder resource up into
rxe_atomic_reply() from send_atomic_ack(). In preparation for merging the
normal and retry atomic responder flows.

Link: https://lore.kernel.org/r/20220606143836.3323-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 14:00:21 -03:00
Bob Pearson
0ed5493e43 RDMA/rxe: Add a responder state for atomic reply
Add a responder state for atomic reply similar to read reply and rename
process_atomic() rxe_atomic_reply(). In preparation for merging the normal
and retry atomic responder flows.

Link: https://lore.kernel.org/r/20220606143836.3323-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 13:54:04 -03:00
Bob Pearson
24f0ab0102 RDMA/rxe: Move code to rxe_prepare_atomic_res()
Separate the code that prepares the atomic responder resource into a
subroutine. This is preparation for merging the normal and retry atomic
responder flows.

Link: https://lore.kernel.org/r/20220606143836.3323-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 13:54:03 -03:00
Bob Pearson
b54c2a25ac RDMA/rxe: Convert read side locking to rcu
Use rcu_read_lock() for protecting read side operations in rxe_pool.c.

Link: https://lore.kernel.org/r/20220612223434.31462-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 10:56:05 -03:00
Bob Pearson
215d0a755e RDMA/rxe: Stop lookup of partially built objects
Currently the rdma_rxe driver has a security weakness due to giving
objects which are partially initialized indices allowing external actors
to gain access to them by sending packets which refer to their
index (e.g. qpn, rkey, etc) causing unpredictable results.

This patch adds a new API rxe_finalize(obj) which enables looking up pool
objects from indices using rxe_pool_get_index() for AH, QP, MR, and
MW. They are added in create verbs only after the objects are fully
initialized.

It also adds wait for completion to destroy/dealloc verbs to assure that
all references have been dropped before returning to rdma_core by
implementing a new rxe_pool API rxe_cleanup() which drops a reference to
the object and then waits for all other references to be dropped.  When
the last reference is dropped the object is completed by kref.  After that
it cleans up the object and if locally allocated frees the memory. In the
special case of address handle objects the delay is implemented separately
if the destroy_ah call is not sleepable.

Combined with deferring cleanup code to type specific cleanup routines
this allows all pending activity referring to objects to complete before
returning to rdma_core.

Link: https://lore.kernel.org/r/20220612223434.31462-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 10:56:01 -03:00
Xiao Yang
80a14dd4c3 RDMA/rxe: Remove useless pkt parameters
The pkt parameters in prepare_ack_packet(), send_ack() and
send_atomic_ack() have become useless by the following commits.  So remove
them directly.

Fixes: bf139b58af ("RDMA/rxe: Remove unused pkt->offset")
Fixes: 3896bde92d ("RDMA/rxe: Fix extra copy in prepare_ack_packet")
Link: https://lore.kernel.org/r/20220623131627.18903-1-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-24 19:35:21 -03:00