Commit Graph

701 Commits

Author SHA1 Message Date
Nicolas Morey
84510a61ef RDMA/rxe: Remove dangling declaration of rxe_cq_disable()
rxe_cq_disable() has been removed but not its declaration.

Fixes: 78b26a3353 ("RDMA/rxe: Remove tasklet call from rxe_cq.c")
Link: https://lore.kernel.org/r/4f20ffc5-b2c4-0c11-2883-a835caf01a94@suse.com
Signed-off-by: Nicolas Morey <nmorey@suse.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-01 12:59:32 -03:00
Daisuke Matsuda
42b0a5e691 RDMA/rxe: Fix comments about removed tasklets
The commit 9b4b7c1f9f ("RDMA/rxe: Add workqueue support for rxe tasks")
removed tasklets and replaced them with a workqueue, but relevant comments
are still remaining in the source code.

Fixes: 9b4b7c1f9f ("RDMA/rxe: Add workqueue support for rxe tasks")
Link: https://lore.kernel.org/r/20230518070027.942715-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-19 12:02:26 -03:00
Bob Pearson
9b4b7c1f9f RDMA/rxe: Add workqueue support for rxe tasks
Replace tasklets by work queues for the three main rxe tasklets:
rxe_requester, rxe_completer and rxe_responder.

work queues are a more modern way to process work from an IRQ and provide
more control over how that work is run for future patches.

Link: https://lore.kernel.org/r/20230428171321.5774-1-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Tested-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-17 15:34:25 -03:00
Guoqing Jiang
b5f3fe27c5 RDMA/rxe: Convert spin_{lock_bh,unlock_bh} to spin_{lock_irqsave,unlock_irqrestore}
We need to call spin_lock_irqsave()/spin_unlock_irqrestore() for
state_lock in rxe, otherwsie the callchain:

  ib_post_send_mad
	-> spin_lock_irqsave
	-> ib_post_send -> rxe_post_send
				-> spin_lock_bh
				-> spin_unlock_bh
	-> spin_unlock_irqrestore

Causes below traces during run block nvmeof-mp/001 test due to mismatched
spinlock nesting:

  WARNING: CPU: 0 PID: 94794 at kernel/softirq.c:376 __local_bh_enable_ip+0xc2/0x140
  [ ... ]
  CPU: 0 PID: 94794 Comm: kworker/u4:1 Tainted: G            E      6.4.0-rc1 #9
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
  Workqueue: rdma_cm cma_work_handler [rdma_cm]
  RIP: 0010:__local_bh_enable_ip+0xc2/0x140
  Code: 48 85 c0 74 72 5b 41 5c 5d 31 c0 89 c2 89 c1 89 c6 89 c7 41 89 c0 e9 bd 0e 11 01 65 8b 05 f2 65 72 48 85 c0 0f 85 76 ff ff ff <0f> 0b e9 6f ff ff ff e8 d2 39 1c 00 eb 80 4c 89 e7 e8 68 ad 0a 00
  RSP: 0018:ffffb7cf818539f0 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000201 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000201 RDI: ffffffffc0f25f79
  RBP: ffffb7cf81853a00 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc0f25f79
  R13: ffff8db1f0fa6000 R14: ffff8db2c63ff000 R15: 00000000000000e8
  FS:  0000000000000000(0000) GS:ffff8db33bc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000559758db0f20 CR3: 0000000105124000 CR4: 00000000003506f0
  Call Trace:
   <TASK>
   _raw_spin_unlock_bh+0x31/0x40
   rxe_post_send+0x59/0x8b0 [rdma_rxe]
   ib_send_mad+0x26b/0x470 [ib_core]
   ib_post_send_mad+0x150/0xb40 [ib_core]
   ? cm_form_tid+0x5b/0x90 [ib_cm]
   ib_send_cm_req+0x7c8/0xb70 [ib_cm]
   rdma_connect_locked+0x433/0x940 [rdma_cm]
   nvme_rdma_cm_handler+0x5d7/0x9c0 [nvme_rdma]
   cma_cm_event_handler+0x4f/0x170 [rdma_cm]
   cma_work_handler+0x6a/0xe0 [rdma_cm]
   process_one_work+0x2a9/0x580
   worker_thread+0x52/0x3f0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0x109/0x140
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x2c/0x50
   </TASK>


  raw_local_irq_restore() called with IRQs enabled
  WARNING: CPU: 0 PID: 94794 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x37/0x60
  [ ... ]
  CPU: 0 PID: 94794 Comm: kworker/u4:1 Tainted: G        W   E      6.4.0-rc1 #9
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
  Workqueue: rdma_cm cma_work_handler [rdma_cm]
  RIP: 0010:warn_bogus_irq_restore+0x37/0x60
  Code: fb 01 77 36 83 e3 01 74 0e 48 8b 5d f8 c9 31 f6 89 f7 e9 ac ea 01 00 48 c7 c7 e0 52 33 b9 c6 05 bb 1c 69 01 01 e8 39 24 f0 fe <0f> 0b 48 8b 5d f8 c9 31 f6 89 f7 e9 89 ea 01 00 0f b6 f3 48 c7 c7
  RSP: 0018:ffffb7cf81853a58 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffffb7cf81853a60 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff8db2cfb1a9e8
  R13: ffff8db2cfb1a9d8 R14: ffff8db2c63ff000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8db33bc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000559758db0f20 CR3: 0000000105124000 CR4: 00000000003506f0
  Call Trace:
   <TASK>
   _raw_spin_unlock_irqrestore+0x91/0xa0
   ib_send_mad+0x1e3/0x470 [ib_core]
   ib_post_send_mad+0x150/0xb40 [ib_core]
   ? cm_form_tid+0x5b/0x90 [ib_cm]
   ib_send_cm_req+0x7c8/0xb70 [ib_cm]
   rdma_connect_locked+0x433/0x940 [rdma_cm]
   nvme_rdma_cm_handler+0x5d7/0x9c0 [nvme_rdma]
   cma_cm_event_handler+0x4f/0x170 [rdma_cm]
   cma_work_handler+0x6a/0xe0 [rdma_cm]
   process_one_work+0x2a9/0x580
   worker_thread+0x52/0x3f0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0x109/0x140
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x2c/0x50
   </TASK>

Fixes: f605f26ea1 ("RDMA/rxe: Protect QP state with qp->state_lock")
Link: https://lore.kernel.org/r/20230510035056.881196-1-guoqing.jiang@linux.dev
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-16 21:07:33 -03:00
Bob Pearson
17eabd6a04 RDMA/rxe: Fix double unlock in rxe_qp.c
A recent patch can cause a double spin_unlock_bh() in rxe_qp_to_attr() at
line 715 in rxe_qp.c. Move the 2nd unlock into the if statement.

Fixes: f605f26ea1 ("RDMA/rxe: Protect QP state with qp->state_lock")
Link: https://lore.kernel.org/r/20230515201056.1591140-1-rpearsonhpe@gmail.com
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/27773078-40ce-414f-8b97-781954da9f25@kili.mountain
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-16 16:52:45 -03:00
Daisuke Matsuda
10af303192 RDMA/rxe: Fix spinlock recursion deadlock on requester
The following deadlock is observed:

 Call Trace:
  <IRQ>
  _raw_spin_lock_bh+0x29/0x30
  check_type_state.constprop.0+0x4e/0xc0 [rdma_rxe]
  rxe_rcv+0x173/0x3d0 [rdma_rxe]
  rxe_udp_encap_recv+0x69/0xd0 [rdma_rxe]
  ? __pfx_rxe_udp_encap_recv+0x10/0x10 [rdma_rxe]
  udp_queue_rcv_one_skb+0x258/0x520
  udp_unicast_rcv_skb+0x75/0x90
  __udp4_lib_rcv+0x364/0x5c0
  ip_protocol_deliver_rcu+0xa7/0x160
  ip_local_deliver_finish+0x73/0xa0
  ip_sublist_rcv_finish+0x80/0x90
  ip_sublist_rcv+0x191/0x220
  ip_list_rcv+0x132/0x160
  __netif_receive_skb_list_core+0x297/0x2c0
  netif_receive_skb_list_internal+0x1c5/0x300
  napi_complete_done+0x6f/0x1b0
  virtnet_poll+0x1f4/0x2d0 [virtio_net]
  __napi_poll+0x2c/0x1b0
  net_rx_action+0x293/0x350
  ? __napi_schedule+0x79/0x90
  __do_softirq+0xcb/0x2ab
  __irq_exit_rcu+0xb9/0xf0
  common_interrupt+0x80/0xa0
  </IRQ>
  <TASK>
  asm_common_interrupt+0x22/0x40
  RIP: 0010:_raw_spin_lock+0x17/0x30
  rxe_requester+0xe4/0x8f0 [rdma_rxe]
  ? xas_load+0x9/0xa0
  ? xa_load+0x70/0xb0
  do_task+0x64/0x1f0 [rdma_rxe]
  rxe_post_send+0x54/0x110 [rdma_rxe]
  ib_uverbs_post_send+0x5f8/0x680 [ib_uverbs]
  ? netif_receive_skb_list_internal+0x1e3/0x300
  ib_uverbs_write+0x3c8/0x500 [ib_uverbs]
  vfs_write+0xc5/0x3b0
  ksys_write+0xab/0xe0
  ? syscall_trace_enter.constprop.0+0x126/0x1a0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
  </TASK>

The deadlock is easily reproducible with perftest. Fix it by disabling
softirq when acquiring the lock in process context.

Fixes: f605f26ea1 ("RDMA/rxe: Protect QP state with qp->state_lock")
Link: https://lore.kernel.org/r/20230418090642.1849358-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-21 12:33:00 -03:00
Bob Pearson
f605f26ea1 RDMA/rxe: Protect QP state with qp->state_lock
Currently the rxe driver makes little effort to make the changes to qp
state (which includes qp->attr.qp_state, qp->attr.sq_draining and
qp->valid) atomic between different client threads and IO threads. In
particular a common template is for an RDMA application to call
ib_modify_qp() to move a qp to ERR state and then wait until all the
packet and work queues have drained before calling ib_destroy_qp(). None
of these state changes are protected by locks to assure that the changes
are executed atomically and that memory barriers are included. This has
been observed to lead to incorrect behavior around qp cleanup.

This patch continues the work of the previous patches in this series and
adds locking code around qp state changes and lookups.

Link: https://lore.kernel.org/r/20230405042611.6467-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:34:04 -03:00
Bob Pearson
7b560b89a0 RDMA/rxe: Move code to check if drained to subroutine
Move two blocks of code in rxe_comp.c and rxe_req.c to subroutines that
check if draining is complete in the SQD state and, if so, generate a
SQ_DRAINED event.

Link: https://lore.kernel.org/r/20230405042611.6467-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Bob Pearson
98e891b5e4 RDMA/rxe: Remove qp->req.state
The rxe driver has four different QP state variables,
    qp->attr.qp_state,
    qp->req.state,
    qp->comp.state, and
    qp->resp.state.
All of these basically carry the same information.

This patch replaces uses of qp->req.state by qp->attr.qp_state and enum
rxe_qp_state.  This is the third of three patches which will remove all
but the qp->attr.qp_state variable. This will bring the driver closer to
the IBA description.

Link: https://lore.kernel.org/r/20230405042611.6467-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Bob Pearson
f55efc2ed2 RDMA/rxe: Remove qp->comp.state
The rxe driver has four different QP state variables,
    qp->attr.qp_state,
    qp->req.state,
    qp->comp.state, and
    qp->resp.state.
All of these basically carry the same information.

This patch replaces uses of qp->comp.state by qp->attr.qp_state.  This is
the second of three patches which will remove all but the
qp->attr.qp_state variable. This will bring the driver closer to the IBA
description.

Link: https://lore.kernel.org/r/20230405042611.6467-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Bob Pearson
a588429a66 RDMA/rxe: Remove qp->resp.state
The rxe driver has four different QP state variables,
    qp->attr.qp_state,
    qp->req.state,
    qp->comp.state, and
    qp->resp.state.
All of these basically carry the same information.

This patch replaces uses of qp->resp.state by qp->attr.qp_state.  This is
the first of three patches which will remove all but the qp->attr.qp_state
variable. This will bring the driver closer to the IBA description.

Link: https://lore.kernel.org/r/20230405042611.6467-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Jason Gunthorpe
8d7c7c0eeb RDMA: Add ib_virt_dma_to_page()
Make it clearer what is going on by adding a function to go back from the
"virtual" dma_addr to a kva and another to a struct page. This is used in the
ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib).

Call them instead of a naked casting and  virt_to_page() when working with dma_addr
values encoded by the various ib_map functions.

This also fixes the virt_to_page() casting problem Linus Walleij has been
chasing.

Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-16 11:08:07 +03:00
Zhu Yanjun
b2b1ddc457 RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"
In the function rxe_create_qp(), rxe_qp_from_init() is called to
initialize qp, internally things like rxe_init_task are not setup until
rxe_qp_init_req().

If an error occurred before this point then the unwind will call
rxe_cleanup() and eventually to rxe_qp_do_cleanup()/rxe_cleanup_task()
which will oops when trying to access the uninitialized spinlock.

If rxe_init_task is not executed, rxe_cleanup_task will not be called.

Reported-by: syzbot+cfcc1a3c85be15a40cba@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=fd85757b74b3eb59f904138486f755f71e090df8
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Fixes: 2d4b21e0a2 ("IB/rxe: Prevent from completer to operate on non valid QP")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20230413101115.1366068-1-yanjun.zhu@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-16 10:51:33 +03:00
Bob Pearson
67a00d29c3 RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c
In a previous patch TASKLET_STATE_SCHED was used as a mask but it is a bit
position instead. Add the missing shift.

Link: https://lore.kernel.org/r/20230329193308.7489-1-rpearsonhpe@gmail.com
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/linux-rdma/8a054b78-6d50-4bc6-8d8a-83f85fbdb82f@kili.mountain/
Fixes: d946716325 ("RDMA/rxe: Rewrite rxe_task.c")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-12 13:11:51 -03:00
Leon Romanovsky
b6ba68555d RDMA/rxe: Clean kzalloc failure paths
There is no need to print any debug messages after failure to
allocate memory, because kernel will print OOM dumps anyway.

Together with removal of these messages, remove useless goto jumps.

Fixes: 5bf944f241 ("RDMA/rxe: Add error messages")
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/all/ea43486f-43dd-4054-b1d5-3a0d202be621@kili.mountain
Link: https://lore.kernel.org/r/d3cedf723b84e73e8062a67b7489d33802bafba2.1680113597.git.leon@kernel.org
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-03-30 09:54:32 +03:00
Bob Pearson
78b26a3353 RDMA/rxe: Remove tasklet call from rxe_cq.c
Remove the tasklet call in rxe_cq.c and also the is_dying in the
cq struct. There is no reason for the rxe driver to defer the call
to the cq completion handler by scheduling a tasklet. rxe_cq_post()
is not called in a hard irq context.

The rxe driver currently is incorrect because the tasklet call is
made without protecting the cq pointer with a reference from having
the underlying memory freed before the deferred routine is called.
Executing the comp_handler inline fixes this problem.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Link: https://lore.kernel.org/r/20230327215643.10410-1-rpearsonhpe@gmail.com
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 14:25:11 +03:00
Bob Pearson
d946716325 RDMA/rxe: Rewrite rxe_task.c
This patch is a major rewrite of the tasklet routines in rxe_task.c.  The
main motivation for this is the realization that the code violates the
safety of the qp pointer by correct reference counting.  When a tasklet is
scheduled from a verbs API the calling thread has a valid reference to the
qp and schedules the tasklet to run at a later time carrying a pointer to
the qp. Once the calling code returns however the qp can be destroyed at
any time. In order to correct this a reference to the qp must be taken
when the task is scheduled and held until it finishes running. This is
complicated by the tasklet library not alwys running a task that is
scheduled depending on whether someone else has scheduled it.

This patch moves the logic for deciding whether to run or schedule a task
outside of do_task() and guarantees that there is only one copy of the
task scheduled or running at a time.

Secondly the separate flags controlling teardown and draining of the task
are included in the task state machine and all references to the state are
protected by spinlocks to avoid consistency and memory barrier issues.

Link: https://lore.kernel.org/r/20230304174533.11296-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
f455a1bc97 RDMA/rxe: Make tasks schedule each other
Replace rxe_run_task() by rxe_sched_task() when tasks call each other.
These are not performance critical and mainly involve error paths but they
run the risk of causing deadlocks.

Link: https://lore.kernel.org/r/20230304174533.11296-8-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
960ebe97e5 RDMA/rxe: Remove __rxe_do_task()
The subroutine __rxe_do_task is not thread safe and it has no way to
guarantee that the tasks, which are designed with the assumption that they
are non-reentrant, are not reentered. All of its uses are non-performance
critical.

This patch replaces calls to __rxe_do_task with calls to
rxe_sched_task. It also removes irrelevant or unneeded if tests.

Instead of calling the task machinery a single call to the tasklet
function (rxe_requester, etc.) is sufficient to draing the queues if task
execution has been disabled or stopped.

Together these changes allow the removal of __rxe_do_task.

Link: https://lore.kernel.org/r/20230304174533.11296-7-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
a246aa2e8a RDMA/rxe: Remove qp reference counting in tasks
Currently each of the three tasklets requester, completer and responder in
the rxe driver take and release a reference to the qp argument at the
beginning and end of the subroutines. The caller passing in the qp
argument should be responsible for holding a reference to qp so these are
not required. Further doing so breaks the qp cleanup code in
rxe_qp_do_cleanup which calls these routines after all the references have
been dropped so they cannot drain the packet and work request queues as
intended.

In fact if these routines are deferred by calling tasklet_schedule there
is no guarantee that the calling code does have a qp reference.  That is a
bug in rxe_task.c which will be fixed later in this series.

Link: https://lore.kernel.org/r/20230304174533.11296-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
fbdeb828a2 RDMA/rxe: Cleanup error state handling in rxe_comp.c
Cleanup the handling of qp in the error state, reset state and during
rxe_qp_do_cleanup. Make the same as rxe_resp.c

Link: https://lore.kernel.org/r/20230304174533.11296-5-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
49dc9c1f0c RDMA/rxe: Cleanup reset state handling in rxe_resp.c
Cleanup the handling of qp in the error state, reset state and during
rxe_qp_do_cleanup. The error state does about the same thing as the others
but has code spread all over.

This patch combines them in a cleaner way.

Link: https://lore.kernel.org/r/20230304174533.11296-4-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:21:36 -03:00
Bob Pearson
3946fc2a42 RDMA/rxe: Convert tasklet args to queue pairs
Originally is was thought that the tasklet machinery in rxe_task.c would
be used in other applications but that has not happened for years. This
patch replaces the 'void *arg' by struct 'rxe_qp *qp' in the parameters to
the tasklet calls. This change will have no affect on performance but may
make the code a little clearer.

Link: https://lore.kernel.org/r/20230304174533.11296-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 11:14:38 -03:00
Bob Pearson
5bf944f241 RDMA/rxe: Add error messages
This patch adds error and debug messages so that every interaction
with rdma-core through a verbs API call or a completion error return
will generate at least one error message backed up by debug messages
with more detail.

With dynamic debugging one can follow up after seeing an error message
by turning on the appropriate debug messages.

Link: https://lore.kernel.org/r/20230303221623.8053-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:49 -03:00
Bob Pearson
9ac01f434a RDMA/rxe: Extend dbg log messages to err and info
Extend the dbg log messages (e.g. rxe_dbg_xxx) to include
err and info types. rxe.c is modified to use these new log
messages as examples.

Link: https://lore.kernel.org/r/20230303221623.8053-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:49 -03:00
Bob Pearson
a9fb328721 RDMA/rxe: Change rxe_dbg to rxe_dbg_dev
Replace the name rxe_dbg with rxe_dbg_dev which better matches
the remaining rxe_dbg_xxx macros for debug messages with a
rxe device parameter. Reuse the name rxe_dbg for debug messages
which do not have a rxe device parameter.

Link: https://lore.kernel.org/r/20230303221623.8053-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:49 -03:00
Bob Pearson
9168d125ea RDMA/rxe: Replace exists by rxe in rxe.c
'exists' looks like a boolean. This patch replaces it by the
normal name used for the rxe device, 'rxe', which should be a
little less confusing. The second rxe_dbg() message is
incorrect since rxe is known to be NULL and this will cause a
seg fault if this message were ever sent. Replace it by pr_debug
for the moment.

Fixes: c6aba5ea00 ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c")
Link: https://lore.kernel.org/r/20230303221623.8053-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-03-24 10:41:48 -03:00
Linus Torvalds
8cbd92339d v6.3 RDMA pull request
Small cycle this time:
 
 - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana
 
 - inline CQE support for hns
 
 - Have mlx5 display device error codes
 
 - Pinned DMABUF support for irdma
 
 - Continued rxe cleanups, particularly converting the MRs to use xarray
 
 - Improvements to what can be cached in the mlx5 mkey cache
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY/gPmgAKCRCFwuHvBreF
 YW5IAP4xOAiTif4f87vD1twRU/ebq4VEX0r+C2NX5x5fwlCJrAEA7RLV8uG9Uii2
 ez0BuWNxfajuvFHntnZ1E+7UDP0S8gk=
 =CgUH
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Quite a small cycle this time, even with the rc8. I suppose everyone
  went to sleep over xmas.

   - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw,
     mana

   - inline CQE support for hns

   - Have mlx5 display device error codes

   - Pinned DMABUF support for irdma

   - Continued rxe cleanups, particularly converting the MRs to use
     xarray

   - Improvements to what can be cached in the mlx5 mkey cache"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits)
  IB/mlx5: Extend debug control for CC parameters
  IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors
  IB/hfi1: Fix math bugs in hfi1_can_pin_pages()
  RDMA/irdma: Add support for dmabuf pin memory regions
  RDMA/mlx5: Use query_special_contexts for mkeys
  net/mlx5e: Use query_special_contexts for mkeys
  net/mlx5: Change define name for 0x100 lkey value
  net/mlx5: Expose bits for querying special mkeys
  RDMA/rxe: Fix missing memory barriers in rxe_queue.h
  RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet
  RDMA/rxe: Remove rxe_alloc()
  RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size
  Subject: RDMA/rxe: Handle zero length rdma
  iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry()
  RDMA/mlx5: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Remove unused 'work' member from struct ib_umem
  RDMA/irdma: Cap MSIX used to online CPUs + 1
  RDMA/mlx5: Check reg_create() create for errors
  RDMA/restrack: Correct spelling
  RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish()
  ...
2023-02-24 15:11:03 -08:00
Bob Pearson
a77a52385e RDMA/rxe: Fix missing memory barriers in rxe_queue.h
An earlier patch which introduced smp_load_acquire/smp_store_release
into rxe_queue.h incorrectly assumed that surrounding spin-locks in
rxe_verbs.c around queue updates for kernel ulps was sufficient to
protect the passing of data through the queues between the ulp and
the rxe tasklets. But this was incorrect. The typical sequence was

	ulp				rxe requester tasklet
	------------------------	---------------------
	spin_lock_irqsave()		wqe = queue_head(queue)
	if (!queue_full(q)) {		if (!wqe)
		spin_unlock_irqrestore		return;
		return -ENOMEM
	}				<process wqe>
	wqe = queue_producer_addr(q)
	<fill in wqe>			queue_advance_consumer(queue)
	queue_advance_producer(q)
	spin_unlock_irqrestore()

queue_head() calls queue_empty() which calls smp_load_acquire()
For user space apps queue_advance_producer() calls smp_store_release()
so that there is a memory barrier between the producer and the
consumer but for kernel ulps queue_advance_produce() just incremented
the producer index because the lock function is a release function.
But to work the barrier has to come between filling in the wqe and
updating the producer index. This patch adds the missing barriers.
It also changes the enum names for the ulp queue types to
	QUEUE_TYPE_FROM/TO_ULP instead of QUEUE_TYPE_TO/FROM_DRIVER
which is very ambiguous. This bug is suspected as the cause of very
rare lockups in a very high scale storage application. It is a bug
in any case and should be corrected.

Fixes: 0a67c46d2e ("RDMA/rxe: Protect user space index loads/stores")
Link: https://lore.kernel.org/r/20230214071053.5395-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16 12:07:05 -04:00
Bob Pearson
72a0362744 RDMA/rxe: Remove rxe_alloc()
Currently all the object types in the rxe driver are allocated in
rdma-core except for MRs. By moving tha kzalloc() call outside of
the pool code the rxe_alloc() subroutine can be eliminated and code
checking for MR as a special case can be removed.

This patch moves the kzalloc() and kfree_rcu() calls into the mr
registration and destruction verbs. It removes that code from
rxe_pool.c including the rxe_alloc() subroutine which is no longer
used.

Link: https://lore.kernel.org/r/20230213225551.12437-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16 11:30:11 -04:00
Bob Pearson
5ff31dfcd6 Subject: RDMA/rxe: Handle zero length rdma
Currently the rxe driver does not handle all cases of zero length rdma
operations correctly. The client does not have to provide an rkey for zero
length RDMA read or write operations so the rkey provided may be invalid
and should not be used to lookup an mr.

This patch corrects the driver to ignore the provided rkey if the reth
length is zero for read or write operations and make sure to set the mr to
NULL. In read_reply() if length is zero rxe_recheck_mr() is not
called. Warnings are added in the routines in rxe_mr.c to catch NULL MRs
when the length is non-zero.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230202044240.6304-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16 11:13:52 -04:00
Bob Pearson
592627ccbd RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray
Replace struct rxe-phys_buf and struct rxe_map by struct xarray
in rxe_verbs.h. This allows using rcu locking on reads for
the memory maps stored in each mr.

This is based off of a sketch of a patch from Jason Gunthorpe in the
link below. Some changes were needed to make this work. It applies
cleanly to the current for-next and passes the pyverbs, perftest
and the same blktests test cases which run today.

Link: https://lore.kernel.org/r/20230119235936.19728-7-rpearsonhpe@gmail.com
Link: https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@nvidia.com/
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27 12:14:14 -04:00
Bob Pearson
325a7eb851 RDMA/rxe: Cleanup page variables in rxe_mr.c
Cleanup usage of mr->page_shift and mr->page_mask and introduce
an extractor for mr->ibmr.page_size. Normal usage in the kernel
has page_mask masking out offset in page rather than masking out
the page number. The rxe driver had reversed that which was confusing.
Implicitly there can be a per mr page_size which was not uniformly
supported.

Link: https://lore.kernel.org/r/20230119235936.19728-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:46 -04:00
Bob Pearson
d8bdb0ebca RDMA-rxe: Isolate mr code from atomic_write_reply()
Isolate mr specific code from atomic_write_reply() in rxe_resp.c into
a subroutine rxe_mr_do_atomic_write() in rxe_mr.c.
Check length for atomic write operation.
Make iova_to_vaddr() static.

Link: https://lore.kernel.org/r/20230119235936.19728-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:45 -04:00
Bob Pearson
f04d5b3d91 RDMA-rxe: Isolate mr code from atomic_reply()
Isolate mr specific code from atomic_reply() in rxe_resp.c into
a subroutine rxe_mr_do_atomic_op() in rxe_mr.c.
Minor cleanups to rxe_check_range() and iova_to_vaddr().
Move enum resp_state to rxe.h

Link: https://lore.kernel.org/r/20230119235936.19728-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:45 -04:00
Bob Pearson
db4729a525 RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.c
Move rxe_map_mr_sg() to rxe_mr.c where it makes a little more sense.

Link: https://lore.kernel.org/r/20230119235936.19728-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:45 -04:00
Bob Pearson
ade58da2a7 RDMA/rxe: Cleanup mr_check_range
Remove blank lines and replace EFAULT by EINVAL when an invalid
mr type is used.

Link: https://lore.kernel.org/r/20230119235936.19728-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:44 -04:00
Daisuke Matsuda
1aefe5c177 RDMA/rxe: Prevent faulty rkey generation
If you create MRs more than 0x10000 times after loading the module,
responder starts to reply NAKs for RDMA/Atomic operations because of rkey
violation detected in check_rkey(). The root cause is that rkeys are
incremented each time a new MR is created and the value overflows into the
range reserved for MWs.

This commit also increases the value of RXE_MAX_MW that has been limited
unlike other parameters.

Fixes: 0994a1bcd5 ("RDMA/rxe: Bump up default maximum values used via uverbs")
Link: https://lore.kernel.org/r/20221220080848.253785-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-09 10:48:16 -04:00
Daisuke Matsuda
3a73746b26 RDMA/rxe: Fix inaccurate constants in rxe_type_info
ibv_query_device() has reported incorrect device attributes, which are
actually not used by the device. Make the constants correspond with the
attributes shown to users.

Fixes: 3ccffe8abf ("RDMA/rxe: Move max_elem into rxe_type_info")
Fixes: 3225717f6d ("RDMA/rxe: Replace red-black trees by xarrays")
Link: https://lore.kernel.org/r/20221220080848.253785-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-09 10:48:16 -04:00
Jason Gunthorpe
5fc24e6022 RDMA/rxe: Fix compile warnings on 32-bit
Move the conditional code into a function, with two varients so it is
harder to make these kinds of mistakes.

 drivers/infiniband/sw/rxe/rxe_resp.c: In function 'atomic_write_reply':
 drivers/infiniband/sw/rxe/rxe_resp.c:794:13: error: unused variable 'payload' [-Werror=unused-variable]
   794 |         int payload = payload_size(pkt);
       |             ^~~~~~~
 drivers/infiniband/sw/rxe/rxe_resp.c:793:24: error: unused variable 'mr' [-Werror=unused-variable]
   793 |         struct rxe_mr *mr = qp->resp.mr;
       |                        ^~
 drivers/infiniband/sw/rxe/rxe_resp.c:791:19: error: unused variable 'dst' [-Werror=unused-variable]
   791 |         u64 src, *dst;
       |                   ^~~
 drivers/infiniband/sw/rxe/rxe_resp.c:791:13: error: unused variable 'src' [-Werror=unused-variable]
   791 |         u64 src, *dst;

Fixes: 034e285f8b ("RDMA/rxe: Make responder support atomic write on RC service")
Link: https://lore.kernel.org/linux-rdma/Y5s+EVE7eLWQqOwv@nvidia.com/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-15 11:32:06 -04:00
Li Zhijian
124011e6e9 RDMA/rxe: Enable RDMA FLUSH capability for rxe device
Now we are ready to enable RDMA FLUSH capability for RXE.
It can support Global Visibility and Persistence placement types.

Link: https://lore.kernel.org/r/20221206130201.30986-11-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
70aad902ce RDMA/rxe: Implement flush completion
Per IBA SPEC, FLUSH will ack in rdma read response with 0 length.

Use IB_WC_FLUSH (aka IB_UVERBS_WC_FLUSH) code to tell userspace a FLUSH
completion.

Link: https://lore.kernel.org/r/20221206130201.30986-9-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
ea1bb00ee9 RDMA/rxe: Implement flush execution in responder side
Only the requested placement types that also registered in the destination
memory region are acceptable.
Otherwise, responder will also reply NAK "Remote Access Error" if it
found a placement type violation.

We will persist data via arch_wb_cache_pmem(), which could be
architecture specific.

This commit also adds 2 helpers to update qp.resp from the incoming packet.

Link: https://lore.kernel.org/r/20221206130201.30986-8-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
fa1fd682ad RDMA/rxe: Implement RC RDMA FLUSH service in requester side
Implement FLUSH request operation in the requester.

Link: https://lore.kernel.org/r/20221206130201.30986-7-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
02e9a31c89 RDMA/rxe: Extend rxe packet format to support flush
Extend rxe opcode tables, headers, helper and constants to support
flush operations.

Refer to the IBA A19.4.1 for more FETH definition details

Link: https://lore.kernel.org/r/20221206130201.30986-6-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
02ea0a5115 RDMA/rxe: Allow registering persistent flag for pmem MR only
Memory region could  support at most 2 flush access flags:
IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL

But we only allow user to register persistent flush flags to the pmem MR
where it has the ability of persisting data across power cycles.

So registering a persistent access flag to a non-pmem MR will be rejected.

Link: https://lore.kernel.org/r/20221206130201.30986-5-lizhijian@fujitsu.com
CC: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Bob Pearson
689c5421bf RDMA/rxe: Fix incorrect responder length checking
The code in rxe_resp.c at check_length() is incorrect as it compares
pkt->opcode an 8 bit value against various mask bits which are all higher
than 256 so nothing is ever reported.

This patch rewrites this to compare against pkt->mask which is
correct. However this now exposes another error. For UD send packets the
value of the pmtu cannot be determined from qp->mtu.  All that is required
here is to later check if the payload fits into the posted receive buffer
in that case.

Fixes: 837a55847e ("RDMA/rxe: Implement packet length validation on responder")
Link: https://lore.kernel.org/r/20221208210945.28607-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:26:27 -04:00
Daisuke Matsuda
3282a549cf RDMA/rxe: Fix oops with zero length reads
The commit 686d348476 ("RDMA/rxe: Remove unnecessary mr testing") causes
a kernel crash. If responder get a zero-byte RDMA Read request,
qp->resp.mr is not set in check_rkey() (see IBA C9-88). The mr is NULL in
this case, and a NULL pointer dereference occurs as shown below.

 BUG: kernel NULL pointer dereference, address: 0000000000000010
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 0 P4D 0
 Oops: 0002 [#1] PREEMPT SMP PTI
 CPU: 2 PID: 3622 Comm: python3 Kdump: loaded Not tainted 6.1.0-rc3+ #34
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 RIP: 0010:__rxe_put+0xc/0x60 [rdma_rxe]
 Code: cc cc cc 31 f6 e8 64 36 1b d3 41 b8 01 00 00 00 44 89 c0 c3 cc cc cc cc 41 89 c0 eb c1 90 0f 1f 44 00 00 41 54 b8 ff ff ff ff <f0> 0f c1 47 10 83 f8 01 74 11 45 31 e4 85 c0 7e 20 44 89 e0 41 5c
 RSP: 0018:ffffb27bc012ce78 EFLAGS: 00010246
 RAX: 00000000ffffffff RBX: ffff9790857b0580 RCX: 0000000000000000
 RDX: ffff979080fe145a RSI: 000055560e3e0000 RDI: 0000000000000000
 RBP: ffff97909c7dd800 R08: 0000000000000001 R09: e7ce43d97f7bed0f
 R10: ffff97908b29c300 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: ffff97908b29c300 R15: 0000000000000000
 FS:  00007f276f7bd740(0000) GS:ffff9792b5c80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000010 CR3: 0000000114230002 CR4: 0000000000060ee0
 Call Trace:
  <IRQ>
  read_reply+0xda/0x310 [rdma_rxe]
  rxe_responder+0x82d/0xe50 [rdma_rxe]
  do_task+0x84/0x170 [rdma_rxe]
  tasklet_action_common.constprop.0+0xa7/0x120
  __do_softirq+0xcb/0x2ac
  do_softirq+0x63/0x90
  </IRQ>

Support a NULL mr during read_reply()

Fixes: 686d348476 ("RDMA/rxe: Remove unnecessary mr testing")
Fixes: b5f9a01fae ("RDMA/rxe: Fix mr leak in RESPST_ERR_RNR")
Link: https://lore.kernel.org/r/20221209045926.531689-1-matsuda-daisuke@fujitsu.com
Link: https://lore.kernel.org/r/20221202145713.13152-1-lizhijian@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 15:57:51 -04:00
Jason Gunthorpe
d69e8c63fc Linux 6.1-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmONI6weHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG9xgH/jqXGuMoO1ikfmGb
 7oY0W/f69G9V/e0DxFLvnIjhFgCUzdnNsmD4jQJA4x6QsxwLWuvpI282Ez+bHV5T
 U4RPsxJZIIMsXE2lKM9BRgeLzDdCt0aK4Pj+3x2x7NZC5cWFSQ8PyQJkCwg+0PQo
 u8Ly+GO8c4RUMf4/rrAZQq16qZUqGDaGm1EJhtSoa+KiR81LmUUmbDIK9Mr53rmQ
 wou+95XhibwMWr17WgXA28bTgYqn9UGr67V3qvTH2LC7GW8BCoKvn+3wh6TVhlWj
 dsWplXgcOP0/OHvSC5Sb1Uibk5Gx3DlIzYa6OfNZQuZ5xmQqm9kXjW8lmYpWFHy/
 38/5HWc=
 =EuoA
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-rc8' into rdma.git for-next

For dependencies in following patches

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 15:52:17 -04:00
Xiao Yang
4cd9f1d320 RDMA/rxe: Enable atomic write capability for rxe device
The capability shows that rxe device supports atomic write operation.

Link: https://lore.kernel.org/r/1669905568-62-4-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01 19:51:10 -04:00
Xiao Yang
3aec427bb1 RDMA/rxe: Implement atomic write completion
Generate an atomic write completion when the atomic write request
has been finished.

Link: https://lore.kernel.org/r/1669905568-62-3-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01 19:51:10 -04:00
Xiao Yang
034e285f8b RDMA/rxe: Make responder support atomic write on RC service
Make responder process an atomic write request and send a read response
on RC service.

Link: https://lore.kernel.org/r/1669905568-62-2-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01 19:51:09 -04:00
Xiao Yang
abb633cf28 RDMA/rxe: Make requester support atomic write on RC service
Make requester process and send an atomic write request on RC service.

Link: https://lore.kernel.org/r/1669905568-62-1-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01 19:51:09 -04:00
Xiao Yang
5c7af6c793 RDMA/rxe: Extend rxe packet format to support atomic write
Extend rxe_wr_opcode_info[] and rxe_opcode[] for new atomic write opcode.

Link: https://lore.kernel.org/r/1669905432-14-5-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01 19:51:09 -04:00
Zhang Xiaoxu
f67376d801 RDMA/rxe: Fix NULL-ptr-deref in rxe_qp_do_cleanup() when socket create failed
There is a null-ptr-deref when mount.cifs over rdma:

  BUG: KASAN: null-ptr-deref in rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
  Read of size 8 at addr 0000000000000018 by task mount.cifs/3046

  CPU: 2 PID: 3046 Comm: mount.cifs Not tainted 6.1.0-rc5+ #62
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc3
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   kasan_report+0xad/0x130
   rxe_qp_do_cleanup+0x2f3/0x360 [rdma_rxe]
   execute_in_process_context+0x25/0x90
   __rxe_cleanup+0x101/0x1d0 [rdma_rxe]
   rxe_create_qp+0x16a/0x180 [rdma_rxe]
   create_qp.part.0+0x27d/0x340
   ib_create_qp_kernel+0x73/0x160
   rdma_create_qp+0x100/0x230
   _smbd_get_connection+0x752/0x20f0
   smbd_get_connection+0x21/0x40
   cifs_get_tcp_session+0x8ef/0xda0
   mount_get_conns+0x60/0x750
   cifs_mount+0x103/0xd00
   cifs_smb3_do_mount+0x1dd/0xcb0
   smb3_get_tree+0x1d5/0x300
   vfs_get_tree+0x41/0xf0
   path_mount+0x9b3/0xdd0
   __x64_sys_mount+0x190/0x1d0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

The root cause of the issue is the socket create failed in
rxe_qp_init_req().

So move the reset rxe_qp_do_cleanup() after the NULL ptr check.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20221122151437.1057671-1-zhangxiaoxu5@huawei.com
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-22 15:55:54 -04:00
Jason Gunthorpe
cb6562c380 RDMA/rxe: Do not NULL deref on debugging failure path
Correct the mistake, mr is obviously NULL in this code path.

Fixes: 2778b72b1d ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c")
Link: https://lore.kernel.org/r/Y3eeJW0AdyJYhYyQ@kili
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-22 15:53:02 -04:00
Li Zhijian
7d984dac8f RDMA/rxe: Fix mr->map double free
rxe_mr_cleanup() which tries to free mr->map again will be called when
rxe_mr_init_user() fails:

   CPU: 0 PID: 4917 Comm: rdma_flush_serv Kdump: loaded Not tainted 6.1.0-rc1-roce-flush+ #25
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
   Call Trace:
    <TASK>
    dump_stack_lvl+0x45/0x5d
    panic+0x19e/0x349
    end_report.part.0+0x54/0x7c
    kasan_report.cold+0xa/0xf
    rxe_mr_cleanup+0x9d/0xf0 [rdma_rxe]
    __rxe_cleanup+0x10a/0x1e0 [rdma_rxe]
    rxe_reg_user_mr+0xb7/0xd0 [rdma_rxe]
    ib_uverbs_reg_mr+0x26a/0x480 [ib_uverbs]
    ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x1a2/0x250 [ib_uverbs]
    ib_uverbs_cmd_verbs+0x1397/0x15a0 [ib_uverbs]

This issue was firstly exposed since commit b18c7da63f ("RDMA/rxe: Fix
memory leak in error path code") and then we fixed it in commit
8ff5f5d9d8 ("RDMA/rxe: Prevent double freeing rxe_map_set()") but this
fix was reverted together at last by commit 1e75550648 (Revert
"RDMA/rxe: Create duplicate mapping tables for FMRs")

Simply let rxe_mr_cleanup() always handle freeing the mr->map once it is
successfully allocated.

Fixes: 1e75550648 ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"")
Link: https://lore.kernel.org/r/1667099073-2-1-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18 20:15:51 -04:00
Zhu Yanjun
8e1a76493b RDMA/rxe: Remove reliable datagram support
The rdma_rxe driver does not actually support the reliable datagram
transport but contains a variable with RD opcodes in driver code.  And
this variable is never used. So remove it.

Link: https://lore.kernel.org/r/20221112023537.432912-1-yanjun.zhu@intel.com
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18 19:57:46 -04:00
Bob Pearson
5de087250f RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mmap.c
Replace calls to pr_xxx() in rxe_mmap.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-17-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:07 -04:00
Bob Pearson
813728043b RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_icrc.c
Replace calls to pr_xxx() in rxe_icrc.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-16-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:06 -04:00
Bob Pearson
c6aba5ea00 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe.c
Replace calls to pr_xxx() in rxe.c with rxe_dbg_xxx().
Calls with a rxe device not yet in scope are left as is.

Link: https://lore.kernel.org/r/20221103171013.20659-15-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:06 -04:00
Bob Pearson
fc50597934 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_task.c
Replace calls to pr_xxx() in rxe_task.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-14-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:06 -04:00
Bob Pearson
25fd735a4c RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_av.c
Replace calls to pr_xxx() in rxe_av.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-13-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
14e501fdb0 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_verbs.c
Replace calls to pr_xxx() in rxe_verbs.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-12-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
0e6090024b RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_srq.c
Replace calls to pr_xxx() in rxe_srq.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-11-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
74ddf7233c RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_resp.c
Replace calls to pr_xxx() in rxe_resp.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-10-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:05 -04:00
Bob Pearson
0edfb15e30 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_req.c
Replace calls to pr_xxx() in rxe_req.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:04 -04:00
Bob Pearson
6af70060d2 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_qp.c
Replace calls to pr_xxx() in rxe_qp.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:04 -04:00
Bob Pearson
34549e88e0 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_net.c
Replace (some) calls to pr_xxx() in rxe_net.c with rxe_dbg_xxx().
Calls with a rxe device not yet in scope are left as is.

Link: https://lore.kernel.org/r/20221103171013.20659-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:04 -04:00
Bob Pearson
e8a87efdf8 RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mw.c
Replace calls to pr_xxx() int rxe_mw.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:03 -04:00
Bob Pearson
2778b72b1d RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c
Replace calls to pr_xxx() in rxe_mr.c by rxe_dbg_mr().

Link: https://lore.kernel.org/r/20221103171013.20659-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:03 -04:00
Bob Pearson
52920f537a RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_cq.c
Replace calls to pr_xxx() in rxe_cq.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:03 -04:00
Bob Pearson
27c4c520bd RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_comp.c
Replace calls to pr_xxx() in rxe_comp.c with rxe_dbg_xxx().

Link: https://lore.kernel.org/r/20221103171013.20659-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:02 -04:00
Bob Pearson
4554bac48a RDMA/rxe: Add ibdev_dbg macros for rxe
Add macros borrowed from siw to call dynamic debug macro ibdev_dbg.

Link: https://lore.kernel.org/r/20221103171013.20659-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:02 -04:00
Daisuke Matsuda
837a55847e RDMA/rxe: Implement packet length validation on responder
The function check_length() is supposed to check the length of inbound
packets on responder, but it actually has been a stub since the driver was
born. Let it check the payload length and the DMA length.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20221107055338.357184-1-matsuda-daisuke@fujitsu.com
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-11-09 19:54:57 +02:00
Yunsheng Lin
692373d186 RDMA/rxe: cleanup some error handling in rxe_verbs.c
Instead of 'goto and return', just return directly to
simplify the error handling, and avoid some unnecessary
return value check.

Link: https://lore.kernel.org/r/20221028075053.3990467-1-xuhaoyue1@hisilicon.com
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 15:11:44 -03:00
Xiao Yang
b071850ef6 RDMA/rxe: Remove the duplicate assignment of mr->map_shift
mr->map_shift is set to ilog2(RXE_BUF_PER_MAP) in both rxe_mr_init() and
rxe_mr_alloc() so remove the duplicate one in rxe_mr_init().

Link: https://lore.kernel.org/r/1666855893-145-1-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 15:08:41 -03:00
Li Zhijian
875ab4a8d9 RDMA/rxe: Make sure requested access is a subset of {mr,mw}->access
We should reject the requests with access flags that is not registered by
MR/MW. For example, lookup_mr() should return NULL when requested access
is 0x03 and mr->access is 0x01.

Link: https://lore.kernel.org/r/20220927055337.22630-2-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 14:39:47 -03:00
Bob Pearson
63a18baef2 RDMA/rxe: Rename task->state_lock to task->lock
Rename task-state_lock to task->lock

Link: https://lore.kernel.org/r/20221021200118.2163-7-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:16 -03:00
Bob Pearson
dcef28528c RDMA/rxe: Make rxe_do_task static
The subroutine rxe_do_task() is only called in rxe_task.c. This patch
makes it static and renames it do_task().

Link: https://lore.kernel.org/r/20221021200118.2163-6-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:15 -03:00
Bob Pearson
dccb23f6c3 RDMA/rxe: Split rxe_run_task() into two subroutines
Split rxe_run_task(task, sched) into rxe_run_task(task) and
rxe_sched_task(task).

Link: https://lore.kernel.org/r/20221021200118.2163-5-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:15 -03:00
Bob Pearson
de669ae8af RDMA/rxe: Removed unused name from rxe_task struct
The name field in struct rxe_task is never used. This patch removes it.

Link: https://lore.kernel.org/r/20221021200118.2163-4-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:15 -03:00
Bob Pearson
98a54f1706 RDMA/rxe: Remove init of task locks from rxe_qp.c
The calls to spin_lock_init() for the tasklet spinlocks in
rxe_qp_init_misc() are redundant since they are intiialized in
rxe_init_task().  This patch removes them.

Link: https://lore.kernel.org/r/20221021200118.2163-3-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:15 -03:00
Bob Pearson
05e88ebb9e RDMA/rxe: Remove redundant header files
Remove unneeded include files.

Link: https://lore.kernel.org/r/20221021200118.2163-2-rpearsonhpe@gmail.com
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 13:47:15 -03:00
Li Zhijian
b5f9a01fae RDMA/rxe: Fix mr leak in RESPST_ERR_RNR
rxe_recheck_mr() will increase mr's ref_cnt, so we should call rxe_put(mr)
to drop mr's ref_cnt in RESPST_ERR_RNR to avoid below warning:

  WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe]
...
  Call Trace:
   rxe_dereg_mr+0x4c/0x60 [rdma_rxe]
   ib_dereg_mr_user+0xa8/0x200 [ib_core]
   ib_mr_pool_destroy+0x77/0xb0 [ib_core]
   nvme_rdma_destroy_queue_ib+0x89/0x240 [nvme_rdma]
   nvme_rdma_free_queue+0x40/0x50 [nvme_rdma]
   nvme_rdma_teardown_io_queues.part.0+0xc3/0x120 [nvme_rdma]
   nvme_rdma_error_recovery_work+0x4d/0xf0 [nvme_rdma]
   process_one_work+0x582/0xa40
   ? pwq_dec_nr_in_flight+0x100/0x100
   ? rwlock_bug.part.0+0x60/0x60
   worker_thread+0x2a9/0x700
   ? process_one_work+0xa40/0xa40
   kthread+0x168/0x1a0
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x22/0x30

Link: https://lore.kernel.org/r/20221024052049.20577-1-lizhijian@fujitsu.com
Fixes: 8a1a0be894 ("RDMA/rxe: Replace mr by rkey in responder resources")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 08:59:29 +03:00
Li Zhijian
686d348476 RDMA/rxe: Remove unnecessary mr testing
Before the testing, we already passed it to rxe_mr_copy() where mr could
be dereferenced. so this checking is not needed.

The only way that mr is NULL is when it reaches below line 780 with
 'qp->resp.mr = NULL', which is not possible in Bob's explanation[1].

 778         if (res->state == rdatm_res_state_new) {
 779                 if (!res->replay) {
 780                         mr = qp->resp.mr;
 781                         qp->resp.mr = NULL;
 782                 } else {

[1] https://lore.kernel.org/lkml/30ff25c4-ce66-eac4-eaa2-64c0db203a19@gmail.com/

Link: https://lore.kernel.org/r/1666582315-2-1-git-send-email-lizhijian@fujitsu.com
CC: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 08:56:32 +03:00
Daisuke Matsuda
5ac814e02e RDMA/rxe: Handle remote errors in the midst of a Read reply sequence
Requesting nodes do not handle a reported error correctly if it is
generated in the middle of multi-packet Read responses, and the node tries
to resend the request endlessly. Let completer terminate the connection in
that case.

Link: https://lore.kernel.org/r/20221013014724.3786212-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 08:56:32 +03:00
Daisuke Matsuda
5ebc548f4f RDMA/rxe: Make responder handle RDMA Read failures
Currently, responder can reply packets with invalid payloads if it fails
to copy messages to the packets. Add an error handling in read_reply() to
inform a requesting node of the failure.

Link: https://lore.kernel.org/r/20221013014724.3786212-1-matsuda-daisuke@fujitsu.com
Suggested-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-25 08:56:32 +03:00
yangx.jy@fujitsu.com
71d2363991 RDMA/rxe: Remove the member 'type' of struct rxe_mr
The member 'type' is included in both struct rxe_mr and struct ib_mr
so remove the duplicate one of struct rxe_mr.

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Link: https://lore.kernel.org/r/20221021134513.17730-1-yangx.jy@fujitsu.com
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24 14:50:14 +03:00
Daisuke Matsuda
8ad891ed43 RDMA/rxe: Remove error/warning messages from packet receiver path
Incoming packets to rxe are passed from UDP layer using an encapsulation
socket. If there are any clients reachable to a node, they can invoke the
encapsulation handler arbitrarily by sending malicious or irrelevant
packets. This can potentially cause a message overflow and a subsequent
slowdown on the node.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20220929080023.304242-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-09-29 12:57:56 +03:00
Bob Pearson
6c5e683925 RDMA/rxe: Remove redundant num_sge fields
In include/uapi/rdma/rdma_user_rxe.h there are redundant copies of num_sge
in the rxe_send_wr, rxe_recv_wqe, and rxe_dma_info. Only the ones in
rxe_dma_info are actually used by the rxe kernel driver.

The userspace would set these values, but the kernel never read them.

This change has no affect on the current ABI and new or old versions of
rdma-core operate correctly with new or old versions of the kernel rxe
driver.

Link: https://lore.kernel.org/r/20220913222716.18335-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27 10:15:24 -03:00
Bob Pearson
fda5d0cf8a RDMA/rxe: Fix resize_finish() in rxe_queue.c
Currently in resize_finish() in rxe_queue.c there is a loop which copies
the entries in the original queue into a newly allocated queue.  The
termination logic for this loop is incorrect. The call to
queue_next_index() updates cons but has no effect on whether the queue is
empty. So if the queue starts out empty nothing is copied but if it is not
then the loop will run forever. This patch changes the loop to compare the
value of cons to the original producer index.

Fixes: ae6e843fe0 ("RDMA/rxe: Add memory barriers to kernel queues")
Link: https://lore.kernel.org/r/20220825221446.6512-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27 10:15:24 -03:00
Bob Pearson
58651bbb30 RDMA/rxe: Set pd early in mr alloc routines
Move setting of pd in mr objects ahead of any possible errors so that it
will always be set in rxe_mr_cleanup() to avoid seg faults when
rxe_put(mr_pd(mr)) is called.

Fixes: cf40367961 ("RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()")
Link: https://lore.kernel.org/r/20220805183153.32007-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27 10:15:24 -03:00
Li Zhijian
f994ae0a14 RDMA/rxe: Add send_common_ack() helper
Most code in send_ack() and send_atomic_ack() are duplicate, move them to
a new helper send_common_ack().

In newer IBA spec, some opcodes require acknowledge with a zero-length
read response, with this new helper, we can easily implement it later.

Link: https://lore.kernel.org/r/1659335010-2-1-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-26 14:14:25 -03:00
Daisuke Matsuda
954afc5a8f RDMA/rxe: Use members of generic struct in rxe_mr
rxe_mr and ib_mr have interchangeable members. Remove device specific
members and use ones in the generic struct. Both 'iova' and 'length' are
filled in ib_uverbs or ib_core layer after MR registration.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20220921080844.1616883-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22 12:46:39 +03:00
Li Zhijian
415a04844a RDMA/rxe: convert pr_warn to pr_debug
They could be triggered by user APIs with invalid parameters.

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://lore.kernel.org/r/1662518901-2-2-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-08 11:03:15 +03:00
Li Zhijian
e2edba67fc RDMA/rxe: use %u to print u32 variables
struct ib_qp_cap {
        u32     max_send_wr;
        u32     max_recv_wr;
        u32     max_send_sge;
        u32     max_recv_sge;
        u32     max_inline_data;
...

To avoid getting a negative value from dmesg:
[410580.579965] rdma_rxe: invalid send sge = 65535 > 32
[410580.583818] rdma_rxe: invalid send wr = -1 > 1048576
[410582.771323] rdma_rxe: invalid recv sge = 65535 > 32
[410582.775310] rdma_rxe: invalid recv wr = -1 > 1048576

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://lore.kernel.org/r/1662518901-2-1-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-08 11:03:15 +03:00
Daisuke Matsuda
2c02249fcb RDMA/rxe: Delete error messages triggered by incoming Read requests
An incoming Read request causes multiple Read responses. If a user MR to
copy data from is unavailable or responder cannot send a reply, then the
error messages can be printed for each response attempt, resulting in
message overflow.

Link: https://lore.kernel.org/r/20220829071218.1639065-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-31 09:57:09 +03:00
Zhu Yanjun
f07853582d RDMA/rxe: Remove the unused variable obj
The member variable obj in struct rxe_task is not needed.
So remove it to save memory.

Link: https://lore.kernel.org/r/20220822011615.805603-4-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-31 09:53:13 +03:00
Zhu Yanjun
548ce2e667 RDMA/rxe: Fix the error caused by qp->sk
When sock_create_kern in the function rxe_qp_init_req fails,
qp->sk is set to NULL.

Then the function rxe_create_qp will call rxe_qp_do_cleanup
to handle allocated resource.

Before handling qp->sk, this variable should be checked.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220822011615.805603-3-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-31 09:53:13 +03:00