linux-yocto

mirror of git://git.yoctoproject.org/linux-yocto.git synced 2025-08-21 16:31:14 +02:00

Author	SHA1	Message	Date
Pavel Begunkov	d48845afa0	io_uring/poll: fix POLLERR handling commit `c7cafd5b81` upstream. `8c8492ca64` ("io_uring/net: don't retry connect operation on EPOLLERR") is a little dirty hack that 1) wrongfully assumes that POLLERR equals to a failed request, which breaks all POLLERR users, e.g. all error queue recv interfaces. 2) deviates the connection request behaviour from connect(2), and 3) racy and solved at a wrong level. Nothing can be done with 2) now, and 3) is beyond the scope of the patch. At least solve 1) by moving the hack out of generic poll handling into io_connect(). Cc: stable@vger.kernel.org Fixes: `8c8492ca64` ("io_uring/net: don't retry connect operation on EPOLLERR") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3dc89036388d602ebd84c28e5042e457bdfc952b.1752682444.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-07-24 08:53:12 +02:00
Fengnan Chang	50b1e01aa1	io_uring: make fallocate be hashed work [ Upstream commit `88a80066af` ] Like ftruncate and write, fallocate operations on the same file cannot be executed in parallel, so it is better to make fallocate be hashed work. Signed-off-by: Fengnan Chang <changfengnan@bytedance.com> Link: https://lore.kernel.org/r/20250623110218.61490-1-changfengnan@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-07-17 18:35:20 +02:00
Penglei Jiang	675d90ee87	io_uring: fix task leak issue in io_wq_create() commit `89465d923b` upstream. Add missing put_task_struct() in the error path Cc: stable@vger.kernel.org Fixes: `0f8baa3c98` ("io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls()") Signed-off-by: Penglei Jiang <superman.xpt@gmail.com> Link: https://lore.kernel.org/r/20250615163906.2367-1-superman.xpt@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-27 11:08:58 +01:00
Pavel Begunkov	0257c26bbc	io_uring/kbuf: account ring io_buffer_list memory commit `475a8d3037` upstream. Follow the non-ringed pbuf struct io_buffer_list allocations and account it against the memcg. There is low chance of that being an actual problem as ring provided buffer should either pin user memory or allocate it, which is already accounted. Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3985218b50d341273cafff7234e1a7e6d0db9808.1747150490.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-27 11:08:43 +01:00
Pavel Begunkov	a8b5ef3554	io_uring: account drain memory to cgroup commit `f979c20547` upstream. Account drain allocations against memcg. It's not a big problem as each such allocation is paired with a request, which is accounted, but it's nicer to follow the limits more closely. Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f8dfdbd755c41fd9c75d12b858af07dfba5bbb68.1746788718.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-27 11:08:43 +01:00
Greg Kroah-Hartman	99bc5248a4	Revert "io_uring: ensure deferred completions are posted for multishot" This reverts commit `746e7d285d` which is commit `687b2bae0e` upstream. Jens writes: There's some missing dependencies that makes this not work right, I'll bring it back in a series instead. Link: https://lore.kernel.org/r/906ba919-32e6-4534-bbad-2cd18e1098ca@kernel.dk Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-19 15:28:45 +02:00
Jens Axboe	029d39ae7e	io_uring/rw: fix wrong NOWAIT check in io_rw_init_file() Commit `ae6a888a43` upstream. A previous commit improved how !FMODE_NOWAIT is dealt with, but inadvertently negated a check whilst doing so. This caused -EAGAIN to be returned from reading files with O_NONBLOCK set. Fix up the check for REQ_F_SUPPORT_NOWAIT. Reported-by: Julian Orth <ju.orth@gmail.com> Link: https://github.com/axboe/liburing/issues/1270 Fixes: `f7c9134385` ("io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-19 15:28:45 +02:00
Jens Axboe	62d5d980b5	io_uring/rw: allow pollable non-blocking attempts for !FMODE_NOWAIT Commit `f7c9134385` upstream. The checking for whether or not io_uring can do a non-blocking read or write attempt is gated on FMODE_NOWAIT. However, if the file is pollable, it's feasible to just check if it's currently in a state in which it can sanely receive or send _some_ data. This avoids unnecessary io-wq punts, and repeated worthless retries before doing that punt, by assuming that some data can get delivered or received if poll tells us that is true. It also allows multishot reads to properly work with these types of files, enabling a bit of a cleanup of the logic that: `c9d952b910` ("io_uring/rw: fix cflags posting for single issue multishot read") had to put in place. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-19 15:28:44 +02:00
Jens Axboe	90e11232a6	io_uring: add io_file_can_poll() helper Commit `95041b93e9` upstream. This adds a flag to avoid dipping dereferencing file and then f_op to figure out if the file has a poll handler defined or not. We generally call this at least twice for networked workloads, and if using ring provided buffers, we do it on every buffer selection. Particularly the latter is troublesome, as it's otherwise a very fast operation. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-19 15:28:44 +02:00
Pavel Begunkov	c844ace5b8	io_uring: fix overflow resched cqe reordering [ Upstream commit `a7d755ed9c` ] Leaving the CQ critical section in the middle of a overflow flushing can cause cqe reordering since the cache cq pointers are reset and any new cqe emitters that might get called in between are not going to be forced into io_cqe_cache_refill(). Fixes: `eac2ca2d68` ("io_uring: check if we need to reschedule during overflow flush") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/90ba817f1a458f091f355f407de1c911d2b93bbf.1747483784.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-06-04 14:42:18 +02:00
Jens Axboe	b72952c8c3	io_uring/fdinfo: annotate racy sq/cq head/tail reads [ Upstream commit `f024d3a8de` ] syzbot complains about the cached sq head read, and it's totally right. But we don't need to care, it's just reading fdinfo, and reading the CQ or SQ tail/head entries are known racy in that they are just a view into that very instant and may of course be outdated by the time they are reported. Annotate both the SQ head and CQ tail read with data_race() to avoid this syzbot complaint. Link: https://lore.kernel.org/io-uring/6811f6dc.050a0220.39e3a1.0d0e.GAE@google.com/ Reported-by: syzbot+3e77fd302e99f5af9394@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-06-04 14:42:16 +02:00
Jens Axboe	746e7d285d	io_uring: ensure deferred completions are posted for multishot Commit `687b2bae0e` upstream. Multishot normally uses io_req_post_cqe() to post completions, but when stopping it, it may finish up with a deferred completion. This is fine, except if another multishot event triggers before the deferred completions get flushed. If this occurs, then CQEs may get reordered in the CQ ring, and cause confusion on the application side. When multishot posting via io_req_post_cqe(), flush any pending deferred completions first, if any. Cc: stable@vger.kernel.org # 6.1+ Reported-by: Norman Maurer <norman_maurer@apple.com> Reported-by: Christian Mazakas <christian.mazakas@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-05-18 08:24:10 +02:00
Jens Axboe	51f1389b5f	io_uring: always arm linked timeouts prior to issue Commit `b53e523261` upstream. There are a few spots where linked timeouts are armed, and not all of them adhere to the pre-arm, attempt issue, post-arm pattern. This can be problematic if the linked request returns that it will trigger a callback later, and does so before the linked timeout is fully armed. Consolidate all the linked timeout handling into __io_issue_sqe(), rather than have it spread throughout the various issue entry points. Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/issues/1390 Reported-by: Chase Hiltz <chase@path.net> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-05-18 08:24:10 +02:00
Pavel Begunkov	c5d4d10300	io_uring: always do atomic put from iowq [ Upstream commit `390513642e` ] io_uring always switches requests to atomic refcounting for iowq execution before there is any parallilism by setting REQ_F_REFCOUNT, and the flag is not cleared until the request completes. That should be fine as long as the compiler doesn't make up a non existing value for the flags, however KCSAN still complains when the request owner changes oter flag bits: BUG: KCSAN: data-race in io_req_task_cancel / io_wq_free_work ... read to 0xffff888117207448 of 8 bytes by task 3871 on cpu 0: req_ref_put_and_test io_uring/refs.h:22 [inline] Skip REQ_F_REFCOUNT checks for iowq, we know it's set. Reported-by: syzbot+903a2ad71fb3f1e47cf5@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d880bc27fb8c3209b54641be4ff6ac02b0e5789a.1743679736.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-05-02 07:50:57 +02:00
Jens Axboe	b675b4c863	io_uring: fix 'sync' handling of io_fallback_tw() commit `edd43f4d6f` upstream. A previous commit added a 'sync' parameter to io_fallback_tw(), which if true, means the caller wants to wait on the fallback thread handling it. But the logic is somewhat messed up, ensure that ctxs are swapped and flushed appropriately. Cc: stable@vger.kernel.org Fixes: `dfbe5561ae` ("io_uring: flush offloaded and delayed task_work on exit") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-05-02 07:50:47 +02:00
Pavel Begunkov	7e2449ee66	io_uring/net: fix accept multishot handling Commit `f6a89bf527` upstream. REQ_F_APOLL_MULTISHOT doesn't guarantee it's executed from the multishot context, so a multishot accept may get executed inline, fail io_req_post_cqe(), and ask the core code to kill the request with -ECANCELED by returning IOU_STOP_MULTISHOT even when a socket has been accepted and installed. Cc: stable@vger.kernel.org Fixes: `390ed29b5e` ("io_uring: add IORING_ACCEPT_MULTISHOT for accept") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/51c6deb01feaa78b08565ca8f24843c017f5bc80.1740331076.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-04-25 10:45:54 +02:00
Jens Axboe	35c4a652d8	io_uring/kbuf: reject zero sized provided buffers commit `cf960726eb` upstream. This isn't fixing a real issue, but there's also zero point in going through group and buffer setup, when the buffers are going to be rejected once attempted to get used. Cc: stable@vger.kernel.org Reported-by: syzbot+58928048fd1416f1457c@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-04-25 10:45:26 +02:00
Pavel Begunkov	78aefac7ef	io_uring: fix error pbuf checking Commit `bcc87d978b` upstream. Syz reports a problem, which boils down to NULL vs IS_ERR inconsistent error handling in io_alloc_pbuf_ring(). KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:__io_remove_buffers+0xac/0x700 io_uring/kbuf.c:341 Call Trace: <TASK> io_put_bl io_uring/kbuf.c:378 [inline] io_destroy_buffers+0x14e/0x490 io_uring/kbuf.c:392 io_ring_ctx_free+0xa00/0x1070 io_uring/io_uring.c:2613 io_ring_exit_work+0x80f/0x8a0 io_uring/io_uring.c:2844 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312 worker_thread+0x86d/0xd40 kernel/workqueue.c:3390 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Cc: stable@vger.kernel.org Reported-by: syzbot+2074b1a3d447915c6f1c@syzkaller.appspotmail.com Fixes: `87585b0575` ("io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c5f9df20560bd9830401e8e48abc029e7cfd9f5e.1721329239.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-22 12:50:45 -07:00
Jens Axboe	1fdb9c9eb2	io_uring: use unpin_user_pages() where appropriate Commit `18595c0a58` upstream. There are a few cases of open-rolled loops around unpin_user_page(), use the generic helper instead. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-22 12:50:45 -07:00
Jens Axboe	46b1b3d81a	io_uring/kbuf: use vm_insert_pages() for mmap'ed pbuf ring Commit `87585b0575` upstream. Rather than use remap_pfn_range() for this and manually free later, switch to using vm_insert_page() and have it Just Work. This requires a bit of effort on the mmap lookup side, as the ctx uring_lock isn't held, which otherwise protects buffer_lists from being torn down, and it's not safe to grab from mmap context that would introduce an ABBA deadlock between the mmap lock and the ctx uring_lock. Instead, lookup the buffer_list under RCU, as the the list is RCU freed already. Use the existing reference count to determine whether it's possible to safely grab a reference to it (eg if it's not zero already), and drop that reference when done with the mapping. If the mmap reference is the last one, the buffer_list and the associated memory can go away, since the vma insertion has references to the inserted pages at that point. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-22 12:50:45 -07:00
Jens Axboe	af8f27ef1a	io_uring/kbuf: vmap pinned buffer ring Commit `e270bfd22a` upstream. This avoids needing to care about HIGHMEM, and it makes the buffer indexing easier as both ring provided buffer methods are now virtually mapped in a contigious fashion. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-22 12:50:45 -07:00
Jens Axboe	6168ec87bf	io_uring: unify io_pin_pages() Commit `1943f96b38` upstream. Move it into io_uring.c where it belongs, and use it in there as well rather than have two implementations of this. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-22 12:50:45 -07:00
Jens Axboe	719e745ee3	io_uring: use vmap() for ring mapping Commit `09fc75e0c0` upstream. This is the last holdout which does odd page checking, convert it to vmap just like what is done for the non-mmap path. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-22 12:50:45 -07:00
Pavel Begunkov	b89f95b94c	io_uring: fix corner case forgetting to vunmap Commit `43eef70e7e` upstream. io_pages_unmap() is a bit tricky in trying to figure whether the pages were previously vmap'ed or not. In particular If there is juts one page it belives there is no need to vunmap. Paired io_pages_map(), however, could've failed io_mem_alloc_compound() and attempted to io_mem_alloc_single(), which does vmap, and that leads to unpaired vmap. The solution is to fail if io_mem_alloc_compound() can't allocate a single page. That's the easiest way to deal with it, and those two functions are getting removed soon, so no need to overcomplicate it. Cc: stable@vger.kernel.org Fixes: `3ab1db3c60` ("io_uring: get rid of remap_pfn_range() for mapping rings/sqes") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/477e75a3907a2fe83249e49c0a92cd480b2c60e0.1732569842.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-22 12:50:45 -07:00
Jens Axboe	a0b21f2aca	io_uring: don't attempt to mmap larger than what the user asks for Commit `06fe9b1df1` upstream. If IORING_FEAT_SINGLE_MMAP is ignored, as can happen if an application uses an ancient liburing or does setup manually, then 3 mmap's are required to map the ring into userspace. The kernel will still have collapsed the mappings, however userspace may ask for mapping them individually. If so, then we should not use the full number of ring pages, as it may exceed the partial mapping. Doing so will yield an -EFAULT from vm_insert_pages(), as we pass in more pages than what the application asked for. Cap the number of pages to match what the application asked for, for the particular mapping operation. Reported-by: Lucas Mülling <lmulling@proton.me> Link: https://github.com/axboe/liburing/issues/1157 Fixes: `3ab1db3c60` ("io_uring: get rid of remap_pfn_range() for mapping rings/sqes") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-22 12:50:45 -07:00
Jens Axboe	2905c4fe7e	io_uring: get rid of remap_pfn_range() for mapping rings/sqes Commit `3ab1db3c60` upstream. Rather than use remap_pfn_range() for this and manually free later, switch to using vm_insert_pages() and have it Just Work. If possible, allocate a single compound page that covers the range that is needed. If that works, then we can just use page_address() on that page. If we fail to get a compound page, allocate single pages and use vmap() to map them into the kernel virtual address space. This just covers the rings/sqes, the other remaining user of the mmap remap_pfn_range() user will be converted separately. Once that is done, we can kill the old alloc/free code. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-03-22 12:50:44 -07:00
Uday Shankar	570f4d6e94	io-wq: backoff when retrying worker creation [ Upstream commit `13918315c5` ] When io_uring submission goes async for the first time on a given task, we'll try to create a worker thread to handle the submission. Creating this worker thread can fail due to various transient conditions, such as an outstanding signal in the forking thread, so we have retry logic with a limit of 3 retries. However, this retry logic appears to be too aggressive/fast - we've observed a thread blowing through the retry limit while having the same outstanding signal the whole time. Here's an excerpt of some tracing that demonstrates the issue: First, signal 26 is generated for the process. It ends up getting routed to thread 92942. 0) cbd-92284 /* signal_generate: sig=26 errno=0 code=-2 comm=psblkdASD pid=92934 grp=1 res=0 / This causes create_io_thread in the signalled thread to fail with ERESTARTNOINTR, and thus a retry is queued. 13) task_th-92942 / io_uring_queue_async_work: ring 000000007325c9ae, request 0000000080c96d8e, user_data 0x0, opcode URING_CMD, flags 0x8240001, normal queue, work 000000006e96dd3f / 13) task_th-92942 io_wq_enqueue() { 13) task_th-92942 _raw_spin_lock(); 13) task_th-92942 io_wq_activate_free_worker(); 13) task_th-92942 _raw_spin_lock(); 13) task_th-92942 create_io_worker() { 13) task_th-92942 __kmalloc_cache_noprof(); 13) task_th-92942 __init_swait_queue_head(); 13) task_th-92942 kprobe_ftrace_handler() { 13) task_th-92942 get_kprobe(); 13) task_th-92942 aggr_pre_handler() { 13) task_th-92942 pre_handler_kretprobe(); 13) task_th-92942 / create_enter: (create_io_thread+0x0/0x50) fn=0xffffffff8172c0e0 arg=0xffff888996bb69c0 node=-1 / 13) task_th-92942 } / aggr_pre_handler / ... 13) task_th-92942 } / copy_process / 13) task_th-92942 } / create_io_thread / 13) task_th-92942 kretprobe_rethook_handler() { 13) task_th-92942 / create_exit: (create_io_worker+0x8a/0x1a0 <- create_io_thread) arg1=0xfffffffffffffdff / 13) task_th-92942 } / kretprobe_rethook_handler / 13) task_th-92942 queue_work_on() { ... The CPU is then handed to a kworker to process the queued retry: ------------------------------------------ 13) task_th-92942 => kworker-54154 ------------------------------------------ 13) kworker-54154 io_workqueue_create() { 13) kworker-54154 io_queue_worker_create() { 13) kworker-54154 task_work_add() { 13) kworker-54154 wake_up_state() { 13) kworker-54154 try_to_wake_up() { 13) kworker-54154 _raw_spin_lock_irqsave(); 13) kworker-54154 _raw_spin_unlock_irqrestore(); 13) kworker-54154 } / try_to_wake_up / 13) kworker-54154 } / wake_up_state / 13) kworker-54154 kick_process(); 13) kworker-54154 } / task_work_add / 13) kworker-54154 } / io_queue_worker_create / 13) kworker-54154 } / io_workqueue_create / And then we immediately switch back to the original task to try creating a worker again. This fails, because the original task still hasn't handled its signal. ----------------------------------------- 13) kworker-54154 => task_th-92942 ------------------------------------------ 13) task_th-92942 create_worker_cont() { 13) task_th-92942 kprobe_ftrace_handler() { 13) task_th-92942 get_kprobe(); 13) task_th-92942 aggr_pre_handler() { 13) task_th-92942 pre_handler_kretprobe(); 13) task_th-92942 / create_enter: (create_io_thread+0x0/0x50) fn=0xffffffff8172c0e0 arg=0xffff888996bb69c0 node=-1 / 13) task_th-92942 } / aggr_pre_handler / 13) task_th-92942 } / kprobe_ftrace_handler / 13) task_th-92942 create_io_thread() { 13) task_th-92942 copy_process() { 13) task_th-92942 task_active_pid_ns(); 13) task_th-92942 _raw_spin_lock_irq(); 13) task_th-92942 recalc_sigpending(); 13) task_th-92942 _raw_spin_lock_irq(); 13) task_th-92942 } / copy_process / 13) task_th-92942 } / create_io_thread / 13) task_th-92942 kretprobe_rethook_handler() { 13) task_th-92942 / create_exit: (create_worker_cont+0x35/0x1b0 <- create_io_thread) arg1=0xfffffffffffffdff / 13) task_th-92942 } / kretprobe_rethook_handler / 13) task_th-92942 io_worker_release(); 13) task_th-92942 queue_work_on() { 13) task_th-92942 clear_pending_if_disabled(); 13) task_th-92942 __queue_work() { 13) task_th-92942 } / __queue_work / 13) task_th-92942 } / queue_work_on / 13) task_th-92942 } / create_worker_cont / The pattern repeats another couple times until we blow through the retry counter, at which point we give up. All outstanding work is canceled, and the io_uring command which triggered all this is failed with ECANCELED: 13) task_th-92942 io_acct_cancel_pending_work() { ... 13) task_th-92942 / io_uring_complete: ring 000000007325c9ae, req 0000000080c96d8e, user_data 0x0, result -125, cflags 0x0 extra1 0 extra2 0 / Finally, the task gets around to processing its outstanding signal 26, but it's too late. 13) task_th-92942 / signal_deliver: sig=26 errno=0 code=-2 sa_handler=59566a0 sa_flags=14000000 */ Try to address this issue by adding a small scaling delay when retrying worker creation. This should give the forking thread time to handle its signal in the above case. This isn't a particularly satisfying solution, as sufficiently paradoxical scheduling would still have us hitting the same issue, and I'm open to suggestions for something better. But this is likely to prevent this (already rare) issue from hitting in practice. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Link: https://lore.kernel.org/r/20250208-wq_retry-v2-1-4f6f5041d303@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-03-22 12:50:43 -07:00
Pavel Begunkov	685da33c81	io_uring/net: save msg_control for compat [ Upstream commit `6ebf05189d` ] Match the compat part of io_sendmsg_copy_hdr() with its counterpart and save msg_control. Fixes: `c55978024d` ("io_uring/net: move receive multishot out of the generic msghdr path") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2a8418821fe83d3b64350ad2b3c0303e9b732bbd.1740498502.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-03-07 16:45:43 +01:00
Pavel Begunkov	b9826e3b26	io_uring: prevent opcode speculation commit `1e988c3fe1` upstream. sqe->opcode is used for different tables, make sure we santitise it against speculations. Cc: stable@vger.kernel.org Fixes: `d3656344fe` ("io_uring: add lookup table for various opcode needs") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Li Zetao <lizetao1@huawei.com> Link: https://lore.kernel.org/r/7eddbf31c8ca0a3947f8ed98271acc2b4349c016.1739568408.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-27 04:10:51 -08:00
Pavel Begunkov	146a185f6c	io_uring/kbuf: reallocate buf lists on upgrade commit `8802766324` upstream. IORING_REGISTER_PBUF_RING can reuse an old struct io_buffer_list if it was created for legacy selected buffer and has been emptied. It violates the requirement that most of the field should stay stable after publish. Always reallocate it instead. Cc: stable@vger.kernel.org Reported-by: Pumpkin Chang <pumpkin@devco.re> Fixes: `2fcabce2d7` ("io_uring: disallow mixed provided buffer group registrations") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-21 13:57:27 +01:00
Pavel Begunkov	644636ee7e	io_uring/rw: commit provided buffer state on async When we get -EIOCBQUEUED, we need to ensure that the buffer is consumed from the provided buffer ring, which can be done with io_kbuf_recycle() + REQ_F_PARTIAL_IO. Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg> Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Reported-by: Jacob Soo <jacob.soo@starlabs.sg> Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-17 09:40:37 +01:00
Pavel Begunkov	a94592ec30	io_uring: fix io_req_prep_async with provided buffers io_req_prep_async() can import provided buffers, commit the ring state by giving up on that before, it'll be reimported later if needed. Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg> Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Reported-by: Jacob Soo <jacob.soo@starlabs.sg> Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-17 09:40:36 +01:00
Jens Axboe	130675a219	io_uring/net: don't retry connect operation on EPOLLERR commit `8c8492ca64` upstream. If a socket is shutdown before the connection completes, POLLERR is set in the poll mask. However, connect ignores this as it doesn't know, and attempts the connection again. This may lead to a bogus -ETIMEDOUT result, where it should have noticed the POLLERR and just returned -ECONNRESET instead. Have the poll logic check for whether or not POLLERR is set in the mask, and if so, mark the request as failed. Then connect can appropriately fail the request rather than retry it. Reported-by: Sergey Galas <ssgalas@cloud.ru> Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/discussions/1335 Fixes: `3fb1bd6881` ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-17 09:40:36 +01:00
Pavel Begunkov	b86f1d5173	io_uring: fix multishots with selected buffers commit `d63b0e8a62` upstream. We do io_kbuf_recycle() when arming a poll but every iteration of a multishot can grab more buffers, which is why we need to flush the kbuf ring state before continuing with waiting. Cc: stable@vger.kernel.org Fixes: `b3fdea6ecb` ("io_uring: multishot recv") Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg> Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg> Reported-by: Jacob Soo <jacob.soo@starlabs.sg> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1bfc9990fe435f1fc6152ca9efeba5eb3e68339c.1738025570.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-02-17 09:40:36 +01:00
Jens Axboe	563ba1701b	io_uring/uring_cmd: use cached cmd_op in io_uring_cmd_sock() [ Upstream commit `d58d82bd0e` ] io_uring_cmd_sock() does a normal read of cmd->sqe->cmd_op, where it really should be using a READ_ONCE() as ->sqe may still be pointing to the original SQE. Since the prep side already does this READ_ONCE() and stores it locally, use that value rather than re-read it. Fixes: `8e9fad0e70` ("io_uring: Add io_uring command support for sockets") Link: https://lore.kernel.org/r/20250121-uring-sockcmd-fix-v1-1-add742802a29@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-02-08 09:52:33 +01:00
Jens Axboe	8efff2aa2d	io_uring/eventfd: ensure io_eventfd_signal() defers another RCU period Commit `c9a40292a4` upstream. io_eventfd_do_signal() is invoked from an RCU callback, but when dropping the reference to the io_ev_fd, it calls io_eventfd_free() directly if the refcount drops to zero. This isn't correct, as any potential freeing of the io_ev_fd should be deferred another RCU grace period. Just call io_eventfd_put() rather than open-code the dec-and-test and free, which will correctly defer it another RCU grace period. Fixes: `21a091b970` ("io_uring: signal registered eventfd to process deferred task work") Reported-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-17 13:36:24 +01:00
Pavel Begunkov	e3ed5a14aa	io_uring/timeout: fix multishot updates commit `c83c846231` upstream. After update only the first shot of a multishot timeout request adheres to the new timeout value while all subsequent retries continue to use the old value. Don't forget to update the timeout stored in struct io_timeout_data. Cc: stable@vger.kernel.org Fixes: `ea97f6c855` ("io_uring: add support for multishot timeouts") Reported-by: Christian Mazakas <christian.mazakas@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e6516c3304eb654ec234cfa65c88a9579861e597.1736015288.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-17 13:36:19 +01:00
Pavel Begunkov	80120bb4ee	io_uring/sqpoll: fix sqpoll error handling races commit `e33ac68e5e` upstream. BUG: KASAN: slab-use-after-free in __lock_acquire+0x370b/0x4a10 kernel/locking/lockdep.c:5089 Call Trace: <TASK> ... _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline] try_to_wake_up+0xb5/0x23c0 kernel/sched/core.c:4205 io_sq_thread_park+0xac/0xe0 io_uring/sqpoll.c:55 io_sq_thread_finish+0x6b/0x310 io_uring/sqpoll.c:96 io_sq_offload_create+0x162/0x11d0 io_uring/sqpoll.c:497 io_uring_create io_uring/io_uring.c:3724 [inline] io_uring_setup+0x1728/0x3230 io_uring/io_uring.c:3806 ... Kun Hu reports that the SQPOLL creating error path has UAF, which happens if io_uring_alloc_task_context() fails and then io_sq_thread() manages to run and complete before the rest of error handling code, which means io_sq_thread_finish() is looking at already killed task. Note that this is mostly theoretical, requiring fault injection on the allocation side to trigger in practice. Cc: stable@vger.kernel.org Reported-by: Kun Hu <huk23@m.fudan.edu.cn> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0f2f1aa5729332612bd01fe0f2f385fd1f06ce7c.1735231717.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-01-02 10:32:10 +01:00
Pavel Begunkov	4cba441226	io_uring/rw: avoid punting to io-wq directly Commit `6e6b8c6212` upstream. kiocb_done() should care to specifically redirecting requests to io-wq. Remove the hopping to tw to then queue an io-wq, return -EAGAIN and let the core code io_uring handle offloading. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/413564e550fe23744a970e1783dfa566291b0e6f.1710799188.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit `6e6b8c6212`) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-27 13:58:57 +01:00
Jens Axboe	4192884017	io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN Commit `c0a9d496e0` upstream. Some file systems, ocfs2 in this case, will return -EOPNOTSUPP for an IOCB_NOWAIT read/write attempt. While this can be argued to be correct, the usual return value for something that requires blocking issue is -EAGAIN. A refactoring io_uring commit dropped calling kiocb_done() for negative return values, which is otherwise where we already do that transformation. To ensure we catch it in both spots, check it in __io_read() itself as well. Reported-by: Robert Sander <r.sander@heinlein-support.de> Link: https://fosstodon.org/@gurubert@mastodon.gurubert.de/113112431889638440 Cc: stable@vger.kernel.org Fixes: `a08d195b58` ("io_uring/rw: split io_read() into a helper") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-27 13:58:57 +01:00
Jens Axboe	6c27fc6a78	io_uring/rw: split io_read() into a helper Commit `a08d195b58` upstream. Add __io_read() which does the grunt of the work, leaving the completion side to the new io_read(). No functional changes in this patch. Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> (cherry picked from commit `a08d195b58`) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-27 13:58:57 +01:00
Pavel Begunkov	2ca94c8de3	io_uring: check if iowq is killed before queuing commit `dbd2ca9367` upstream. task work can be executed after the task has gone through io_uring termination, whether it's the final task_work run or the fallback path. In this case, task work will find ->io_wq being already killed and null'ed, which is a problem if it then tries to forward the request to io_queue_iowq(). Make io_queue_iowq() fail requests in this case. Note that it also checks PF_KTHREAD, because the user can first close a DEFER_TASKRUN ring and shortly after kill the task, in which case ->iowq check would race. Cc: stable@vger.kernel.org Fixes: `50c52250e2` ("block: implement async io_uring discard cmd") Fixes: `773af69121` ("io_uring: always reissue from task_work context") Reported-by: Will <willsroot@protonmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/63312b4a2c2bb67ad67b857d17a300e1d3b078e8.1734637909.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-27 13:58:55 +01:00
Jann Horn	a73f0425f4	io_uring: Fix registered ring file refcount leak commit `12d908116f` upstream. Currently, io_uring_unreg_ringfd() (which cleans up registered rings) is only called on exit, but __io_uring_free (which frees the tctx in which the registered ring pointers are stored) is also called on execve (via begin_new_exec -> io_uring_task_cancel -> __io_uring_cancel -> io_uring_cancel_generic -> __io_uring_free). This means: A process going through execve while having registered rings will leak references to the rings' `struct file`. Fix it by zapping registered rings on execve(). This is implemented by moving the io_uring_unreg_ringfd() from io_uring_files_cancel() into its callee __io_uring_cancel(), which is called from io_uring_task_cancel() on execve. This could probably be exploited on 32-bit kernels by leaking 2^32 references to the same ring, because the file refcount is stored in a pointer-sized field and get_file() doesn't have protection against refcount overflow, just a WARN_ONCE(); but on 64-bit it should have no impact beyond a memory leak. Cc: stable@vger.kernel.org Fixes: `e7a6c00dc7` ("io_uring: add support for registering ring file descriptors") Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20241218-uring-reg-ring-cleanup-v1-1-8f63e999045b@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-12-27 13:58:55 +01:00
Jens Axboe	42882b5830	io_uring/tctx: work around xa_store() allocation error issue [ Upstream commit `7eb75ce752` ] syzbot triggered the following WARN_ON: WARNING: CPU: 0 PID: 16 at io_uring/tctx.c:51 __io_uring_free+0xfa/0x140 io_uring/tctx.c:51 which is the WARN_ON_ONCE(!xa_empty(&tctx->xa)); sanity check in __io_uring_free() when a io_uring_task is going through its final put. The syzbot test case includes injecting memory allocation failures, and it very much looks like xa_store() can fail one of its memory allocations and end up with ->head being non-NULL even though no entries exist in the xarray. Until this issue gets sorted out, work around it by attempting to iterate entries in our xarray, and WARN_ON_ONCE() if one is found. Reported-by: syzbot+cc36d44ec9f368e443d3@syzkaller.appspotmail.com Link: https://lore.kernel.org/io-uring/673c1643.050a0220.87769.0066.GAE@google.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-12-14 20:00:17 +01:00
Hagar Hemdan	950ac86cff	io_uring: fix possible deadlock in io_register_iowq_max_workers() commit `73254a297c` upstream. The io_register_iowq_max_workers() function calls io_put_sq_data(), which acquires the sqd->lock without releasing the uring_lock. Similar to the commit `009ad9f0c6` ("io_uring: drop ctx->uring_lock before acquiring sqd->lock"), this can lead to a potential deadlock situation. To resolve this issue, the uring_lock is released before calling io_put_sq_data(), and then it is re-acquired after the function call. This change ensures that the locks are acquired in the correct order, preventing the possibility of a deadlock. Suggested-by: Maximilian Heyne <mheyne@amazon.de> Signed-off-by: Hagar Hemdan <hagarhem@amazon.com> Link: https://lore.kernel.org/r/20240604130527.3597-1-hagarhem@amazon.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-17 15:08:58 +01:00
Pavel Begunkov	6a91a5816b	io_uring: always lock __io_cqring_overflow_flush commit `8d09a88ef9` upstream. Conditional locking is never great, in case of __io_cqring_overflow_flush(), which is a slow path, it's not justified. Don't handle IOPOLL separately, always grab uring_lock for overflow flushing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/162947df299aa12693ac4b305dacedab32ec7976.1712708261.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-11-08 16:28:27 +01:00
Jens Axboe	003d299696	io_uring/rw: fix missing NOWAIT check for O_DIRECT start write [ Upstream commit `1d60d74e85` ] When io_uring starts a write, it'll call kiocb_start_write() to bump the super block rwsem, preventing any freezes from happening while that write is in-flight. The freeze side will grab that rwsem for writing, excluding any new writers from happening and waiting for existing writes to finish. But io_uring unconditionally uses kiocb_start_write(), which will block if someone is currently attempting to freeze the mount point. This causes a deadlock where freeze is waiting for previous writes to complete, but the previous writes cannot complete, as the task that is supposed to complete them is blocked waiting on starting a new write. This results in the following stuck trace showing that dependency with the write blocked starting a new write: task:fio state:D stack:0 pid:886 tgid:886 ppid:876 Call trace: __switch_to+0x1d8/0x348 __schedule+0x8e8/0x2248 schedule+0x110/0x3f0 percpu_rwsem_wait+0x1e8/0x3f8 __percpu_down_read+0xe8/0x500 io_write+0xbb8/0xff8 io_issue_sqe+0x10c/0x1020 io_submit_sqes+0x614/0x2110 __arm64_sys_io_uring_enter+0x524/0x1038 invoke_syscall+0x74/0x268 el0_svc_common.constprop.0+0x160/0x238 do_el0_svc+0x44/0x60 el0_svc+0x44/0xb0 el0t_64_sync_handler+0x118/0x128 el0t_64_sync+0x168/0x170 INFO: task fsfreeze:7364 blocked for more than 15 seconds. Not tainted 6.12.0-rc5-00063-g76aaf945701c #7963 with the attempting freezer stuck trying to grab the rwsem: task:fsfreeze state:D stack:0 pid:7364 tgid:7364 ppid:995 Call trace: __switch_to+0x1d8/0x348 __schedule+0x8e8/0x2248 schedule+0x110/0x3f0 percpu_down_write+0x2b0/0x680 freeze_super+0x248/0x8a8 do_vfs_ioctl+0x149c/0x1b18 __arm64_sys_ioctl+0xd0/0x1a0 invoke_syscall+0x74/0x268 el0_svc_common.constprop.0+0x160/0x238 do_el0_svc+0x44/0x60 el0_svc+0x44/0xb0 el0t_64_sync_handler+0x118/0x128 el0t_64_sync+0x168/0x170 Fix this by having the io_uring side honor IOCB_NOWAIT, and only attempt a blocking grab of the super block rwsem if it isn't set. For normal issue where IOCB_NOWAIT would always be set, this returns -EAGAIN which will have io_uring core issue a blocking attempt of the write. That will in turn also get completions run, ensuring forward progress. Since freezing requires CAP_SYS_ADMIN in the first place, this isn't something that can be triggered by a regular user. Cc: stable@vger.kernel.org # 5.10+ Reported-by: Peter Mann <peter.mann@sh.cz> Link: https://lore.kernel.org/io-uring/38c94aec-81c9-4f62-b44e-1d87f5597644@sh.cz Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-11-08 16:28:26 +01:00
Jens Axboe	2762b3cc90	io_uring/sqpoll: close race on waiting for sqring entries commit `28aabffae6` upstream. When an application uses SQPOLL, it must wait for the SQPOLL thread to consume SQE entries, if it fails to get an sqe when calling io_uring_get_sqe(). It can do so by calling io_uring_enter(2) with the flag value of IORING_ENTER_SQ_WAIT. In liburing, this is generally done with io_uring_sqring_wait(). There's a natural expectation that once this call returns, a new SQE entry can be retrieved, filled out, and submitted. However, the kernel uses the cached sq head to determine if the SQRING is full or not. If the SQPOLL thread is currently in the process of submitting SQE entries, it may have updated the cached sq head, but not yet committed it to the SQ ring. Hence the kernel may find that there are SQE entries ready to be consumed, and return successfully to the application. If the SQPOLL thread hasn't yet committed the SQ ring entries by the time the application returns to userspace and attempts to get a new SQE, it will fail getting a new SQE. Fix this by having io_sqring_full() always use the user visible SQ ring head entry, rather than the internally cached one. Cc: stable@vger.kernel.org # 5.10+ Link: https://github.com/axboe/liburing/discussions/1267 Reported-by: Benedek Thaler <thaler@thaler.hu> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-10-22 15:46:27 +02:00
Jens Axboe	f4ce3b5d26	io_uring: check if we need to reschedule during overflow flush [ Upstream commit `eac2ca2d68` ] In terms of normal application usage, this list will always be empty. And if an application does overflow a bit, it'll have a few entries. However, nothing obviously prevents syzbot from running a test case that generates a ton of overflow entries, and then flushing them can take quite a while. Check for needing to reschedule while flushing, and drop our locks and do so if necessary. There's no state to maintain here as overflows always prune from head-of-list, hence it's fine to drop and reacquire the locks at the end of the loop. Link: https://lore.kernel.org/io-uring/66ed061d.050a0220.29194.0053.GAE@google.com/ Reported-by: syzbot+5fca234bd7eb378ff78e@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-10-17 15:24:18 +02:00
Jens Axboe	24f7989ed2	io_uring/net: harden multishot termination case for recv [ Upstream commit `c314094cb4` ] If the recv returns zero, or an error, then it doesn't matter if more data has already been received for this buffer. A condition like that should terminate the multishot receive. Rather than pass in the collected return value, pass in whether to terminate or keep the recv going separately. Note that this isn't a bug right now, as the only way to get there is via setting MSG_WAITALL with multishot receive. And if an application does that, then -EINVAL is returned anyway. But it seems like an easy bug to introduce, so let's make it a bit more explicit. Link: https://github.com/axboe/liburing/issues/1246 Cc: stable@vger.kernel.org Fixes: `b3fdea6ecb` ("io_uring: multishot recv") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-10-10 11:58:02 +02:00

1 2 3 4 5 ...

874 Commits