Merge branch 'io_uring-zero-copy-rx'
David Wei says:

====================
io_uring zero copy rx

This patchset contains net/ patches needed by a new io_uring request implementing zero copy rx into userspace pages, eliminating a kernel to user copy.

We configure the page pool that a driver uses to fill a hw rx queue so that it hands out user pages instead of kernel pages. Any data that ends up hitting this hw rx queue will thus be dma'd into userspace memory directly, without needing to be bounced through kernel memory. 'Reading' data out of a socket instead becomes a _notification_ mechanism, where the kernel tells userspace where the data is. The overall approach is similar to the devmem TCP proposal.

This relies on hw header/data split, flow steering and RSS to ensure that packet headers remain in kernel memory and that only the desired flows hit a hw rx queue configured for zero copy. Configuring this is outside the scope of this patchset.

We share netdev core infra with devmem TCP. The main difference is that io_uring is used for the uAPI and the lifetime of all objects is bound to an io_uring instance. Data is 'read' using a new io_uring request type. When done, data is returned via a new shared refill queue. A zero copy page pool refills a hw rx queue from this refill queue directly. Of course, the lifetime of these data buffers is managed by io_uring rather than the networking stack, with different refcounting rules.

This patchset is the first step in adding basic zero copy support. We will extend this iteratively with new features, e.g. dynamically allocated zero copy areas, THP support, dmabuf support, improved copy fallback, general optimisations and more.

In terms of netdev support, we're first targeting Broadcom bnxt. Patches aren't included since Taehee Yoo has already sent a more comprehensive patchset adding support in [1]. Google gve should already support this, and Mellanox mlx5 support is WIP pending driver changes.
===========
Performance
===========

Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet.

Test setup:
* AMD EPYC 9454
* Broadcom BCM957508 200G
* Kernel v6.11 base [2]
* liburing fork [3]
* kperf fork [4]
* 4K MTU
* Single TCP flow

With application thread + net rx softirq pinned to _different_ cores:

+-------------------------------+
|   epoll   |     io_uring      |
|-----------|-------------------|
| 82.2 Gbps | 116.2 Gbps (+41%) |
+-------------------------------+

Pinned to _same_ core:

+-------------------------------+
|   epoll   |     io_uring      |
|-----------|-------------------|
| 62.6 Gbps | 80.9 Gbps (+29%)  |
+-------------------------------+

=====
Links
=====

Broadcom bnxt support:
[1]: https://lore.kernel.org/20241003160620.1521626-8-ap420073@gmail.com

Linux kernel branch including io_uring bits:
[2]: https://github.com/isilence/linux.git zcrx/v13

liburing for testing:
[3]: https://github.com/isilence/liburing.git zcrx/next

kperf for testing:
[4]: https://git.kernel.dk/kperf.git

====================

Link: https://patch.msgid.link/20250204215622.695511-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
14 changed files with 321 additions and 81 deletions.
New file, hunk @@ -0,0 +1,45 @@:

```c
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _NET_PAGE_POOL_MEMORY_PROVIDER_H
#define _NET_PAGE_POOL_MEMORY_PROVIDER_H

#include <net/netmem.h>
#include <net/page_pool/types.h>

struct netdev_rx_queue;
struct sk_buff;

struct memory_provider_ops {
	netmem_ref (*alloc_netmems)(struct page_pool *pool, gfp_t gfp);
	bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem);
	int (*init)(struct page_pool *pool);
	void (*destroy)(struct page_pool *pool);
	int (*nl_fill)(void *mp_priv, struct sk_buff *rsp,
		       struct netdev_rx_queue *rxq);
	void (*uninstall)(void *mp_priv, struct netdev_rx_queue *rxq);
};

bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr);
void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov);
void net_mp_niov_clear_page_pool(struct net_iov *niov);

int net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
		    struct pp_memory_provider_params *p);
void net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
		      struct pp_memory_provider_params *old_p);

/**
 * net_mp_netmem_place_in_cache() - give a netmem to a page pool
 * @pool:      the page pool to place the netmem into
 * @netmem:    netmem to give
 *
 * Push an accounted netmem into the page pool's allocation cache. The caller
 * must ensure that there is space in the cache. It should only be called off
 * the mp_ops->alloc_netmems() path.
 */
static inline void net_mp_netmem_place_in_cache(struct page_pool *pool,
						netmem_ref netmem)
{
	pool->alloc.cache[pool->alloc.count++] = netmem;
}

#endif
```