Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
…/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-05-28

We've added 23 non-merge commits during the last 11 day(s) which contain
a total of 45 files changed, 696 insertions(+), 277 deletions(-).

The main changes are:

1) Rename skb's mono_delivery_time to tstamp_type for extensibility
   and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(),
   from Abhishek Chauhan.

2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary
   CT zones can be used from XDP/tc BPF netfilter CT helper functions,
   from Brad Cowie.

3) Several tweaks to the instruction-set.rst IETF doc to address
   the Last Call review comments, from Dave Thaler.

4) Small batch of riscv64 BPF JIT optimizations in order to emit more
   compressed instructions to the JITed image for better icache efficiency,
   from Xiao Wang.

5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h
   diffing and forcing more natural type definitions ordering,
   from Mykyta Yatsenko.

6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence
   a syzbot/KCSAN race report for the tx_errors counter,
   from Jiang Yunshui.

7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+
   which started treating const structs as constants and thus breaking
   full BTF program name resolution, from Ivan Babrou.

8) Fix up BPF program numbers in test_sockmap selftest in order to reduce
   some of the test-internal array sizes, from Geliang Tang.

9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only
   pahole, from Alan Maguire.

10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless
    rebuilds in some corner cases, from Artem Savkov.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
  bpf, net: Use DEV_STAT_INC()
  bpf, docs: Fix instruction.rst indentation
  bpf, docs: Clarify call local offset
  bpf, docs: Add table captions
  bpf, docs: clarify sign extension of 64-bit use of 32-bit imm
  bpf, docs: Use RFC 2119 language for ISA requirements
  bpf, docs: Move sentence about returning R0 to abi.rst
  bpf: constify member bpf_sysctl_kern:: Table
  riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
  riscv, bpf: Use STACK_ALIGN macro for size rounding up
  riscv, bpf: Optimize zextw insn with Zba extension
  selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets
  net: Add additional bit to support clockid_t timestamp type
  net: Rename mono_delivery_time to tstamp_type for scalabilty
  selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs
  net: netfilter: Make ct zone opts configurable for bpf ct helpers
  selftests/bpf: Fix prog numbers in test_sockmap
  bpf: Remove unused variable "prev_state"
  bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer
  bpf: Fix order of args in call to bpf_map_kvcalloc
  ...
====================

Link: https://lore.kernel.org/r/20240528105924.30905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  • Loading branch information
Jakub Kicinski committed May 28, 2024
2 parents c30ff5f + d9cbd83 commit 4b3529e
Show file tree
Hide file tree
Showing 45 changed files with 696 additions and 277 deletions.
3 changes: 3 additions & 0 deletions Documentation/bpf/standardization/abi.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,3 +23,6 @@ The BPF calling convention is defined as:

R0 - R5 are scratch registers and BPF programs needs to spill/fill them if
necessary across calls.

The BPF program needs to store the return value into register R0 before doing an
``EXIT``.
261 changes: 152 additions & 109 deletions Documentation/bpf/standardization/instruction-set.rst

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions arch/riscv/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -604,6 +604,18 @@ config TOOLCHAIN_HAS_VECTOR_CRYPTO
def_bool $(as-instr, .option arch$(comma) +v$(comma) +zvkb)
depends on AS_HAS_OPTION_ARCH

config RISCV_ISA_ZBA
bool "Zba extension support for bit manipulation instructions"
default y
help
Add support for enabling optimisations in the kernel when the Zba
extension is detected at boot.

The Zba extension provides instructions to accelerate the generation
of addresses that index into arrays of basic data types.

If you don't know what to do here, say Y.

config RISCV_ISA_ZBB
bool "Zbb extension support for bit manipulation instructions"
depends on TOOLCHAIN_HAS_ZBB
Expand Down
18 changes: 18 additions & 0 deletions arch/riscv/net/bpf_jit.h
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,11 @@ static inline bool rvc_enabled(void)
return IS_ENABLED(CONFIG_RISCV_ISA_C);
}

static inline bool rvzba_enabled(void)
{
return IS_ENABLED(CONFIG_RISCV_ISA_ZBA) && riscv_has_extension_likely(RISCV_ISA_EXT_ZBA);
}

static inline bool rvzbb_enabled(void)
{
return IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && riscv_has_extension_likely(RISCV_ISA_EXT_ZBB);
Expand Down Expand Up @@ -939,6 +944,14 @@ static inline u16 rvc_sdsp(u32 imm9, u8 rs2)
return rv_css_insn(0x7, imm, rs2, 0x2);
}

/* RV64-only ZBA instructions. */

static inline u32 rvzba_zextw(u8 rd, u8 rs1)
{
/* add.uw rd, rs1, ZERO */
return rv_r_insn(0x04, RV_REG_ZERO, rs1, 0, rd, 0x3b);
}

#endif /* __riscv_xlen == 64 */

/* Helper functions that emit RVC instructions when possible. */
Expand Down Expand Up @@ -1161,6 +1174,11 @@ static inline void emit_zexth(u8 rd, u8 rs, struct rv_jit_context *ctx)

static inline void emit_zextw(u8 rd, u8 rs, struct rv_jit_context *ctx)
{
if (rvzba_enabled()) {
emit(rvzba_zextw(rd, rs), ctx);
return;
}

emit_slli(rd, rs, 32, ctx);
emit_srli(rd, rd, 32, ctx);
}
Expand Down
12 changes: 7 additions & 5 deletions arch/riscv/net/bpf_jit_comp64.c
Original file line number Diff line number Diff line change
Expand Up @@ -537,8 +537,10 @@ static void emit_atomic(u8 rd, u8 rs, s16 off, s32 imm, bool is64,
/* r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg); */
case BPF_CMPXCHG:
r0 = bpf_to_rv_reg(BPF_REG_0, ctx);
emit(is64 ? rv_addi(RV_REG_T2, r0, 0) :
rv_addiw(RV_REG_T2, r0, 0), ctx);
if (is64)
emit_mv(RV_REG_T2, r0, ctx);
else
emit_addiw(RV_REG_T2, r0, 0, ctx);
emit(is64 ? rv_lr_d(r0, 0, rd, 0, 0) :
rv_lr_w(r0, 0, rd, 0, 0), ctx);
jmp_offset = ninsns_rvoff(8);
Expand Down Expand Up @@ -868,7 +870,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
stack_size += 8;
sreg_off = stack_size;

stack_size = round_up(stack_size, 16);
stack_size = round_up(stack_size, STACK_ALIGN);

if (!is_struct_ops) {
/* For the trampoline called from function entry,
Expand Down Expand Up @@ -1960,7 +1962,7 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
{
int i, stack_adjust = 0, store_offset, bpf_stack_adjust;

bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, STACK_ALIGN);
if (bpf_stack_adjust)
mark_fp(ctx);

Expand All @@ -1982,7 +1984,7 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx, bool is_subprog)
if (ctx->arena_vm_start)
stack_adjust += 8;

stack_adjust = round_up(stack_adjust, 16);
stack_adjust = round_up(stack_adjust, STACK_ALIGN);
stack_adjust += bpf_stack_adjust;

store_offset = stack_adjust - 8;
Expand Down
2 changes: 1 addition & 1 deletion include/linux/filter.h
Original file line number Diff line number Diff line change
Expand Up @@ -1406,7 +1406,7 @@ struct bpf_sock_ops_kern {

struct bpf_sysctl_kern {
struct ctl_table_header *head;
struct ctl_table *table;
const struct ctl_table *table;
void *cur_val;
size_t cur_len;
void *new_val;
Expand Down
68 changes: 50 additions & 18 deletions include/linux/skbuff.h
Original file line number Diff line number Diff line change
Expand Up @@ -706,6 +706,13 @@ typedef unsigned int sk_buff_data_t;
typedef unsigned char *sk_buff_data_t;
#endif

enum skb_tstamp_type {
SKB_CLOCK_REALTIME,
SKB_CLOCK_MONOTONIC,
SKB_CLOCK_TAI,
__SKB_CLOCK_MAX = SKB_CLOCK_TAI,
};

/**
* DOC: Basic sk_buff geometry
*
Expand Down Expand Up @@ -823,10 +830,8 @@ typedef unsigned char *sk_buff_data_t;
* @dst_pending_confirm: need to confirm neighbour
* @decrypted: Decrypted SKB
* @slow_gro: state present at GRO time, slower prepare step required
* @mono_delivery_time: When set, skb->tstamp has the
* delivery_time in mono clock base (i.e. EDT). Otherwise, the
* skb->tstamp has the (rcv) timestamp at ingress and
* delivery_time at egress.
* @tstamp_type: When set, skb->tstamp has the
* delivery_time clock base of skb->tstamp.
* @napi_id: id of the NAPI struct this skb came from
* @sender_cpu: (aka @napi_id) source CPU in XPS
* @alloc_cpu: CPU which did the skb allocation.
Expand Down Expand Up @@ -954,7 +959,7 @@ struct sk_buff {
/* private: */
__u8 __mono_tc_offset[0];
/* public: */
__u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */
__u8 tstamp_type:2; /* See skb_tstamp_type */
#ifdef CONFIG_NET_XGRESS
__u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
__u8 tc_skip_classify:1;
Expand Down Expand Up @@ -1084,15 +1089,16 @@ struct sk_buff {
#endif
#define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)

/* if you move tc_at_ingress or mono_delivery_time
/* if you move tc_at_ingress or tstamp_type
* around, you also must adapt these constants.
*/
#ifdef __BIG_ENDIAN_BITFIELD
#define SKB_MONO_DELIVERY_TIME_MASK (1 << 7)
#define TC_AT_INGRESS_MASK (1 << 6)
#define SKB_TSTAMP_TYPE_MASK (3 << 6)
#define SKB_TSTAMP_TYPE_RSHIFT (6)
#define TC_AT_INGRESS_MASK (1 << 5)
#else
#define SKB_MONO_DELIVERY_TIME_MASK (1 << 0)
#define TC_AT_INGRESS_MASK (1 << 1)
#define SKB_TSTAMP_TYPE_MASK (3)
#define TC_AT_INGRESS_MASK (1 << 2)
#endif
#define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset)

Expand Down Expand Up @@ -4179,7 +4185,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
static inline void __net_timestamp(struct sk_buff *skb)
{
skb->tstamp = ktime_get_real();
skb->mono_delivery_time = 0;
skb->tstamp_type = SKB_CLOCK_REALTIME;
}

static inline ktime_t net_timedelta(ktime_t t)
Expand All @@ -4188,10 +4194,36 @@ static inline ktime_t net_timedelta(ktime_t t)
}

static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
bool mono)
u8 tstamp_type)
{
skb->tstamp = kt;
skb->mono_delivery_time = kt && mono;

if (kt)
skb->tstamp_type = tstamp_type;
else
skb->tstamp_type = SKB_CLOCK_REALTIME;
}

static inline void skb_set_delivery_type_by_clockid(struct sk_buff *skb,
ktime_t kt, clockid_t clockid)
{
u8 tstamp_type = SKB_CLOCK_REALTIME;

switch (clockid) {
case CLOCK_REALTIME:
break;
case CLOCK_MONOTONIC:
tstamp_type = SKB_CLOCK_MONOTONIC;
break;
case CLOCK_TAI:
tstamp_type = SKB_CLOCK_TAI;
break;
default:
WARN_ON_ONCE(1);
kt = 0;
}

skb_set_delivery_time(skb, kt, tstamp_type);
}

DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
Expand All @@ -4201,8 +4233,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
*/
static inline void skb_clear_delivery_time(struct sk_buff *skb)
{
if (skb->mono_delivery_time) {
skb->mono_delivery_time = 0;
if (skb->tstamp_type) {
skb->tstamp_type = SKB_CLOCK_REALTIME;
if (static_branch_unlikely(&netstamp_needed_key))
skb->tstamp = ktime_get_real();
else
Expand All @@ -4212,23 +4244,23 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb)

static inline void skb_clear_tstamp(struct sk_buff *skb)
{
if (skb->mono_delivery_time)
if (skb->tstamp_type)
return;

skb->tstamp = 0;
}

static inline ktime_t skb_tstamp(const struct sk_buff *skb)
{
if (skb->mono_delivery_time)
if (skb->tstamp_type)
return 0;

return skb->tstamp;
}

static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond)
{
if (!skb->mono_delivery_time && skb->tstamp)
if (skb->tstamp_type != SKB_CLOCK_MONOTONIC && skb->tstamp)
return skb->tstamp;

if (static_branch_unlikely(&netstamp_needed_key) || cond)
Expand Down
4 changes: 2 additions & 2 deletions include/net/inet_frag.h
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ struct frag_v6_compare_key {
* @stamp: timestamp of the last received fragment
* @len: total length of the original datagram
* @meat: length of received fragments so far
* @mono_delivery_time: stamp has a mono delivery time (EDT)
* @tstamp_type: stamp has a mono delivery time (EDT)
* @flags: fragment queue flags
* @max_size: maximum received fragment size
* @fqdir: pointer to struct fqdir
Expand All @@ -97,7 +97,7 @@ struct inet_frag_queue {
ktime_t stamp;
int len;
int meat;
u8 mono_delivery_time;
u8 tstamp_type;
__u8 flags;
u16 max_size;
struct fqdir *fqdir;
Expand Down
15 changes: 10 additions & 5 deletions include/uapi/linux/bpf.h
Original file line number Diff line number Diff line change
Expand Up @@ -6207,12 +6207,17 @@ union { \
__u64 :64; \
} __attribute__((aligned(8)))

/* The enum used in skb->tstamp_type. It specifies the clock type
* of the time stored in the skb->tstamp.
*/
enum {
BPF_SKB_TSTAMP_UNSPEC,
BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
/* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
* the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
* and try to deduce it by ingress, egress or skb->sk->sk_clockid.
BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
BPF_SKB_CLOCK_REALTIME = 0,
BPF_SKB_CLOCK_MONOTONIC = 1,
BPF_SKB_CLOCK_TAI = 2,
/* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
* the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
*/
};

Expand Down
4 changes: 2 additions & 2 deletions kernel/bpf/bpf_local_storage.c
Original file line number Diff line number Diff line change
Expand Up @@ -782,8 +782,8 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
nbuckets = max_t(u32, 2, nbuckets);
smap->bucket_log = ilog2(nbuckets);

smap->buckets = bpf_map_kvcalloc(&smap->map, sizeof(*smap->buckets),
nbuckets, GFP_USER | __GFP_NOWARN);
smap->buckets = bpf_map_kvcalloc(&smap->map, nbuckets,
sizeof(*smap->buckets), GFP_USER | __GFP_NOWARN);
if (!smap->buckets) {
err = -ENOMEM;
goto free_smap;
Expand Down
6 changes: 3 additions & 3 deletions net/bridge/netfilter/nf_conntrack_bridge.c
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
struct sk_buff *))
{
int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
bool mono_delivery_time = skb->mono_delivery_time;
u8 tstamp_type = skb->tstamp_type;
unsigned int hlen, ll_rs, mtu;
ktime_t tstamp = skb->tstamp;
struct ip_frag_state state;
Expand Down Expand Up @@ -82,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
if (iter.frag)
ip_fraglist_prepare(skb, &iter);

skb_set_delivery_time(skb, tstamp, mono_delivery_time);
skb_set_delivery_time(skb, tstamp, tstamp_type);
err = output(net, sk, data, skb);
if (err || !iter.frag)
break;
Expand Down Expand Up @@ -113,7 +113,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
goto blackhole;
}

skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
skb_set_delivery_time(skb2, tstamp, tstamp_type);
err = output(net, sk, data, skb2);
if (err)
goto blackhole;
Expand Down
2 changes: 1 addition & 1 deletion net/core/dev.c
Original file line number Diff line number Diff line change
Expand Up @@ -2160,7 +2160,7 @@ EXPORT_SYMBOL(net_disable_timestamp);
static inline void net_timestamp_set(struct sk_buff *skb)
{
skb->tstamp = 0;
skb->mono_delivery_time = 0;
skb->tstamp_type = SKB_CLOCK_REALTIME;
if (static_branch_unlikely(&netstamp_needed_key))
skb->tstamp = ktime_get_real();
}
Expand Down
Loading

0 comments on commit 4b3529e

Please sign in to comment.