Merge branch 'bpf: bpf memory usage'
Yafang Shao says:

====================

Currently we can't get bpf memory usage reliably either from memcg or
from bpftool.

In memcg, there is no 'bpf' item in memory.stat, only 'kernel',
'sock', 'vmalloc' and 'percpu', which may be related to bpf memory. With
these items we still can't get the bpf memory usage, because bpf memory
usage may be far less than the kmem in a memcg; for example, dentries may
consume lots of kmem.

bpftool now shows the bpf memory footprint, which is different from bpf
memory usage. The difference can be quite large in some cases, for example:

- non-preallocated bpf map
  The memory usage of a non-preallocated bpf map changes dynamically. The
  allocated element count can range from 0 to max_entries, but the
  memory footprint in bpftool only shows a fixed number.

- bpf metadata consumes more memory than bpf elements
  In some corner cases, the bpf metadata can consume a lot more memory
  than the bpf elements do. For example, this can happen when the element
  size is quite small.

- some maps don't have key, value or max_entries
  For example, the key_size and value_size of ringbuf are 0, so its
  memlock is always 0.
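
To make the gap between footprint and usage concrete, here is a minimal
user-space sketch (not kernel code). The struct, the helper names and the
per_elem_meta field are hypothetical, and the footprint formula
max_entries * (key_size + value_size) is an assumed simplification of a
fixed, attribute-only report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a map; not a kernel structure. */
struct toy_map {
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint64_t live_entries;	/* elements currently allocated */
	uint32_t per_elem_meta;	/* assumed per-element bookkeeping bytes */
};

/* Fixed footprint: depends only on creation-time attributes. */
static uint64_t toy_footprint(const struct toy_map *m)
{
	return (uint64_t)m->max_entries *
	       ((uint64_t)m->key_size + m->value_size);
}

/* Usage: follows the number of live elements and includes metadata. */
static uint64_t toy_usage(const struct toy_map *m)
{
	assert(m->live_entries <= m->max_entries);
	return m->live_entries *
	       ((uint64_t)m->key_size + m->value_size + m->per_elem_meta);
}
```

With 4-byte keys and values, the footprint never moves as elements come and
go, while the usage tracks live_entries and is dominated by the assumed
metadata. A map with key_size and value_size of 0 gets a footprint of 0
under this formula, which mirrors the ringbuf case above.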

We need a way to show the bpf memory usage, especially as more and more
bpf programs run in production environments and the bpf memory usage is
therefore not trivial.

This patchset introduces a new map op, ->map_mem_usage, to calculate the
memory usage. Note that we don't intend to make the memory usage 100%
accurate; our goal is to make sure there is only a small difference
between what bpftool reports and the real memory. That small difference
can be ignored compared to the total usage. That is enough to monitor
the bpf memory usage. For example, the user can rely on this value to
monitor the trend of bpf memory usage, compare the difference in bpf
memory usage between different bpf program versions, figure out which
maps consume large amounts of memory, etc.
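
The per-map callback idea can be sketched in user space as follows. This is
an illustration only: the toy_* names and the trimmed-down structs are
hypothetical, and the assert merely stands in for whatever validation the
kernel performs on the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_map;

/* Hypothetical, trimmed-down ops table with a single callback. */
struct toy_map_ops {
	uint64_t (*map_mem_usage)(const struct toy_map *map);
};

struct toy_map {
	const struct toy_map_ops *ops;
	uint32_t elem_size;
	uint32_t max_entries;
};

/* One map type's answer: header plus a fixed-size element area. */
static uint64_t toy_array_mem_usage(const struct toy_map *map)
{
	return sizeof(*map) + (uint64_t)map->max_entries * map->elem_size;
}

static const struct toy_map_ops toy_array_ops = {
	.map_mem_usage = toy_array_mem_usage,
};

/* Generic reporting path: it only knows about the ops pointer. */
static uint64_t toy_report_usage(const struct toy_map *map)
{
	assert(map->ops && map->ops->map_mem_usage);
	return map->ops->map_mem_usage(map);
}
```

Each map type supplies its own estimate, so the generic reporting code
needs no knowledge of how a given map lays out its memory.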

This patchset implements the bpf memory usage for all maps, and yet there's
still work to do. We don't want to introduce runtime overhead in the
element update and delete path, but we have to do it for some
non-preallocated maps:
- devmap, xskmap
  When we update or delete an element, it will allocate or free memory.
  In order to track this dynamic memory, we have to track the count in
  the element update and delete path.

- cpumap
  The size of each cpumap element is not fixed. If we
  want to track the usage, we have to count the size of all elements in
  the element update and delete path. So I just put it aside currently.

- local_storage, bpf_local_storage
  When we attach or detach a cgroup, it will allocate or free memory. If
  we want to track the dynamic memory, we also need to do something in
  the update and delete path. So I just put it aside currently.

- offload map
  Element update and delete for an offload map go through the netdev
  dev_ops, which may dynamically allocate or free memory, but this dynamic
  memory isn't currently counted in the offload map memory usage.
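
Counting in the update and delete path, as described for devmap and xskmap,
can be sketched with C11 atomics. This is a user-space model under stated
assumptions: the toy_* types are hypothetical, and atomic_exchange stands in
for the kernel's xchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 8

struct toy_entry { int ifindex; };

struct toy_devmap {
	_Atomic(struct toy_entry *) slots[TOY_SLOTS];
	atomic_int items;	/* number of occupied slots */
};

/* Install an entry; count only when an empty slot becomes occupied.
 * Returns the previous entry so the caller can free it. */
static struct toy_entry *toy_update(struct toy_devmap *m, size_t i,
				    struct toy_entry *e)
{
	struct toy_entry *old;

	assert(i < TOY_SLOTS && e);
	old = atomic_exchange(&m->slots[i], e);
	if (!old)
		atomic_fetch_add(&m->items, 1);
	return old;
}

/* Clear a slot; uncount only when something was actually removed. */
static struct toy_entry *toy_delete(struct toy_devmap *m, size_t i)
{
	struct toy_entry *old;

	assert(i < TOY_SLOTS);
	old = atomic_exchange(&m->slots[i], NULL);
	if (old)
		atomic_fetch_sub(&m->items, 1);
	return old;
}

/* Usage = fixed part + one entry per occupied slot. */
static uint64_t toy_usage(struct toy_devmap *m)
{
	return sizeof(*m) +
	       (uint64_t)atomic_load(&m->items) * sizeof(struct toy_entry);
}
```

Replacing an occupied slot leaves the counter unchanged, so the cost on the
hot path is at most one atomic add or subtract per operation.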

The result of each map can be found in the individual patch.

We may also need to track per-container bpf memory usage; that will be
addressed by a different patchset.

Changes:
v3->v4: code improvement on ringbuf (Andrii)
        use READ_ONCE() to read lpm_trie (Tao)
        explain why we can't get bpf memory usage from memcg.
v2->v3: check callback at map creation time and avoid warning (Alexei)
        fix build error under CONFIG_BPF=n (lkp@intel.com)
v1->v2: calculate the memory usage within bpf (Alexei)
- [v1] bpf, mm: bpf memory usage
  https://lwn.net/Articles/921991/
- [RFC PATCH v2] mm, bpf: Add BPF into /proc/meminfo
  https://lwn.net/Articles/919848/
- [RFC PATCH v1] mm, bpf: Add BPF into /proc/meminfo
  https://lwn.net/Articles/917647/
- [RFC PATCH] bpf, mm: Add a new item bpf into memory.stat
  https://lore.kernel.org/bpf/20220921170002.29557-1-laoar.shao@gmail.com/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov committed Mar 7, 2023
2 parents 2d5bcdc + 6b4a6ea commit a73dc91
Showing 24 changed files with 273 additions and 15 deletions.
8 changes: 8 additions & 0 deletions include/linux/bpf.h
@@ -161,6 +161,8 @@ struct bpf_map_ops {
bpf_callback_t callback_fn,
void *callback_ctx, u64 flags);

u64 (*map_mem_usage)(const struct bpf_map *map);

/* BTF id of struct allocated by map_alloc */
int *map_btf_id;

@@ -2622,6 +2624,7 @@ static inline bool bpf_map_is_offloaded(struct bpf_map *map)

struct bpf_map *bpf_map_offload_map_alloc(union bpf_attr *attr);
void bpf_map_offload_map_free(struct bpf_map *map);
u64 bpf_map_offload_map_mem_usage(const struct bpf_map *map);
int bpf_prog_test_run_syscall(struct bpf_prog *prog,
const union bpf_attr *kattr,
union bpf_attr __user *uattr);
@@ -2693,6 +2696,11 @@ static inline void bpf_map_offload_map_free(struct bpf_map *map)
{
}

static inline u64 bpf_map_offload_map_mem_usage(const struct bpf_map *map)
{
return 0;
}

static inline int bpf_prog_test_run_syscall(struct bpf_prog *prog,
const union bpf_attr *kattr,
union bpf_attr __user *uattr)
1 change: 1 addition & 0 deletions include/linux/bpf_local_storage.h
@@ -164,5 +164,6 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
void *value, u64 map_flags, gfp_t gfp_flags);

void bpf_local_storage_free_rcu(struct rcu_head *rcu);
u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map);

#endif /* _BPF_LOCAL_STORAGE_H */
1 change: 1 addition & 0 deletions include/net/xdp_sock.h
@@ -38,6 +38,7 @@ struct xdp_umem {
struct xsk_map {
struct bpf_map map;
spinlock_t lock; /* Synchronize map updates */
atomic_t count;
struct xdp_sock __rcu *xsk_map[];
};

28 changes: 28 additions & 0 deletions kernel/bpf/arraymap.c
@@ -721,6 +721,28 @@ static int bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_
return num_elems;
}

static u64 array_map_mem_usage(const struct bpf_map *map)
{
	struct bpf_array *array = container_of(map, struct bpf_array, map);
	bool percpu = map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
	u32 elem_size = array->elem_size;
	u64 entries = map->max_entries;
	u64 usage = sizeof(*array);

	if (percpu) {
		usage += entries * sizeof(void *);
		usage += entries * elem_size * num_possible_cpus();
	} else {
		if (map->map_flags & BPF_F_MMAPABLE) {
			usage = PAGE_ALIGN(usage);
			usage += PAGE_ALIGN(entries * elem_size);
		} else {
			usage += entries * elem_size;
		}
	}
	return usage;
}

BTF_ID_LIST_SINGLE(array_map_btf_ids, struct, bpf_array)
const struct bpf_map_ops array_map_ops = {
.map_meta_equal = array_map_meta_equal,
@@ -742,6 +764,7 @@ const struct bpf_map_ops array_map_ops = {
.map_update_batch = generic_map_update_batch,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_array_elem,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
};
@@ -762,6 +785,7 @@ const struct bpf_map_ops percpu_array_map_ops = {
.map_update_batch = generic_map_update_batch,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_array_elem,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
};
@@ -1156,6 +1180,7 @@ const struct bpf_map_ops prog_array_map_ops = {
.map_fd_sys_lookup_elem = prog_fd_array_sys_lookup_elem,
.map_release_uref = prog_array_map_clear,
.map_seq_show_elem = prog_array_map_seq_show_elem,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
};

@@ -1257,6 +1282,7 @@ const struct bpf_map_ops perf_event_array_map_ops = {
.map_fd_put_ptr = perf_event_fd_array_put_ptr,
.map_release = perf_event_fd_array_release,
.map_check_btf = map_check_no_btf,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
};

@@ -1291,6 +1317,7 @@ const struct bpf_map_ops cgroup_array_map_ops = {
.map_fd_get_ptr = cgroup_fd_array_get_ptr,
.map_fd_put_ptr = cgroup_fd_array_put_ptr,
.map_check_btf = map_check_no_btf,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
};
#endif
@@ -1379,5 +1406,6 @@ const struct bpf_map_ops array_of_maps_map_ops = {
.map_lookup_batch = generic_map_lookup_batch,
.map_update_batch = generic_map_update_batch,
.map_check_btf = map_check_no_btf,
.map_mem_usage = array_map_mem_usage,
.map_btf_id = &array_map_btf_ids[0],
};
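
As a worked illustration of the arithmetic in array_map_mem_usage(), the
following user-space sketch reproduces the three branches. The toy_* names,
the 4096-byte page size and the header parameter (a stand-in for
sizeof(struct bpf_array)) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumed page size */

static uint64_t toy_page_align(uint64_t x)
{
	return (x + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
}

/* header: stand-in for sizeof(struct bpf_array). */
static uint64_t toy_array_usage(uint64_t header, uint64_t entries,
				uint32_t elem_size, bool percpu,
				bool mmapable, unsigned int ncpus)
{
	uint64_t usage = header;

	assert(ncpus > 0);
	if (percpu) {
		/* one pointer per entry, plus one copy per possible CPU */
		usage += entries * sizeof(void *);
		usage += entries * elem_size * ncpus;
	} else if (mmapable) {
		/* header and data are each rounded up to whole pages */
		usage = toy_page_align(usage);
		usage += toy_page_align(entries * elem_size);
	} else {
		usage += entries * elem_size;
	}
	return usage;
}
```

For a 320-byte header and 1024 entries of 8 bytes, the plain case gives
320 + 8192 = 8512 bytes, while the mmapable case rounds both parts to pages
and gives 4096 + 8192 = 12288 bytes.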
12 changes: 12 additions & 0 deletions kernel/bpf/bloom_filter.c
@@ -193,6 +193,17 @@ static int bloom_map_check_btf(const struct bpf_map *map,
return btf_type_is_void(key_type) ? 0 : -EINVAL;
}

static u64 bloom_map_mem_usage(const struct bpf_map *map)
{
	struct bpf_bloom_filter *bloom;
	u64 bitset_bytes;

	bloom = container_of(map, struct bpf_bloom_filter, map);
	bitset_bytes = BITS_TO_BYTES((u64)bloom->bitset_mask + 1);
	bitset_bytes = roundup(bitset_bytes, sizeof(unsigned long));
	return sizeof(*bloom) + bitset_bytes;
}

BTF_ID_LIST_SINGLE(bpf_bloom_map_btf_ids, struct, bpf_bloom_filter)
const struct bpf_map_ops bloom_filter_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -206,5 +217,6 @@ const struct bpf_map_ops bloom_filter_map_ops = {
.map_update_elem = bloom_map_update_elem,
.map_delete_elem = bloom_map_delete_elem,
.map_check_btf = bloom_map_check_btf,
.map_mem_usage = bloom_map_mem_usage,
.map_btf_id = &bpf_bloom_map_btf_ids[0],
};
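
The bitset sizing in bloom_map_mem_usage() can be checked with a small
user-space sketch. The helper name is hypothetical, and treating the mask
as a 32-bit value is an assumption; the open-coded divisions mirror what
BITS_TO_BYTES() and roundup() compute.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by a bitset of (bitset_mask + 1) bits, rounded up to a
 * whole number of unsigned longs. */
static uint64_t toy_bloom_bitset_bytes(uint32_t bitset_mask)
{
	/* widen before adding 1 so an all-ones mask does not wrap to 0 */
	uint64_t nr_bits = (uint64_t)bitset_mask + 1;
	uint64_t bytes = (nr_bits + 7) / 8;
	uint64_t word = sizeof(unsigned long);
	uint64_t rounded = (bytes + word - 1) / word * word;

	assert(rounded % word == 0 && rounded >= bytes);
	return rounded;
}
```

The widening cast matters at the top of the range: a mask of 0xffffffff
describes 2^32 bits, which a 32-bit addition would wrap to zero.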
1 change: 1 addition & 0 deletions kernel/bpf/bpf_cgrp_storage.c
@@ -221,6 +221,7 @@ const struct bpf_map_ops cgrp_storage_map_ops = {
.map_update_elem = bpf_cgrp_storage_update_elem,
.map_delete_elem = bpf_cgrp_storage_delete_elem,
.map_check_btf = bpf_local_storage_map_check_btf,
.map_mem_usage = bpf_local_storage_map_mem_usage,
.map_btf_id = &bpf_local_storage_map_btf_id[0],
.map_owner_storage_ptr = cgroup_storage_ptr,
};
1 change: 1 addition & 0 deletions kernel/bpf/bpf_inode_storage.c
@@ -223,6 +223,7 @@ const struct bpf_map_ops inode_storage_map_ops = {
.map_update_elem = bpf_fd_inode_storage_update_elem,
.map_delete_elem = bpf_fd_inode_storage_delete_elem,
.map_check_btf = bpf_local_storage_map_check_btf,
.map_mem_usage = bpf_local_storage_map_mem_usage,
.map_btf_id = &bpf_local_storage_map_btf_id[0],
.map_owner_storage_ptr = inode_storage_ptr,
};
10 changes: 10 additions & 0 deletions kernel/bpf/bpf_local_storage.c
@@ -685,6 +685,16 @@ bool bpf_local_storage_unlink_nolock(struct bpf_local_storage *local_storage)
return free_storage;
}

u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map)
{
	struct bpf_local_storage_map *smap = (struct bpf_local_storage_map *)map;
	u64 usage = sizeof(*smap);

	/* The dynamically allocated selems are not counted currently. */
	usage += sizeof(*smap->buckets) * (1ULL << smap->bucket_log);
	return usage;
}

struct bpf_map *
bpf_local_storage_map_alloc(union bpf_attr *attr,
struct bpf_local_storage_cache *cache)
16 changes: 16 additions & 0 deletions kernel/bpf/bpf_struct_ops.c
@@ -641,6 +641,21 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
return map;
}

static u64 bpf_struct_ops_map_mem_usage(const struct bpf_map *map)
{
	struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
	const struct bpf_struct_ops *st_ops = st_map->st_ops;
	const struct btf_type *vt = st_ops->value_type;
	u64 usage;

	usage = sizeof(*st_map) +
			vt->size - sizeof(struct bpf_struct_ops_value);
	usage += vt->size;
	usage += btf_type_vlen(vt) * sizeof(struct bpf_links *);
	usage += PAGE_SIZE;
	return usage;
}

BTF_ID_LIST_SINGLE(bpf_struct_ops_map_btf_ids, struct, bpf_struct_ops_map)
const struct bpf_map_ops bpf_struct_ops_map_ops = {
.map_alloc_check = bpf_struct_ops_map_alloc_check,
@@ -651,6 +666,7 @@ const struct bpf_map_ops bpf_struct_ops_map_ops = {
.map_delete_elem = bpf_struct_ops_map_delete_elem,
.map_update_elem = bpf_struct_ops_map_update_elem,
.map_seq_show_elem = bpf_struct_ops_map_seq_show_elem,
.map_mem_usage = bpf_struct_ops_map_mem_usage,
.map_btf_id = &bpf_struct_ops_map_btf_ids[0],
};

1 change: 1 addition & 0 deletions kernel/bpf/bpf_task_storage.c
@@ -335,6 +335,7 @@ const struct bpf_map_ops task_storage_map_ops = {
.map_update_elem = bpf_pid_task_storage_update_elem,
.map_delete_elem = bpf_pid_task_storage_delete_elem,
.map_check_btf = bpf_local_storage_map_check_btf,
.map_mem_usage = bpf_local_storage_map_mem_usage,
.map_btf_id = &bpf_local_storage_map_btf_id[0],
.map_owner_storage_ptr = task_storage_ptr,
};
10 changes: 10 additions & 0 deletions kernel/bpf/cpumap.c
@@ -673,6 +673,15 @@ static int cpu_map_redirect(struct bpf_map *map, u64 index, u64 flags)
__cpu_map_lookup_elem);
}

static u64 cpu_map_mem_usage(const struct bpf_map *map)
{
	u64 usage = sizeof(struct bpf_cpu_map);

	/* Currently the dynamically allocated elements are not counted */
	usage += (u64)map->max_entries * sizeof(struct bpf_cpu_map_entry *);
	return usage;
}

BTF_ID_LIST_SINGLE(cpu_map_btf_ids, struct, bpf_cpu_map)
const struct bpf_map_ops cpu_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -683,6 +692,7 @@ const struct bpf_map_ops cpu_map_ops = {
.map_lookup_elem = cpu_map_lookup_elem,
.map_get_next_key = cpu_map_get_next_key,
.map_check_btf = map_check_no_btf,
.map_mem_usage = cpu_map_mem_usage,
.map_btf_id = &cpu_map_btf_ids[0],
.map_redirect = cpu_map_redirect,
};
26 changes: 24 additions & 2 deletions kernel/bpf/devmap.c
@@ -819,8 +819,10 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key)
 return -EINVAL;

 old_dev = unrcu_pointer(xchg(&dtab->netdev_map[k], NULL));
-if (old_dev)
+if (old_dev) {
 call_rcu(&old_dev->rcu, __dev_map_entry_free);
+atomic_dec((atomic_t *)&dtab->items);
+}
 return 0;
 }

@@ -931,6 +933,8 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
old_dev = unrcu_pointer(xchg(&dtab->netdev_map[i], RCU_INITIALIZER(dev)));
if (old_dev)
call_rcu(&old_dev->rcu, __dev_map_entry_free);
else
atomic_inc((atomic_t *)&dtab->items);

return 0;
}
@@ -1016,6 +1020,20 @@ static int dev_hash_map_redirect(struct bpf_map *map, u64 ifindex, u64 flags)
__dev_map_hash_lookup_elem);
}

static u64 dev_map_mem_usage(const struct bpf_map *map)
{
	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
	u64 usage = sizeof(struct bpf_dtab);

	if (map->map_type == BPF_MAP_TYPE_DEVMAP_HASH)
		usage += (u64)dtab->n_buckets * sizeof(struct hlist_head);
	else
		usage += (u64)map->max_entries * sizeof(struct bpf_dtab_netdev *);
	usage += atomic_read((atomic_t *)&dtab->items) *
			 (u64)sizeof(struct bpf_dtab_netdev);
	return usage;
}

BTF_ID_LIST_SINGLE(dev_map_btf_ids, struct, bpf_dtab)
const struct bpf_map_ops dev_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -1026,6 +1044,7 @@ const struct bpf_map_ops dev_map_ops = {
.map_update_elem = dev_map_update_elem,
.map_delete_elem = dev_map_delete_elem,
.map_check_btf = map_check_no_btf,
.map_mem_usage = dev_map_mem_usage,
.map_btf_id = &dev_map_btf_ids[0],
.map_redirect = dev_map_redirect,
};
@@ -1039,6 +1058,7 @@ const struct bpf_map_ops dev_map_hash_ops = {
.map_update_elem = dev_map_hash_update_elem,
.map_delete_elem = dev_map_hash_delete_elem,
.map_check_btf = map_check_no_btf,
.map_mem_usage = dev_map_mem_usage,
.map_btf_id = &dev_map_btf_ids[0],
.map_redirect = dev_hash_map_redirect,
};
@@ -1109,9 +1129,11 @@ static int dev_map_notification(struct notifier_block *notifier,
 if (!dev || netdev != dev->dev)
 continue;
 odev = unrcu_pointer(cmpxchg(&dtab->netdev_map[i], RCU_INITIALIZER(dev), NULL));
-if (dev == odev)
+if (dev == odev) {
 call_rcu(&dev->rcu,
 __dev_map_entry_free);
+atomic_dec((atomic_t *)&dtab->items);
+}
 }
 }
 rcu_read_unlock();
43 changes: 43 additions & 0 deletions kernel/bpf/hashtab.c
@@ -2190,6 +2190,44 @@ static int bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_f
return num_elems;
}

static u64 htab_map_mem_usage(const struct bpf_map *map)
{
	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
	u32 value_size = round_up(htab->map.value_size, 8);
	bool prealloc = htab_is_prealloc(htab);
	bool percpu = htab_is_percpu(htab);
	bool lru = htab_is_lru(htab);
	u64 num_entries;
	u64 usage = sizeof(struct bpf_htab);

	usage += sizeof(struct bucket) * htab->n_buckets;
	usage += sizeof(int) * num_possible_cpus() * HASHTAB_MAP_LOCK_COUNT;
	if (prealloc) {
		num_entries = map->max_entries;
		if (htab_has_extra_elems(htab))
			num_entries += num_possible_cpus();

		usage += htab->elem_size * num_entries;

		if (percpu)
			usage += value_size * num_possible_cpus() * num_entries;
		else if (!lru)
			usage += sizeof(struct htab_elem *) * num_possible_cpus();
	} else {
#define LLIST_NODE_SZ sizeof(struct llist_node)

		num_entries = htab->use_percpu_counter ?
					  percpu_counter_sum(&htab->pcount) :
					  atomic_read(&htab->count);
		usage += (htab->elem_size + LLIST_NODE_SZ) * num_entries;
		if (percpu) {
			usage += (LLIST_NODE_SZ + sizeof(void *)) * num_entries;
			usage += value_size * num_possible_cpus() * num_entries;
		}
	}
	return usage;
}

BTF_ID_LIST_SINGLE(htab_map_btf_ids, struct, bpf_htab)
const struct bpf_map_ops htab_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
@@ -2206,6 +2244,7 @@ const struct bpf_map_ops htab_map_ops = {
.map_seq_show_elem = htab_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab),
.map_btf_id = &htab_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
@@ -2227,6 +2266,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
.map_seq_show_elem = htab_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_lru),
.map_btf_id = &htab_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
@@ -2378,6 +2418,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_percpu),
.map_btf_id = &htab_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
@@ -2397,6 +2438,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
.map_seq_show_elem = htab_percpu_map_seq_show_elem,
.map_set_for_each_callback_args = map_set_for_each_callback_args,
.map_for_each_callback = bpf_for_each_hash_elem,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab_lru_percpu),
.map_btf_id = &htab_map_btf_ids[0],
.iter_seq_info = &iter_seq_info,
@@ -2534,6 +2576,7 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
.map_gen_lookup = htab_of_map_gen_lookup,
.map_check_btf = map_check_no_btf,
.map_mem_usage = htab_map_mem_usage,
BATCH_OPS(htab),
.map_btf_id = &htab_map_btf_ids[0],
};
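
The non-preallocated branch of htab_map_mem_usage() scales with the live
element count rather than max_entries. A user-space sketch of that branch
follows; the toy_htab struct and its fields are hypothetical stand-ins for
values the kernel reads from struct bpf_htab, sizeof(struct llist_node),
sizeof(void *) and num_possible_cpus().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; in the kernel these come from struct bpf_htab. */
struct toy_htab {
	uint64_t fixed_bytes;	/* header + buckets + lock counters */
	uint32_t elem_size;	/* per-element size, key and metadata included */
	uint32_t value_size;	/* raw value size, rounded up to 8 below */
	uint32_t llist_node_sz;	/* stand-in for sizeof(struct llist_node) */
	uint32_t ptr_sz;	/* stand-in for sizeof(void *) */
	unsigned int ncpus;	/* stand-in for num_possible_cpus() */
	bool percpu;
};

static uint64_t toy_round_up8(uint64_t x)
{
	return (x + 7) & ~(uint64_t)7;
}

/* Usage of a non-preallocated table holding live_entries elements. */
static uint64_t toy_htab_usage(const struct toy_htab *h, uint64_t live_entries)
{
	uint64_t usage = h->fixed_bytes;

	assert(h->ncpus > 0);
	usage += ((uint64_t)h->elem_size + h->llist_node_sz) * live_entries;
	if (h->percpu) {
		/* per-CPU values live in a separate allocation per element */
		usage += ((uint64_t)h->llist_node_sz + h->ptr_sz) * live_entries;
		usage += toy_round_up8(h->value_size) * h->ncpus * live_entries;
	}
	return usage;
}
```

An empty table costs only the fixed part, which is exactly the case where a
footprint derived from max_entries overstates the real usage the most.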