From d67f39d2b81b6a8259944d2400c1ff4fe283ff72 Mon Sep 17 00:00:00 2001
From: "wuqiang.matt"
Date: Fri, 1 Dec 2023 14:53:55 +0900
Subject: [PATCH 1/3] lib: objpool: fix head overrun on RK3588 SBC

objpool overrun stress with test_objpool on OrangePi5+ SBC triggered the
following kernel warnings:

	WARNING: CPU: 6 PID: 3115 at lib/objpool.c:168 objpool_push+0xc0/0x100

This message is from objpool.c:168:

	WARN_ON_ONCE(tail - head > pool->nr_objs);

The overrun test case validates the case where pre-allocated objects are
insufficient: 8 objects are pre-allocated for each node and the consumer
thread of each node tries to grab 16 objects in a row.

The testing system is an OrangePi 5+ with RK3588, a big.LITTLE SoC with
4x A76 and 4x A55. With either all 4 big cores or all 4 little cores
disabled, the overrun test runs fine; with big and little cores mixed
together, it always ends up in an overrun loop. The memory timing
differences between the big and little cores are the likely cause.

Here are the debugging data of objpool_try_get_slot after
try_cmpxchg_release:

	objpool_pop: cpu: 4/0 0:0 head: 278/279 tail:278 last:276/278

The local copies of 'head' and 'last' were 278 and 276, and reloading of
'slot->head' and 'slot->last' got 279 and 278. After try_cmpxchg_release
'slot->head' became 'head + 1', which is correct. What is wrong here is
the stale value of 'last', and that stale value finally led to the
overrun of 'head'.

Memory updates of 'last' and 'head' are performed in push() and pop()
independently, which is the likely culprit for this out-of-order
visibility of 'last' and 'head'. So for objpool_try_get_slot() it is not
enough to check only the condition 'head != last'; the implicit
condition 'last - head <= nr_objs' must also be explicitly asserted to
guarantee 'last' is always behind 'head' before an object is retrieved.

This patch checks that condition and retries with reloaded 'head' and
'last' when it does not hold, ensuring 'last' is behind 'head' at the
time of object retrieval.
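
For illustration only (not part of the patch itself), below is a minimal
user-space sketch of the unsigned wrap-around check described above; the
helper name is made up here, and the sample values come from the debug
dump (head 278, stale last 276, nr_objs 8):

  #include <stdint.h>
  #include <stdio.h>

  /* equivalent to: last != head && last - head <= nr_objs */
  static int slot_has_objs(uint32_t head, uint32_t last, uint32_t nr_objs)
  {
          return (uint32_t)(last - head - 1) < nr_objs;
  }

  int main(void)
  {
          /* stale 'last' (276) behind 'head' (278) wraps to 0xfffffffd */
          printf("%d\n", slot_has_objs(278, 276, 8));   /* 0: reload 'head' */
          /* consistent snapshot: two objects available */
          printf("%d\n", slot_has_objs(276, 278, 8));   /* 1: safe to pop */
          return 0;
  }
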
Performance testing shows the average impact is about 0.1% for X86_64
and 1.12% for ARM64. Here are the results:

  OS: Debian 10 X86_64, Linux 6.6rc
  HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s

                      1T          2T          4T          8T         16T
  native:       49543304    99277826   199017659   399070324   795185848
  objpool:      29909085    59865637   119692073   239750369   478005250
  objpool+:     29879313    59230743   119609856   239067773   478509029

                     32T         48T         64T         96T        128T
  native:     1596927073  2390099988  2929397330  3183875848  3257546602
  objpool:     957553042  1435814086  1680872925  2043126796  2165424198
  objpool+:    956476281  1434491297  1666055740  2041556569  2157415622

  OS: Debian 11 AARCH64, Linux 6.6rc
  HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s

                      1T          2T          4T          8T         16T
  native:       30890508    60399915   123111980   242257008   494002946
  objpool:      14742531    28883047    57739948   115886644   232455421
  objpool+:     14107220    29032998    57286084   113730493   232232850

                     24T         32T         48T         64T         96T
  native:      746406039  1000174750  1493236240  1998318364  2942911180
  objpool:     349164852   467284332   702296756   934459713  1387898285
  objpool+:    348388180   462750976   696606096   927865887  1368402195

Link: https://lore.kernel.org/all/20231114115148.298821-1-wuqiang.matt@bytedance.com/
Fixes: b4edb8d2d464 ("lib: objpool added: ring-array based lockless MPMC")
Signed-off-by: wuqiang.matt
Acked-by: Masami Hiramatsu (Google)
Signed-off-by: Masami Hiramatsu (Google)
---
 lib/objpool.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/lib/objpool.c b/lib/objpool.c
index ce0087f64400c..cfdc024208849 100644
--- a/lib/objpool.c
+++ b/lib/objpool.c
@@ -201,6 +201,23 @@ static inline void *objpool_try_get_slot(struct objpool_head *pool, int cpu)
 	while (head != READ_ONCE(slot->last)) {
 		void *obj;
 
+		/*
+		 * data visibility of 'last' and 'head' could be out of
+		 * order since memory updating of 'last' and 'head' are
+		 * performed in push() and pop() independently
+		 *
+		 * before any retrieving attempts, pop() must guarantee
+		 * 'last' is behind 'head', that is to say, there must
+		 * be available objects in slot, which could be ensured
+		 * by condition 'last != head && last - head <= nr_objs'
+		 * that is equivalent to 'last - head - 1 < nr_objs' as
+		 * 'last' and 'head' are both unsigned int32
+		 */
+		if (READ_ONCE(slot->last) - head - 1 >= pool->nr_objs) {
+			head = READ_ONCE(slot->head);
+			continue;
+		}
+
 		/* obj must be retrieved before moving forward head */
 		obj = READ_ONCE(slot->entries[head & slot->mask]);
 

From d839a656d0f3caca9f96e9bf912fd394ac6a11bc Mon Sep 17 00:00:00 2001
From: JP Kobryn
Date: Fri, 1 Dec 2023 14:53:55 +0900
Subject: [PATCH 2/3] kprobes: consistent rcu api usage for kretprobe holder

It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder
is RCU-managed, based on the (non-rethook) implementation of
get_kretprobe(). The thought behind this patch is to make use of the RCU
API where possible when accessing this pointer so that the needed
barriers are always in place and to self-document the code.

The __rcu annotation to "rp" allows for sparse RCU checking. Plain
writes done to the "rp" pointer are changed to make use of the RCU macro
for assignment. For the single read, the implementation of
get_kretprobe() is simplified by making use of an RCU macro which
accomplishes the same, but note that the log warning text will be more
generic.

I did find that there is a difference in the assembly generated when the
RCU macros are used versus when they are not. For example, on arm64,
with rcu_assign_pointer() the corresponding store instruction is a
store-release (STLR), which has an implicit barrier; with a normal
assignment a regular store (STR) is emitted.
In the macro case, this seems to be a result of rcu_assign_pointer()
using smp_store_release() when the value to write is not NULL.

Link: https://lore.kernel.org/all/20231122132058.3359-1-inwardvessel@gmail.com/
Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash")
Cc: stable@vger.kernel.org
Signed-off-by: JP Kobryn
Acked-by: Masami Hiramatsu (Google)
Signed-off-by: Masami Hiramatsu (Google)
---
 include/linux/kprobes.h | 7 ++-----
 kernel/kprobes.c        | 4 ++--
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index ab1da3142b06a..64672bace5609 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -139,7 +139,7 @@ static inline bool kprobe_ftrace(struct kprobe *p)
  *
  */
 struct kretprobe_holder {
-	struct kretprobe	*rp;
+	struct kretprobe __rcu *rp;
 	struct objpool_head	pool;
 };
 
@@ -245,10 +245,7 @@ unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
 
 static nokprobe_inline struct kretprobe *get_kretprobe(struct kretprobe_instance *ri)
 {
-	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
-		"Kretprobe is accessed from instance under preemptive context");
-
-	return READ_ONCE(ri->rph->rp);
+	return rcu_dereference_check(ri->rph->rp, rcu_read_lock_any_held());
 }
 
 static nokprobe_inline unsigned long get_kretprobe_retaddr(struct kretprobe_instance *ri)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 075a632e6c7c3..d5a0ee40bf66c 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2252,7 +2252,7 @@ int register_kretprobe(struct kretprobe *rp)
 		rp->rph = NULL;
 		return -ENOMEM;
 	}
-	rp->rph->rp = rp;
+	rcu_assign_pointer(rp->rph->rp, rp);
 	rp->nmissed = 0;
 	/* Establish function entry probe point */
 	ret = register_kprobe(&rp->kp);
@@ -2300,7 +2300,7 @@ void unregister_kretprobes(struct kretprobe **rps, int num)
 #ifdef CONFIG_KRETPROBE_ON_RETHOOK
 		rethook_free(rps[i]->rh);
 #else
-		rps[i]->rph->rp = NULL;
+		rcu_assign_pointer(rps[i]->rph->rp, NULL);
 #endif
 	}
 	mutex_unlock(&kprobe_mutex);

From a1461f1fd6cfdc4b8917c9d4a91e92605d1f28dc Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)"
Date: Fri, 1 Dec 2023 14:53:56 +0900
Subject: [PATCH 3/3] rethook: Use __rcu pointer for rethook::handler

Since rethook::handler is an RCU-managed pointer, used to let readers
notice whether the rethook has been stopped (unregistered), it should be
an __rcu pointer and be accessed through the appropriate RCU functions,
which use the proper memory barriers when accessing it. OTOH,
rethook::data is never changed, so we don't need to check it in
get_kretprobe().

NOTE: To avoid a sparse warning, rethook::handler is defined by a raw
function pointer type with __rcu instead of rethook_handler_t.
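
As a rough user-space analogy only (not kernel code; the names below are
illustrative), the publish/read pairing that rcu_assign_pointer() and
rcu_dereference_check() provide for rethook::handler can be sketched
with C11 atomics, using an acquire load as a conservative stand-in for
RCU's dependency ordering:

  #include <stdatomic.h>
  #include <stdio.h>

  typedef void (*handler_t)(void *data);

  static _Atomic(handler_t) handler;

  static void my_handler(void *data)
  {
          printf("handled: %s\n", (const char *)data);
  }

  /* counterpart of rcu_assign_pointer(): store-release publishes/clears */
  static void publish(handler_t h)
  {
          atomic_store_explicit(&handler, h, memory_order_release);
  }

  /* counterpart of rcu_dereference_check(): ordered load, NULL = stopped */
  static void invoke(void *data)
  {
          handler_t h = atomic_load_explicit(&handler, memory_order_acquire);

          if (h)
                  h(data);
  }

  int main(void)
  {
          publish(my_handler);
          invoke("ping");         /* handler runs */
          publish(NULL);          /* like rethook_stop() */
          invoke("pong");         /* handler cleared, call skipped */
          return 0;
  }
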
Link: https://lore.kernel.org/all/170126066201.398836.837498688669005979.stgit@devnote2/
Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-kbuild-all/202311241808.rv9ceuAh-lkp@intel.com/
Tested-by: JP Kobryn
Signed-off-by: Masami Hiramatsu (Google)
---
 include/linux/kprobes.h |  6 ++----
 include/linux/rethook.h |  7 ++++++-
 kernel/trace/rethook.c  | 23 ++++++++++++++---------
 3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 64672bace5609..0ff44d6633e33 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -197,10 +197,8 @@ extern int arch_trampoline_kprobe(struct kprobe *p);
 #ifdef CONFIG_KRETPROBE_ON_RETHOOK
 static nokprobe_inline struct kretprobe *get_kretprobe(struct kretprobe_instance *ri)
 {
-	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
-		"Kretprobe is accessed from instance under preemptive context");
-
-	return (struct kretprobe *)READ_ONCE(ri->node.rethook->data);
+	/* rethook::data is non-changed field, so that you can access it freely. */
+	return (struct kretprobe *)ri->node.rethook->data;
 }
 static nokprobe_inline unsigned long get_kretprobe_retaddr(struct kretprobe_instance *ri)
 {
diff --git a/include/linux/rethook.h b/include/linux/rethook.h
index ce69b2b7bc358..ba60962805f6d 100644
--- a/include/linux/rethook.h
+++ b/include/linux/rethook.h
@@ -28,7 +28,12 @@ typedef void (*rethook_handler_t) (struct rethook_node *, void *, unsigned long,
  */
 struct rethook {
 	void			*data;
-	rethook_handler_t	handler;
+	/*
+	 * To avoid sparse warnings, this uses a raw function pointer with
+	 * __rcu, instead of rethook_handler_t. But this must be same as
+	 * rethook_handler_t.
+	 */
+	void (__rcu *handler) (struct rethook_node *, void *, unsigned long, struct pt_regs *);
 	struct objpool_head	pool;
 	struct rcu_head		rcu;
 };
diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
index 6fd7d4ecbbc67..fa03094e9e698 100644
--- a/kernel/trace/rethook.c
+++ b/kernel/trace/rethook.c
@@ -48,7 +48,7 @@ static void rethook_free_rcu(struct rcu_head *head)
  */
 void rethook_stop(struct rethook *rh)
 {
-	WRITE_ONCE(rh->handler, NULL);
+	rcu_assign_pointer(rh->handler, NULL);
 }
 
 /**
@@ -63,7 +63,7 @@ void rethook_stop(struct rethook *rh)
  */
 void rethook_free(struct rethook *rh)
 {
-	WRITE_ONCE(rh->handler, NULL);
+	rethook_stop(rh);
 
 	call_rcu(&rh->rcu, rethook_free_rcu);
 }
@@ -82,6 +82,12 @@ static int rethook_fini_pool(struct objpool_head *head, void *context)
 	return 0;
 }
 
+static inline rethook_handler_t rethook_get_handler(struct rethook *rh)
+{
+	return (rethook_handler_t)rcu_dereference_check(rh->handler,
+						rcu_read_lock_any_held());
+}
+
 /**
  * rethook_alloc() - Allocate struct rethook.
  * @data: a data to pass the @handler when hooking the return.
@@ -107,7 +113,7 @@ struct rethook *rethook_alloc(void *data, rethook_handler_t handler,
 		return ERR_PTR(-ENOMEM);
 
 	rh->data = data;
-	rh->handler = handler;
+	rcu_assign_pointer(rh->handler, handler);
 
 	/* initialize the objpool for rethook nodes */
 	if (objpool_init(&rh->pool, num, size, GFP_KERNEL, rh,
@@ -135,9 +141,10 @@ static void free_rethook_node_rcu(struct rcu_head *head)
  */
 void rethook_recycle(struct rethook_node *node)
 {
-	lockdep_assert_preemption_disabled();
+	rethook_handler_t handler;
 
-	if (likely(READ_ONCE(node->rethook->handler)))
+	handler = rethook_get_handler(node->rethook);
+	if (likely(handler))
 		objpool_push(node, &node->rethook->pool);
 	else
 		call_rcu(&node->rcu, free_rethook_node_rcu);
@@ -153,9 +160,7 @@ NOKPROBE_SYMBOL(rethook_recycle);
  */
 struct rethook_node *rethook_try_get(struct rethook *rh)
 {
-	rethook_handler_t handler = READ_ONCE(rh->handler);
-
-	lockdep_assert_preemption_disabled();
+	rethook_handler_t handler = rethook_get_handler(rh);
 
 	/* Check whether @rh is going to be freed. */
 	if (unlikely(!handler))
@@ -300,7 +305,7 @@ unsigned long rethook_trampoline_handler(struct pt_regs *regs,
 		rhn = container_of(first, struct rethook_node, llist);
 		if (WARN_ON_ONCE(rhn->frame != frame))
 			break;
-		handler = READ_ONCE(rhn->rethook->handler);
+		handler = rethook_get_handler(rhn->rethook);
 		if (handler)
 			handler(rhn, rhn->rethook->data,
 				correct_ret_addr, regs);