From a2e5c6e8d884d0457fbd587a409ee97b9f840ec4 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:20 +1100 Subject: [PATCH 01/58] sysctl: add a new register_sysctl_init() interface Patch series "sysctl: first set of kernel/sysctl cleanups", v2. Finally had time to respin the series of the work we had started last year on cleaning up the kernel/sysct.c kitchen sink. People keeps stuffing their sysctls in that file and this creates a maintenance burden. So this effort is aimed at placing sysctls where they actually belong. I'm going to split patches up into series as there is quite a bit of work. This first set adds register_sysctl_init() for uses of registerting a sysctl on the init path, adds const where missing to a few places, generalizes common values so to be more easy to share, and starts the move of a few kernel/sysctl.c out where they belong. The majority of rework on v2 in this first patch set is 0-day fixes. Eric W. Biederman's feedback is later addressed in subsequent patch sets. I'll only post the first two patch sets for now. We can address the rest once the first two patch sets get completely reviewed / Acked. This patch (of 9): The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. Today though folks heavily rely on tables on kernel/sysctl.c so they can easily just extend this table with their needed sysctls. In order to help users move their sysctls out we need to provide a helper which can be used during code initialization. We special-case the initialization use of register_sysctl() since it *is* safe to fail, given all that sysctls do is provide a dynamic interface to query or modify at runtime an existing variable. So the use case of register_sysctl() on init should *not* stop if the sysctls don't end up getting registered. It would be counter productive to stop boot if a simple sysctl registration failed. Provide a helper for init then, and document the recommended init levels to use for callers of this routine. We will later use this in subsequent patches to start slimming down kernel/sysctl.c tables and moving sysctl registration to the code which actually needs these sysctls. [mcgrof@kernel.org: major commit log and documentation rephrasing also moved to fs/proc/proc_sysctl.c ] Link: https://lkml.kernel.org/r/20211123202347.818157-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211123202347.818157-2-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Reviewed-by: Kees Cook Cc: Iurii Zaikin Cc: "Eric W. Biederman" Cc: Peter Zijlstra Cc: Greg Kroah-Hartman Cc: Paul Turner Cc: Andy Shevchenko Cc: Sebastian Reichel Cc: Tetsuo Handa Cc: Petr Mladek Cc: Sergey Senozhatsky Cc: Qing Wang Cc: Benjamin LaHaise Cc: Al Viro Cc: Jan Kara Cc: Amir Goldstein Cc: Stephen Kitt Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Clemens Ladisch Cc: David Airlie Cc: Jani Nikula Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Phillip Potter Cc: Rodrigo Vivi Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/proc/proc_sysctl.c | 33 +++++++++++++++++++++++++++++++++ include/linux/sysctl.h | 3 +++ 2 files changed, 36 insertions(+) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 5d66faecd4ef0..b4950843d90a9 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "internal.h" static const struct dentry_operations proc_sys_dentry_operations; @@ -1384,6 +1385,38 @@ struct ctl_table_header *register_sysctl(const char *path, struct ctl_table *tab } EXPORT_SYMBOL(register_sysctl); +/** + * __register_sysctl_init() - register sysctl table to path + * @path: path name for sysctl base + * @table: This is the sysctl table that needs to be registered to the path + * @table_name: The name of sysctl table, only used for log printing when + * registration fails + * + * The sysctl interface is used by userspace to query or modify at runtime + * a predefined value set on a variable. These variables however have default + * values pre-set. Code which depends on these variables will always work even + * if register_sysctl() fails. If register_sysctl() fails you'd just loose the + * ability to query or modify the sysctls dynamically at run time. Chances of + * register_sysctl() failing on init are extremely low, and so for both reasons + * this function does not return any error as it is used by initialization code. + * + * Context: Can only be called after your respective sysctl base path has been + * registered. So for instance, most base directories are registered early on + * init before init levels are processed through proc_sys_init() and + * sysctl_init(). + */ +void __init __register_sysctl_init(const char *path, struct ctl_table *table, + const char *table_name) +{ + struct ctl_table_header *hdr = register_sysctl(path, table); + + if (unlikely(!hdr)) { + pr_err("failed when register_sysctl %s to %s\n", table_name, path); + return; + } + kmemleak_not_leak(hdr); +} + static char *append_path(const char *path, char *pos, const char *name) { int namelen; diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 1fa2b69c6fc3d..d3ab7969b6b50 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -199,6 +199,9 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path, void unregister_sysctl_table(struct ctl_table_header * table); extern int sysctl_init(void); +extern void __register_sysctl_init(const char *path, struct ctl_table *table, + const char *table_name); +#define register_sysctl_init(path, table) __register_sysctl_init(path, table, #table) void do_sysctl_args(void); extern int pwrsw_enabled; From 0cfcc9dc5ad8fe4e916a6e544edc962e260429f3 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:21 +1100 Subject: [PATCH 02/58] sysctl: move some boundary constants from sysctl.c to sysctl_vals sysctl has helpers which let us specify boundary values for a min or max int value. Since these are used for a boundary check only they don't change, so move these variables to sysctl_vals to avoid adding duplicate variables. This will help with our cleanup of kernel/sysctl.c. [mcgrof@kernel.org: major rebase] Link: https://lkml.kernel.org/r/20211123202347.818157-3-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Reviewed-by: Kees Cook Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Benjamin LaHaise Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jan Kara Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Qing Wang Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Clemens Ladisch Cc: David Airlie Cc: Jani Nikula Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Phillip Potter Cc: Rodrigo Vivi Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/proc/proc_sysctl.c | 2 +- include/linux/sysctl.h | 12 +++++++++--- kernel/sysctl.c | 42 ++++++++++++++++++------------------------ 3 files changed, 28 insertions(+), 28 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index b4950843d90a9..6d462644bb003 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -26,7 +26,7 @@ static const struct file_operations proc_sys_dir_file_operations; static const struct inode_operations proc_sys_dir_operations; /* shared constants to be used in various sysctls */ -const int sysctl_vals[] = { 0, 1, INT_MAX }; +const int sysctl_vals[] = { -1, 0, 1, 2, 4, 100, 200, 1000, INT_MAX }; EXPORT_SYMBOL(sysctl_vals); /* Support for permanently empty directories */ diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index d3ab7969b6b50..718492057c70c 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -38,9 +38,15 @@ struct ctl_table_header; struct ctl_dir; /* Keep the same order as in fs/proc/proc_sysctl.c */ -#define SYSCTL_ZERO ((void *)&sysctl_vals[0]) -#define SYSCTL_ONE ((void *)&sysctl_vals[1]) -#define SYSCTL_INT_MAX ((void *)&sysctl_vals[2]) +#define SYSCTL_NEG_ONE ((void *)&sysctl_vals[0]) +#define SYSCTL_ZERO ((void *)&sysctl_vals[1]) +#define SYSCTL_ONE ((void *)&sysctl_vals[2]) +#define SYSCTL_TWO ((void *)&sysctl_vals[3]) +#define SYSCTL_FOUR ((void *)&sysctl_vals[4]) +#define SYSCTL_ONE_HUNDRED ((void *)&sysctl_vals[5]) +#define SYSCTL_TWO_HUNDRED ((void *)&sysctl_vals[6]) +#define SYSCTL_ONE_THOUSAND ((void *)&sysctl_vals[7]) +#define SYSCTL_INT_MAX ((void *)&sysctl_vals[8]) extern const int sysctl_vals[]; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 66d6d1763e3f6..98482599b4796 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -113,15 +113,9 @@ static int sixty = 60; #endif -static int __maybe_unused neg_one = -1; -static int __maybe_unused two = 2; -static int __maybe_unused four = 4; static unsigned long zero_ul; static unsigned long one_ul = 1; static unsigned long long_max = LONG_MAX; -static int one_hundred = 100; -static int two_hundred = 200; -static int one_thousand = 1000; static int three_thousand = 3000; #ifdef CONFIG_PRINTK static int ten_thousand = 10000; @@ -1963,7 +1957,7 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = &neg_one, + .extra1 = SYSCTL_NEG_ONE, .extra2 = SYSCTL_ONE, }, #endif @@ -2305,7 +2299,7 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax_sysadmin, .extra1 = SYSCTL_ZERO, - .extra2 = &two, + .extra2 = SYSCTL_TWO, }, #endif { @@ -2565,7 +2559,7 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = &neg_one, + .extra1 = SYSCTL_NEG_ONE, }, #endif #ifdef CONFIG_RT_MUTEXES @@ -2627,7 +2621,7 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = perf_cpu_time_max_percent_handler, .extra1 = SYSCTL_ZERO, - .extra2 = &one_hundred, + .extra2 = SYSCTL_ONE_HUNDRED, }, { .procname = "perf_event_max_stack", @@ -2645,7 +2639,7 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = perf_event_max_stack_handler, .extra1 = SYSCTL_ZERO, - .extra2 = &one_thousand, + .extra2 = SYSCTL_ONE_THOUSAND, }, #endif { @@ -2676,7 +2670,7 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = bpf_unpriv_handler, .extra1 = SYSCTL_ZERO, - .extra2 = &two, + .extra2 = SYSCTL_TWO, }, { .procname = "bpf_stats_enabled", @@ -2730,7 +2724,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = overcommit_policy_handler, .extra1 = SYSCTL_ZERO, - .extra2 = &two, + .extra2 = SYSCTL_TWO, }, { .procname = "panic_on_oom", @@ -2739,7 +2733,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, - .extra2 = &two, + .extra2 = SYSCTL_TWO, }, { .procname = "oom_kill_allocating_task", @@ -2784,7 +2778,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = dirty_background_ratio_handler, .extra1 = SYSCTL_ZERO, - .extra2 = &one_hundred, + .extra2 = SYSCTL_ONE_HUNDRED, }, { .procname = "dirty_background_bytes", @@ -2801,7 +2795,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = dirty_ratio_handler, .extra1 = SYSCTL_ZERO, - .extra2 = &one_hundred, + .extra2 = SYSCTL_ONE_HUNDRED, }, { .procname = "dirty_bytes", @@ -2841,7 +2835,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, - .extra2 = &two_hundred, + .extra2 = SYSCTL_TWO_HUNDRED, }, #ifdef CONFIG_HUGETLB_PAGE { @@ -2898,7 +2892,7 @@ static struct ctl_table vm_table[] = { .mode = 0200, .proc_handler = drop_caches_sysctl_handler, .extra1 = SYSCTL_ONE, - .extra2 = &four, + .extra2 = SYSCTL_FOUR, }, #ifdef CONFIG_COMPACTION { @@ -2915,7 +2909,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = compaction_proactiveness_sysctl_handler, .extra1 = SYSCTL_ZERO, - .extra2 = &one_hundred, + .extra2 = SYSCTL_ONE_HUNDRED, }, { .procname = "extfrag_threshold", @@ -3039,7 +3033,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = sysctl_min_unmapped_ratio_sysctl_handler, .extra1 = SYSCTL_ZERO, - .extra2 = &one_hundred, + .extra2 = SYSCTL_ONE_HUNDRED, }, { .procname = "min_slab_ratio", @@ -3048,7 +3042,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = sysctl_min_slab_ratio_sysctl_handler, .extra1 = SYSCTL_ZERO, - .extra2 = &one_hundred, + .extra2 = SYSCTL_ONE_HUNDRED, }, #endif #ifdef CONFIG_SMP @@ -3338,7 +3332,7 @@ static struct ctl_table fs_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, - .extra2 = &two, + .extra2 = SYSCTL_TWO, }, { .procname = "protected_regular", @@ -3347,7 +3341,7 @@ static struct ctl_table fs_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, - .extra2 = &two, + .extra2 = SYSCTL_TWO, }, { .procname = "suid_dumpable", @@ -3356,7 +3350,7 @@ static struct ctl_table fs_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax_coredump, .extra1 = SYSCTL_ZERO, - .extra2 = &two, + .extra2 = SYSCTL_TWO, }, #if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE) { From 75d615d845ec010d48e4711c230727dd970b2d38 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Fri, 3 Dec 2021 12:58:22 +1100 Subject: [PATCH 03/58] sysctl-move-some-boundary-constants-from-sysctlc-to-sysctl_vals-fix update it for "mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30%". Cc: Xiaoming Ni Cc: Suren Baghdasaryan Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: Douglas Gilbert Cc: Eric Biederman Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: James E.J. Bottomley Cc: Jani Nikula Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: John Ogness Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Luis Chamberlain Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Martin K. Petersen Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: "Rafael J. Wysocki" Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Steven Rostedt (VMware) Cc: Tetsuo Handa Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/proc/proc_sysctl.c | 2 +- include/linux/sysctl.h | 3 ++- kernel/sysctl.c | 3 +-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 6d462644bb003..0b7a00ed6c49b 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -26,7 +26,7 @@ static const struct file_operations proc_sys_dir_file_operations; static const struct inode_operations proc_sys_dir_operations; /* shared constants to be used in various sysctls */ -const int sysctl_vals[] = { -1, 0, 1, 2, 4, 100, 200, 1000, INT_MAX }; +const int sysctl_vals[] = { -1, 0, 1, 2, 4, 100, 200, 1000, 3000, INT_MAX }; EXPORT_SYMBOL(sysctl_vals); /* Support for permanently empty directories */ diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 718492057c70c..47cf70c8eb93c 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -46,7 +46,8 @@ struct ctl_dir; #define SYSCTL_ONE_HUNDRED ((void *)&sysctl_vals[5]) #define SYSCTL_TWO_HUNDRED ((void *)&sysctl_vals[6]) #define SYSCTL_ONE_THOUSAND ((void *)&sysctl_vals[7]) -#define SYSCTL_INT_MAX ((void *)&sysctl_vals[8]) +#define SYSCTL_THREE_THOUSAND ((void *)&sysctl_vals[8]) +#define SYSCTL_INT_MAX ((void *)&sysctl_vals[9]) extern const int sysctl_vals[]; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 98482599b4796..f734d335c75a7 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -116,7 +116,6 @@ static int sixty = 60; static unsigned long zero_ul; static unsigned long one_ul = 1; static unsigned long long_max = LONG_MAX; -static int three_thousand = 3000; #ifdef CONFIG_PRINTK static int ten_thousand = 10000; #endif @@ -2954,7 +2953,7 @@ static struct ctl_table vm_table[] = { .mode = 0644, .proc_handler = watermark_scale_factor_sysctl_handler, .extra1 = SYSCTL_ONE, - .extra2 = &three_thousand, + .extra2 = SYSCTL_THREE_THOUSAND, }, { .procname = "percpu_pagelist_high_fraction", From d3c2dee610e34773a73186c5a495ead6f7faf298 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:23 +1100 Subject: [PATCH 04/58] hung_task: move hung_task sysctl interface to hung_task.c The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move hung_task sysctl interface to hung_task.c and use register_sysctl() to register the sysctl interface. [mcgrof@kernel.org: commit log refresh and fixed 2-3 0day reported compile issues] Link: https://lkml.kernel.org/r/20211123202347.818157-4-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Reviewed-by: Kees Cook Reviewed-by: Petr Mladek Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Benjamin LaHaise Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jan Kara Cc: Paul Turner Cc: Peter Zijlstra Cc: Qing Wang Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Clemens Ladisch Cc: David Airlie Cc: Jani Nikula Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Phillip Potter Cc: Rodrigo Vivi Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- include/linux/sched/sysctl.h | 14 +------ kernel/hung_task.c | 81 ++++++++++++++++++++++++++++++++++-- kernel/sysctl.c | 61 --------------------------- 3 files changed, 79 insertions(+), 77 deletions(-) diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 304f431178fd9..c19dd5a2c05c6 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -7,20 +7,8 @@ struct ctl_table; #ifdef CONFIG_DETECT_HUNG_TASK - -#ifdef CONFIG_SMP -extern unsigned int sysctl_hung_task_all_cpu_backtrace; -#else -#define sysctl_hung_task_all_cpu_backtrace 0 -#endif /* CONFIG_SMP */ - -extern int sysctl_hung_task_check_count; -extern unsigned int sysctl_hung_task_panic; +/* used for hung_task and block/ */ extern unsigned long sysctl_hung_task_timeout_secs; -extern unsigned long sysctl_hung_task_check_interval_secs; -extern int sysctl_hung_task_warnings; -int proc_dohung_task_timeout_secs(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); #else /* Avoid need for ifdefs elsewhere in the code */ enum { sysctl_hung_task_timeout_secs = 0 }; diff --git a/kernel/hung_task.c b/kernel/hung_task.c index 8cc07e7f29aa1..40220dfd6fa93 100644 --- a/kernel/hung_task.c +++ b/kernel/hung_task.c @@ -63,7 +63,9 @@ static struct task_struct *watchdog_task; * Should we dump all CPUs backtraces in a hung task event? * Defaults to 0, can be changed via sysctl. */ -unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace; +static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace; +#else +#define sysctl_hung_task_all_cpu_backtrace 0 #endif /* CONFIG_SMP */ /* @@ -266,11 +268,13 @@ static long hung_timeout_jiffies(unsigned long last_checked, MAX_SCHEDULE_TIMEOUT; } +#ifdef CONFIG_SYSCTL /* * Process updating of timeout sysctl */ -int proc_dohung_task_timeout_secs(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) +static int proc_dohung_task_timeout_secs(struct ctl_table *table, int write, + void __user *buffer, + size_t *lenp, loff_t *ppos) { int ret; @@ -285,6 +289,76 @@ int proc_dohung_task_timeout_secs(struct ctl_table *table, int write, return ret; } +/* + * This is needed for proc_doulongvec_minmax of sysctl_hung_task_timeout_secs + * and hung_task_check_interval_secs + */ +static const unsigned long hung_task_timeout_max = (LONG_MAX / HZ); +static struct ctl_table hung_task_sysctls[] = { +#ifdef CONFIG_SMP + { + .procname = "hung_task_all_cpu_backtrace", + .data = &sysctl_hung_task_all_cpu_backtrace, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, +#endif /* CONFIG_SMP */ + { + .procname = "hung_task_panic", + .data = &sysctl_hung_task_panic, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + { + .procname = "hung_task_check_count", + .data = &sysctl_hung_task_check_count, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + }, + { + .procname = "hung_task_timeout_secs", + .data = &sysctl_hung_task_timeout_secs, + .maxlen = sizeof(unsigned long), + .mode = 0644, + .proc_handler = proc_dohung_task_timeout_secs, + .extra2 = (void *)&hung_task_timeout_max, + }, + { + .procname = "hung_task_check_interval_secs", + .data = &sysctl_hung_task_check_interval_secs, + .maxlen = sizeof(unsigned long), + .mode = 0644, + .proc_handler = proc_dohung_task_timeout_secs, + .extra2 = (void *)&hung_task_timeout_max, + }, + { + .procname = "hung_task_warnings", + .data = &sysctl_hung_task_warnings, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_NEG_ONE, + }, + {} +}; + +static void __init hung_task_sysctl_init(void) +{ + register_sysctl_init("kernel", hung_task_sysctls); +} +#else +#define hung_task_sysctl_init() do { } while (0) +#endif /* CONFIG_SYSCTL */ + + static atomic_t reset_hung_task = ATOMIC_INIT(0); void reset_hung_task_detector(void) @@ -354,6 +428,7 @@ static int __init hung_task_init(void) pm_notifier(hungtask_pm_notify, 0); watchdog_task = kthread_run(watchdog, NULL, "khungtaskd"); + hung_task_sysctl_init(); return 0; } diff --git a/kernel/sysctl.c b/kernel/sysctl.c index f734d335c75a7..69a0ab1bb7f54 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -133,13 +133,6 @@ static int minolduid; static int ngroups_max = NGROUPS_MAX; static const int cap_last_cap = CAP_LAST_CAP; -/* - * This is needed for proc_doulongvec_minmax of sysctl_hung_task_timeout_secs - * and hung_task_check_interval_secs - */ -#ifdef CONFIG_DETECT_HUNG_TASK -static unsigned long hung_task_timeout_max = (LONG_MAX/HZ); -#endif #ifdef CONFIG_INOTIFY_USER #include @@ -2507,60 +2500,6 @@ static struct ctl_table kern_table[] = { .proc_handler = proc_dointvec, }, #endif -#ifdef CONFIG_DETECT_HUNG_TASK -#ifdef CONFIG_SMP - { - .procname = "hung_task_all_cpu_backtrace", - .data = &sysctl_hung_task_all_cpu_backtrace, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, -#endif /* CONFIG_SMP */ - { - .procname = "hung_task_panic", - .data = &sysctl_hung_task_panic, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, - { - .procname = "hung_task_check_count", - .data = &sysctl_hung_task_check_count, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - }, - { - .procname = "hung_task_timeout_secs", - .data = &sysctl_hung_task_timeout_secs, - .maxlen = sizeof(unsigned long), - .mode = 0644, - .proc_handler = proc_dohung_task_timeout_secs, - .extra2 = &hung_task_timeout_max, - }, - { - .procname = "hung_task_check_interval_secs", - .data = &sysctl_hung_task_check_interval_secs, - .maxlen = sizeof(unsigned long), - .mode = 0644, - .proc_handler = proc_dohung_task_timeout_secs, - .extra2 = &hung_task_timeout_max, - }, - { - .procname = "hung_task_warnings", - .data = &sysctl_hung_task_warnings, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_NEG_ONE, - }, -#endif #ifdef CONFIG_RT_MUTEXES { .procname = "max_lock_depth", From 42592625033d3973a6581d15fa87b0a0ce05c1e5 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:24 +1100 Subject: [PATCH 05/58] watchdog: move watchdog sysctl interface to watchdog.c The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic of proc sysctl. So, move the watchdog syscl interface to watchdog.c. Use register_sysctl() to register the sysctl interface to avoid merge conflicts when different features modify sysctl.c at the same time. [mcgrof@kernel.org: justify the move on the commit log] Link: https://lkml.kernel.org/r/20211123202347.818157-5-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Reviewed-by: Kees Cook Reviewed-by: Petr Mladek Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Benjamin LaHaise Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jan Kara Cc: Paul Turner Cc: Peter Zijlstra Cc: Qing Wang Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Clemens Ladisch Cc: David Airlie Cc: Jani Nikula Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Phillip Potter Cc: Rodrigo Vivi Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- kernel/sysctl.c | 96 ------------------------------------------- kernel/watchdog.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 101 insertions(+), 96 deletions(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 69a0ab1bb7f54..87410bb0fd1af 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -102,16 +102,10 @@ #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE #include #endif -#ifdef CONFIG_LOCKUP_DETECTOR -#include -#endif #if defined(CONFIG_SYSCTL) /* Constants used for minimum and maximum */ -#ifdef CONFIG_LOCKUP_DETECTOR -static int sixty = 60; -#endif static unsigned long zero_ul; static unsigned long one_ul = 1; @@ -2308,96 +2302,6 @@ static struct ctl_table kern_table[] = { .mode = 0444, .proc_handler = proc_dointvec, }, -#if defined(CONFIG_LOCKUP_DETECTOR) - { - .procname = "watchdog", - .data = &watchdog_user_enabled, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_watchdog, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, - { - .procname = "watchdog_thresh", - .data = &watchdog_thresh, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_watchdog_thresh, - .extra1 = SYSCTL_ZERO, - .extra2 = &sixty, - }, - { - .procname = "nmi_watchdog", - .data = &nmi_watchdog_user_enabled, - .maxlen = sizeof(int), - .mode = NMI_WATCHDOG_SYSCTL_PERM, - .proc_handler = proc_nmi_watchdog, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, - { - .procname = "watchdog_cpumask", - .data = &watchdog_cpumask_bits, - .maxlen = NR_CPUS, - .mode = 0644, - .proc_handler = proc_watchdog_cpumask, - }, -#ifdef CONFIG_SOFTLOCKUP_DETECTOR - { - .procname = "soft_watchdog", - .data = &soft_watchdog_user_enabled, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_soft_watchdog, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, - { - .procname = "softlockup_panic", - .data = &softlockup_panic, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, -#ifdef CONFIG_SMP - { - .procname = "softlockup_all_cpu_backtrace", - .data = &sysctl_softlockup_all_cpu_backtrace, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, -#endif /* CONFIG_SMP */ -#endif -#ifdef CONFIG_HARDLOCKUP_DETECTOR - { - .procname = "hardlockup_panic", - .data = &hardlockup_panic, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, -#ifdef CONFIG_SMP - { - .procname = "hardlockup_all_cpu_backtrace", - .data = &sysctl_hardlockup_all_cpu_backtrace, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, -#endif /* CONFIG_SMP */ -#endif -#endif - #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) { .procname = "unknown_nmi_panic", diff --git a/kernel/watchdog.c b/kernel/watchdog.c index ad912511a0c08..99afb88d2e85a 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -740,6 +740,106 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write, mutex_unlock(&watchdog_mutex); return err; } + +static const int sixty = 60; + +static struct ctl_table watchdog_sysctls[] = { + { + .procname = "watchdog", + .data = &watchdog_user_enabled, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_watchdog, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + { + .procname = "watchdog_thresh", + .data = &watchdog_thresh, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_watchdog_thresh, + .extra1 = SYSCTL_ZERO, + .extra2 = (void *)&sixty, + }, + { + .procname = "nmi_watchdog", + .data = &nmi_watchdog_user_enabled, + .maxlen = sizeof(int), + .mode = NMI_WATCHDOG_SYSCTL_PERM, + .proc_handler = proc_nmi_watchdog, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + { + .procname = "watchdog_cpumask", + .data = &watchdog_cpumask_bits, + .maxlen = NR_CPUS, + .mode = 0644, + .proc_handler = proc_watchdog_cpumask, + }, +#ifdef CONFIG_SOFTLOCKUP_DETECTOR + { + .procname = "soft_watchdog", + .data = &soft_watchdog_user_enabled, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_soft_watchdog, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + { + .procname = "softlockup_panic", + .data = &softlockup_panic, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, +#ifdef CONFIG_SMP + { + .procname = "softlockup_all_cpu_backtrace", + .data = &sysctl_softlockup_all_cpu_backtrace, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, +#endif /* CONFIG_SMP */ +#endif +#ifdef CONFIG_HARDLOCKUP_DETECTOR + { + .procname = "hardlockup_panic", + .data = &hardlockup_panic, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, +#ifdef CONFIG_SMP + { + .procname = "hardlockup_all_cpu_backtrace", + .data = &sysctl_hardlockup_all_cpu_backtrace, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, +#endif /* CONFIG_SMP */ +#endif + {} +}; + +static void __init watchdog_sysctl_init(void) +{ + register_sysctl_init("kernel", watchdog_sysctls); +} +#else +#define watchdog_sysctl_init() do { } while (0) #endif /* CONFIG_SYSCTL */ void __init lockup_detector_init(void) @@ -753,4 +853,5 @@ void __init lockup_detector_init(void) if (!watchdog_nmi_probe()) nmi_watchdog_available = true; lockup_detector_setup(); + watchdog_sysctl_init(); } From 3623c31fcd71acad5f7bcf51cd13f3bbef43213c Mon Sep 17 00:00:00 2001 From: Stephen Kitt Date: Fri, 3 Dec 2021 12:58:24 +1100 Subject: [PATCH 06/58] sysctl: make ngroups_max const ngroups_max is a read-only sysctl entry, reflecting NGROUPS_MAX. Make it const, in the same way as cap_last_cap. Link: https://lkml.kernel.org/r/20211123202347.818157-6-mcgrof@kernel.org Signed-off-by: Stephen Kitt Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Benjamin LaHaise Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jan Kara Cc: Kees Cook Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Qing Wang Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Tetsuo Handa Cc: Xiaoming Ni Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Clemens Ladisch Cc: David Airlie Cc: Jani Nikula Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Phillip Potter Cc: Rodrigo Vivi Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- kernel/sysctl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 87410bb0fd1af..000d326a8905d 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -124,7 +124,7 @@ static unsigned long dirty_bytes_min = 2 * PAGE_SIZE; static int maxolduid = 65535; static int minolduid; -static int ngroups_max = NGROUPS_MAX; +static const int ngroups_max = NGROUPS_MAX; static const int cap_last_cap = CAP_LAST_CAP; @@ -2290,7 +2290,7 @@ static struct ctl_table kern_table[] = { #endif { .procname = "ngroups_max", - .data = &ngroups_max, + .data = (void *)&ngroups_max, .maxlen = sizeof (int), .mode = 0444, .proc_handler = proc_dointvec, From 9407c265a592082e9321e32a425e27d290f9dd2a Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:25 +1100 Subject: [PATCH 07/58] sysctl: use const for typically used max/min proc sysctls When proc_dointvec_minmax() or proc_doulongvec_minmax() are used we are using the extra1 and extra2 parameters on the sysctl table only for a min and max boundary, these extra1 and extra2 arguments are then used for read-only operations. So make them const to reflect this. [mcgrof@kernel.org: commit log love] Link: https://lkml.kernel.org/r/20211123202347.818157-7-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Benjamin LaHaise Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jan Kara Cc: Kees Cook Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Qing Wang Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Clemens Ladisch Cc: David Airlie Cc: Jani Nikula Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Phillip Potter Cc: Rodrigo Vivi Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- kernel/sysctl.c | 53 ++++++++++++++++++++++++------------------------- 1 file changed, 26 insertions(+), 27 deletions(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 000d326a8905d..bfc058577bfdf 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -107,27 +107,26 @@ /* Constants used for minimum and maximum */ -static unsigned long zero_ul; -static unsigned long one_ul = 1; -static unsigned long long_max = LONG_MAX; +static const unsigned long zero_ul; +static const unsigned long one_ul = 1; +static const unsigned long long_max = LONG_MAX; #ifdef CONFIG_PRINTK -static int ten_thousand = 10000; +static const int ten_thousand = 10000; #endif #ifdef CONFIG_PERF_EVENTS -static int six_hundred_forty_kb = 640 * 1024; +static const int six_hundred_forty_kb = 640 * 1024; #endif /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */ -static unsigned long dirty_bytes_min = 2 * PAGE_SIZE; +static const unsigned long dirty_bytes_min = 2 * PAGE_SIZE; /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ -static int maxolduid = 65535; -static int minolduid; +static const int maxolduid = 65535; +static const int minolduid; static const int ngroups_max = NGROUPS_MAX; static const int cap_last_cap = CAP_LAST_CAP; - #ifdef CONFIG_INOTIFY_USER #include #endif @@ -171,8 +170,8 @@ int sysctl_legacy_va_layout; #endif #ifdef CONFIG_COMPACTION -static int min_extfrag_threshold; -static int max_extfrag_threshold = 1000; +static const int min_extfrag_threshold; +static const int max_extfrag_threshold = 1000; #endif #endif /* CONFIG_SYSCTL */ @@ -2176,8 +2175,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = &minolduid, - .extra2 = &maxolduid, + .extra1 = (void *)&minolduid, + .extra2 = (void *)&maxolduid, }, { .procname = "overflowgid", @@ -2185,8 +2184,8 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = &minolduid, - .extra2 = &maxolduid, + .extra1 = (void *)&minolduid, + .extra2 = (void *)&maxolduid, }, #ifdef CONFIG_S390 { @@ -2260,7 +2259,7 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, - .extra2 = &ten_thousand, + .extra2 = (void *)&ten_thousand, }, { .procname = "printk_devkmsg", @@ -2472,7 +2471,7 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = perf_event_max_stack_handler, .extra1 = SYSCTL_ZERO, - .extra2 = &six_hundred_forty_kb, + .extra2 = (void *)&six_hundred_forty_kb, }, { .procname = "perf_event_max_contexts_per_stack", @@ -2628,7 +2627,7 @@ static struct ctl_table vm_table[] = { .maxlen = sizeof(dirty_background_bytes), .mode = 0644, .proc_handler = dirty_background_bytes_handler, - .extra1 = &one_ul, + .extra1 = (void *)&one_ul, }, { .procname = "dirty_ratio", @@ -2645,7 +2644,7 @@ static struct ctl_table vm_table[] = { .maxlen = sizeof(vm_dirty_bytes), .mode = 0644, .proc_handler = dirty_bytes_handler, - .extra1 = &dirty_bytes_min, + .extra1 = (void *)&dirty_bytes_min, }, { .procname = "dirty_writeback_centisecs", @@ -2759,8 +2758,8 @@ static struct ctl_table vm_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = &min_extfrag_threshold, - .extra2 = &max_extfrag_threshold, + .extra1 = (void *)&min_extfrag_threshold, + .extra2 = (void *)&max_extfrag_threshold, }, { .procname = "compact_unevictable_allowed", @@ -3046,8 +3045,8 @@ static struct ctl_table fs_table[] = { .maxlen = sizeof(files_stat.max_files), .mode = 0644, .proc_handler = proc_doulongvec_minmax, - .extra1 = &zero_ul, - .extra2 = &long_max, + .extra1 = (void *)&zero_ul, + .extra2 = (void *)&long_max, }, { .procname = "nr_open", @@ -3071,8 +3070,8 @@ static struct ctl_table fs_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = &minolduid, - .extra2 = &maxolduid, + .extra1 = (void *)&minolduid, + .extra2 = (void *)&maxolduid, }, { .procname = "overflowgid", @@ -3080,8 +3079,8 @@ static struct ctl_table fs_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = &minolduid, - .extra2 = &maxolduid, + .extra1 = (void *)&minolduid, + .extra2 = (void *)&maxolduid, }, #ifdef CONFIG_FILE_LOCKING { From 9acfc44aaef5939734fda88e741da1383743a53f Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:26 +1100 Subject: [PATCH 08/58] sysctl: use SYSCTL_ZERO to replace some static int zero uses Use the variable SYSCTL_ZERO to replace some static int boundary variables with a value of 0 (minolduid, min_extfrag_threshold, min_wakeup_granularity_ns). Link: https://lkml.kernel.org/r/20211123202347.818157-8-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Benjamin LaHaise Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jan Kara Cc: Kees Cook Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Qing Wang Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Clemens Ladisch Cc: David Airlie Cc: Jani Nikula Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Phillip Potter Cc: Rodrigo Vivi Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- kernel/sysctl.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index bfc058577bfdf..4d7e35ebca8c7 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -122,7 +122,7 @@ static const unsigned long dirty_bytes_min = 2 * PAGE_SIZE; /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ static const int maxolduid = 65535; -static const int minolduid; +/* minolduid is SYSCTL_ZERO */ static const int ngroups_max = NGROUPS_MAX; static const int cap_last_cap = CAP_LAST_CAP; @@ -170,7 +170,7 @@ int sysctl_legacy_va_layout; #endif #ifdef CONFIG_COMPACTION -static const int min_extfrag_threshold; +/* min_extfrag_threshold is SYSCTL_ZERO */; static const int max_extfrag_threshold = 1000; #endif @@ -2175,7 +2175,7 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = (void *)&minolduid, + .extra1 = SYSCTL_ZERO, .extra2 = (void *)&maxolduid, }, { @@ -2184,7 +2184,7 @@ static struct ctl_table kern_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = (void *)&minolduid, + .extra1 = SYSCTL_ZERO, .extra2 = (void *)&maxolduid, }, #ifdef CONFIG_S390 @@ -2758,7 +2758,7 @@ static struct ctl_table vm_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = (void *)&min_extfrag_threshold, + .extra1 = SYSCTL_ZERO, .extra2 = (void *)&max_extfrag_threshold, }, { @@ -3070,7 +3070,7 @@ static struct ctl_table fs_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = (void *)&minolduid, + .extra1 = SYSCTL_ZERO, .extra2 = (void *)&maxolduid, }, { @@ -3079,7 +3079,7 @@ static struct ctl_table fs_table[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec_minmax, - .extra1 = (void *)&minolduid, + .extra1 = SYSCTL_ZERO, .extra2 = (void *)&maxolduid, }, #ifdef CONFIG_FILE_LOCKING From 50004fa52f6bc5d8b6c9c8bd90cb2619c7948578 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:27 +1100 Subject: [PATCH 09/58] aio: move aio sysctl to aio.c The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. Move aio sysctl to aio.c and use the new register_sysctl_init() to register the sysctl interface for aio. [mcgrof@kernel.org: adjust commit log to justify the move] Link: https://lkml.kernel.org/r/20211123202347.818157-9-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Reviewed-by: Jan Kara Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Benjamin LaHaise Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Kees Cook Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Qing Wang Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Clemens Ladisch Cc: David Airlie Cc: Jani Nikula Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Phillip Potter Cc: Rodrigo Vivi Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/aio.c | 31 +++++++++++++++++++++++++++++-- include/linux/aio.h | 4 ---- kernel/sysctl.c | 17 ----------------- 3 files changed, 29 insertions(+), 23 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 9c81cf611d659..83ef2341e73fc 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -219,9 +219,35 @@ struct aio_kiocb { /*------ sysctl variables----*/ static DEFINE_SPINLOCK(aio_nr_lock); -unsigned long aio_nr; /* current system wide number of aio requests */ -unsigned long aio_max_nr = 0x10000; /* system wide maximum number of aio requests */ +static unsigned long aio_nr; /* current system wide number of aio requests */ +static unsigned long aio_max_nr = 0x10000; /* system wide maximum number of aio requests */ /*----end sysctl variables---*/ +#ifdef CONFIG_SYSCTL +static struct ctl_table aio_sysctls[] = { + { + .procname = "aio-nr", + .data = &aio_nr, + .maxlen = sizeof(aio_nr), + .mode = 0444, + .proc_handler = proc_doulongvec_minmax, + }, + { + .procname = "aio-max-nr", + .data = &aio_max_nr, + .maxlen = sizeof(aio_max_nr), + .mode = 0644, + .proc_handler = proc_doulongvec_minmax, + }, + {} +}; + +static void __init aio_sysctl_init(void) +{ + register_sysctl_init("fs", aio_sysctls); +} +#else +#define aio_sysctl_init() do { } while (0) +#endif static struct kmem_cache *kiocb_cachep; static struct kmem_cache *kioctx_cachep; @@ -274,6 +300,7 @@ static int __init aio_setup(void) kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC); kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC); + aio_sysctl_init(); return 0; } __initcall(aio_setup); diff --git a/include/linux/aio.h b/include/linux/aio.h index b83e68dd006f1..86892a4fe7c8b 100644 --- a/include/linux/aio.h +++ b/include/linux/aio.h @@ -20,8 +20,4 @@ static inline void kiocb_set_cancel_fn(struct kiocb *req, kiocb_cancel_fn *cancel) { } #endif /* CONFIG_AIO */ -/* for sysctl: */ -extern unsigned long aio_nr; -extern unsigned long aio_max_nr; - #endif /* __LINUX__AIO_H */ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 4d7e35ebca8c7..2afb40b49bafa 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -20,7 +20,6 @@ */ #include -#include #include #include #include @@ -3110,22 +3109,6 @@ static struct ctl_table fs_table[] = { .proc_handler = proc_dointvec, }, #endif -#ifdef CONFIG_AIO - { - .procname = "aio-nr", - .data = &aio_nr, - .maxlen = sizeof(aio_nr), - .mode = 0444, - .proc_handler = proc_doulongvec_minmax, - }, - { - .procname = "aio-max-nr", - .data = &aio_max_nr, - .maxlen = sizeof(aio_max_nr), - .mode = 0644, - .proc_handler = proc_doulongvec_minmax, - }, -#endif /* CONFIG_AIO */ #ifdef CONFIG_INOTIFY_USER { .procname = "inotify", From 9c781fe5e79eef5ea92d488e7e36de3b3a9d8252 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:28 +1100 Subject: [PATCH 10/58] dnotify: move dnotify sysctl to dnotify.c The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move dnotify sysctls to dnotify.c and use the new register_sysctl_init() to register the sysctl interface. [mcgrof@kernel.org: adjust the commit log to justify the move] Link: https://lkml.kernel.org/r/20211123202347.818157-10-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Acked-by: Jan Kara Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Benjamin LaHaise Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Kees Cook Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Qing Wang Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Clemens Ladisch Cc: David Airlie Cc: Jani Nikula Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Phillip Potter Cc: Rodrigo Vivi Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/notify/dnotify/dnotify.c | 21 ++++++++++++++++++++- include/linux/dnotify.h | 1 - kernel/sysctl.c | 10 ---------- 3 files changed, 20 insertions(+), 12 deletions(-) diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index e85e13c50d6d4..2b04e2296fb63 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -19,7 +19,25 @@ #include #include -int dir_notify_enable __read_mostly = 1; +static int dir_notify_enable __read_mostly = 1; +#ifdef CONFIG_SYSCTL +static struct ctl_table dnotify_sysctls[] = { + { + .procname = "dir-notify-enable", + .data = &dir_notify_enable, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + {} +}; +static void __init dnotify_sysctl_init(void) +{ + register_sysctl_init("fs", dnotify_sysctls); +} +#else +#define dnotify_sysctl_init() do { } while (0) +#endif static struct kmem_cache *dnotify_struct_cache __read_mostly; static struct kmem_cache *dnotify_mark_cache __read_mostly; @@ -386,6 +404,7 @@ static int __init dnotify_init(void) dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops); if (IS_ERR(dnotify_group)) panic("unable to allocate fsnotify group for dnotify\n"); + dnotify_sysctl_init(); return 0; } diff --git a/include/linux/dnotify.h b/include/linux/dnotify.h index 0aad774beaec4..4f3b25d474365 100644 --- a/include/linux/dnotify.h +++ b/include/linux/dnotify.h @@ -29,7 +29,6 @@ struct dnotify_struct { FS_CREATE | FS_DN_RENAME |\ FS_MOVED_FROM | FS_MOVED_TO) -extern int dir_notify_enable; extern void dnotify_flush(struct file *, fl_owner_t); extern int fcntl_dirnotify(int, struct file *, unsigned long); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 2afb40b49bafa..3419c6f914192 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -48,7 +48,6 @@ #include #include #include -#include #include #include #include @@ -3090,15 +3089,6 @@ static struct ctl_table fs_table[] = { .proc_handler = proc_dointvec, }, #endif -#ifdef CONFIG_DNOTIFY - { - .procname = "dir-notify-enable", - .data = &dir_notify_enable, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, -#endif #ifdef CONFIG_MMU #ifdef CONFIG_FILE_LOCKING { From 5c9ddbb4bbc26a77a70fe8a62a7a0ef6a6cbb27f Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:29 +1100 Subject: [PATCH 11/58] hpet: simplify subdirectory registration with register_sysctl() Patch series "sysctl: second set of kernel/sysctl cleanups", v2. This is the 2nd set of kernel/sysctl.c cleanups. The diff stat should reflect how this is a much better way to deal with theses. Fortunately coccinelle can be used to ensure correctness for most of these and/or future merge conflicts. Note that since this is part of a larger effort to cleanup kernel/sysctl.c I think we have no other option but to go with merging these patches in either Andrew's tree or keep them staged in a separate tree and send a merge request later. Otherwise kernel/sysctl.c will end up becoming a sore spot for the next merge window. This patch (of 8): There is no need to user boiler plate code to specify a set of base directories we're going to stuff sysctls under. Simplify this by using register_sysctl() and specifying the directory path directly. // pycocci sysctl-subdir-register-sysctl-simplify.cocci drivers/char/hpet.c @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @initialize:python@ @@ def make_my_fresh_expression(s1, s2): return '"' + s1.strip('"') + "/" + s2.strip('"') + '"' @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) }; @@ header = -register_sysctl_table(base); +register_sysctl(E3, sysctls); Generated-by: Coccinelle SmPL Link: https://lkml.kernel.org/r/20211123202422.819032-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211123202422.819032-2-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Kees Cook Cc: Iurii Zaikin Cc: Xiaoming Ni Cc: "Eric W. Biederman" Cc: Clemens Ladisch Cc: Arnd Bergmann Cc: Greg Kroah-Hartman Cc: Jani Nikula Cc: Joonas Lahtinen Cc: Rodrigo Vivi Cc: David Airlie Cc: Benjamin Herrenschmidt Cc: Mark Fasheh Cc: Joel Becker Cc: Joseph Qi Cc: Jan Kara Cc: Amir Goldstein Cc: Phillip Potter Cc: Al Viro Cc: Julia Lawall Cc: Lukas Middendorf Cc: Antti Palosaari Cc: Mauro Carvalho Chehab Cc: Andy Shevchenko Cc: Benjamin LaHaise Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Qing Wang Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/char/hpet.c | 22 +--------------------- 1 file changed, 1 insertion(+), 21 deletions(-) diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c index 4e5431f01450b..563dfae3b8daf 100644 --- a/drivers/char/hpet.c +++ b/drivers/char/hpet.c @@ -746,26 +746,6 @@ static struct ctl_table hpet_table[] = { {} }; -static struct ctl_table hpet_root[] = { - { - .procname = "hpet", - .maxlen = 0, - .mode = 0555, - .child = hpet_table, - }, - {} -}; - -static struct ctl_table dev_root[] = { - { - .procname = "dev", - .maxlen = 0, - .mode = 0555, - .child = hpet_root, - }, - {} -}; - static struct ctl_table_header *sysctl_header; /* @@ -1061,7 +1041,7 @@ static int __init hpet_init(void) if (result < 0) return -ENODEV; - sysctl_header = register_sysctl_table(dev_root); + sysctl_header = register_sysctl("dev/hpet", hpet_table); result = acpi_bus_register_driver(&hpet_acpi_driver); if (result < 0) { From e3b0f679f352435a64cd442d7d52c0e2183e14e4 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:30 +1100 Subject: [PATCH 12/58] i915: simplify subdirectory registration with register_sysctl() There is no need to user boiler plate code to specify a set of base directories we're going to stuff sysctls under. Simplify this by using register_sysctl() and specifying the directory path directly. // pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @initialize:python@ @@ def make_my_fresh_expression(s1, s2): return '"' + s1.strip('"') + "/" + s2.strip('"') + '"' @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) }; @@ header = -register_sysctl_table(base); +register_sysctl(E3, sysctls); Generated-by: Coccinelle SmPL Link: https://lkml.kernel.org/r/20211123202422.819032-3-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Acked-by: Jani Nikula Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jan Kara Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Xiaoming Ni Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/gpu/drm/i915/i915_perf.c | 22 +--------------------- 1 file changed, 1 insertion(+), 21 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 2f01b8c0284c8..5979e32586475 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -4273,26 +4273,6 @@ static struct ctl_table oa_table[] = { {} }; -static struct ctl_table i915_root[] = { - { - .procname = "i915", - .maxlen = 0, - .mode = 0555, - .child = oa_table, - }, - {} -}; - -static struct ctl_table dev_root[] = { - { - .procname = "dev", - .maxlen = 0, - .mode = 0555, - .child = i915_root, - }, - {} -}; - static void oa_init_supported_formats(struct i915_perf *perf) { struct drm_i915_private *i915 = perf->i915; @@ -4488,7 +4468,7 @@ static int destroy_config(int id, void *p, void *data) int i915_perf_sysctl_register(void) { - sysctl_header = register_sysctl_table(dev_root); + sysctl_header = register_sysctl("dev/i915", oa_table); return 0; } From 46ad6e47fe63b6c9bc7292de274f70c8ab8d3773 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:31 +1100 Subject: [PATCH 13/58] macintosh/mac_hid.c: simplify subdirectory registration with register_sysctl() There is no need to user boiler plate code to specify a set of base directories we're going to stuff sysctls under. Simplify this by using register_sysctl() and specifying the directory path directly. // pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @initialize:python@ @@ def make_my_fresh_expression(s1, s2): return '"' + s1.strip('"') + "/" + s2.strip('"') + '"' @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) }; @@ header = -register_sysctl_table(base); +register_sysctl(E3, sysctls); Generated-by: Coccinelle SmPL Link: https://lkml.kernel.org/r/20211123202422.819032-4-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Xiaoming Ni Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/macintosh/mac_hid.c | 24 +----------------------- 1 file changed, 1 insertion(+), 23 deletions(-) diff --git a/drivers/macintosh/mac_hid.c b/drivers/macintosh/mac_hid.c index 28b8581b44dda..d8c4d5664145d 100644 --- a/drivers/macintosh/mac_hid.c +++ b/drivers/macintosh/mac_hid.c @@ -239,33 +239,11 @@ static struct ctl_table mac_hid_files[] = { { } }; -/* dir in /proc/sys/dev */ -static struct ctl_table mac_hid_dir[] = { - { - .procname = "mac_hid", - .maxlen = 0, - .mode = 0555, - .child = mac_hid_files, - }, - { } -}; - -/* /proc/sys/dev itself, in case that is not there yet */ -static struct ctl_table mac_hid_root_dir[] = { - { - .procname = "dev", - .maxlen = 0, - .mode = 0555, - .child = mac_hid_dir, - }, - { } -}; - static struct ctl_table_header *mac_hid_sysctl_header; static int __init mac_hid_init(void) { - mac_hid_sysctl_header = register_sysctl_table(mac_hid_root_dir); + mac_hid_sysctl_header = register_sysctl("dev/mac_hid", mac_hid_files); if (!mac_hid_sysctl_header) return -ENOMEM; From 76691c9ffb59a9fc149b3c1ed6502fb4a227de55 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:32 +1100 Subject: [PATCH 14/58] ocfs2: simplify subdirectory registration with register_sysctl() There is no need to user boiler plate code to specify a set of base directories we're going to stuff sysctls under. Simplify this by using register_sysctl() and specifying the directory path directly. // pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @initialize:python@ @@ def make_my_fresh_expression(s1, s2): return '"' + s1.strip('"') + "/" + s2.strip('"') + '"' @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) }; @@ header = -register_sysctl_table(base); +register_sysctl(E3, sysctls); Generated-by: Coccinelle SmPL Link: https://lkml.kernel.org/r/20211123202422.819032-5-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Reviewed-by: Jan Kara Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jani Nikula Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Xiaoming Ni Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/ocfs2/stackglue.c | 25 +------------------------ 1 file changed, 1 insertion(+), 24 deletions(-) diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c index 16f1bfc407f2a..731558a6f27d4 100644 --- a/fs/ocfs2/stackglue.c +++ b/fs/ocfs2/stackglue.c @@ -672,31 +672,8 @@ static struct ctl_table ocfs2_mod_table[] = { { } }; -static struct ctl_table ocfs2_kern_table[] = { - { - .procname = "ocfs2", - .data = NULL, - .maxlen = 0, - .mode = 0555, - .child = ocfs2_mod_table - }, - { } -}; - -static struct ctl_table ocfs2_root_table[] = { - { - .procname = "fs", - .data = NULL, - .maxlen = 0, - .mode = 0555, - .child = ocfs2_kern_table - }, - { } -}; - static struct ctl_table_header *ocfs2_table_header; - /* * Initialization */ @@ -705,7 +682,7 @@ static int __init ocfs2_stack_glue_init(void) { strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB); - ocfs2_table_header = register_sysctl_table(ocfs2_root_table); + ocfs2_table_header = register_sysctl("fs/ocfs2", ocfs2_mod_table); if (!ocfs2_table_header) { printk(KERN_ERR "ocfs2 stack glue: unable to register sysctl\n"); From ab15c7dd31673e340189f041d5d235b37241a918 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:32 +1100 Subject: [PATCH 15/58] test_sysctl: simplify subdirectory registration with register_sysctl() There is no need to user boiler plate code to specify a set of base directories we're going to stuff sysctls under. Simplify this by using register_sysctl() and specifying the directory path directly. // pycocci sysctl-subdir-register-sysctl-simplify.cocci lib/test_sysctl.c @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @initialize:python@ @@ def make_my_fresh_expression(s1, s2): return '"' + s1.strip('"') + "/" + s2.strip('"') + '"' @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) }; @@ header = -register_sysctl_table(base); +register_sysctl(E3, sysctls); Generated-by: Coccinelle SmPL Link: https://lkml.kernel.org/r/20211123202422.819032-6-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Xiaoming Ni Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- lib/test_sysctl.c | 22 +--------------------- 1 file changed, 1 insertion(+), 21 deletions(-) diff --git a/lib/test_sysctl.c b/lib/test_sysctl.c index 3750323973f4a..a5a3d6c27e1fa 100644 --- a/lib/test_sysctl.c +++ b/lib/test_sysctl.c @@ -128,26 +128,6 @@ static struct ctl_table test_table[] = { { } }; -static struct ctl_table test_sysctl_table[] = { - { - .procname = "test_sysctl", - .maxlen = 0, - .mode = 0555, - .child = test_table, - }, - { } -}; - -static struct ctl_table test_sysctl_root_table[] = { - { - .procname = "debug", - .maxlen = 0, - .mode = 0555, - .child = test_sysctl_table, - }, - { } -}; - static struct ctl_table_header *test_sysctl_header; static int __init test_sysctl_init(void) @@ -155,7 +135,7 @@ static int __init test_sysctl_init(void) test_data.bitmap_0001 = kzalloc(SYSCTL_TEST_BITMAP_SIZE/8, GFP_KERNEL); if (!test_data.bitmap_0001) return -ENOMEM; - test_sysctl_header = register_sysctl_table(test_sysctl_root_table); + test_sysctl_header = register_sysctl("debug/test_sysctl", test_table); if (!test_sysctl_header) { kfree(test_data.bitmap_0001); return -ENOMEM; From c802d8d287abfcc6e07c5473e7255cf697bff764 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:33 +1100 Subject: [PATCH 16/58] inotify: simplify subdirectory registration with register_sysctl() There is no need to user boiler plate code to specify a set of base directories we're going to stuff sysctls under. Simplify this by using register_sysctl() and specifying the directory path directly. Move inotify_user sysctl to inotify_user.c while at it to remove clutter from kernel/sysctl.c. [mcgrof@kernel.org: update commit log to reflect new path we decided to take] Link: https://lkml.kernel.org/r/20211123202422.819032-7-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/notify/inotify/inotify_user.c | 11 ++++++++++- include/linux/inotify.h | 3 --- kernel/sysctl.c | 21 --------------------- 3 files changed, 10 insertions(+), 25 deletions(-) diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 29fca3284bb5c..54583f62dc440 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -58,7 +58,7 @@ struct kmem_cache *inotify_inode_mark_cachep __read_mostly; static long it_zero = 0; static long it_int_max = INT_MAX; -struct ctl_table inotify_table[] = { +static struct ctl_table inotify_table[] = { { .procname = "max_user_instances", .data = &init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES], @@ -87,6 +87,14 @@ struct ctl_table inotify_table[] = { }, { } }; + +static void __init inotify_sysctls_init(void) +{ + register_sysctl("fs/inotify", inotify_table); +} + +#else +#define inotify_sysctls_init() do { } while (0) #endif /* CONFIG_SYSCTL */ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg) @@ -849,6 +857,7 @@ static int __init inotify_user_setup(void) inotify_max_queued_events = 16384; init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128; init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max; + inotify_sysctls_init(); return 0; } diff --git a/include/linux/inotify.h b/include/linux/inotify.h index 6a24905f6e1e2..8d20caa1b2687 100644 --- a/include/linux/inotify.h +++ b/include/linux/inotify.h @@ -7,11 +7,8 @@ #ifndef _LINUX_INOTIFY_H #define _LINUX_INOTIFY_H -#include #include -extern struct ctl_table inotify_table[]; /* for sysctl */ - #define ALL_INOTIFY_BITS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | \ IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \ IN_MOVED_TO | IN_CREATE | IN_DELETE | \ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 3419c6f914192..aac99ba0d5a76 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -125,13 +125,6 @@ static const int maxolduid = 65535; static const int ngroups_max = NGROUPS_MAX; static const int cap_last_cap = CAP_LAST_CAP; -#ifdef CONFIG_INOTIFY_USER -#include -#endif -#ifdef CONFIG_FANOTIFY -#include -#endif - #ifdef CONFIG_PROC_SYSCTL /** @@ -3099,20 +3092,6 @@ static struct ctl_table fs_table[] = { .proc_handler = proc_dointvec, }, #endif -#ifdef CONFIG_INOTIFY_USER - { - .procname = "inotify", - .mode = 0555, - .child = inotify_table, - }, -#endif -#ifdef CONFIG_FANOTIFY - { - .procname = "fanotify", - .mode = 0555, - .child = fanotify_table, - }, -#endif #ifdef CONFIG_EPOLL { .procname = "epoll", From 4a45a9c4da53997be320ad034e6e546143235dbb Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:34 +1100 Subject: [PATCH 17/58] inotify-simplify-subdirectory-registration-with-register_sysctl-fix remember to register fanotify_table Link: https://lkml.kernel.org/r/YZ5A6iWLb0h3N3RC@bombadil.infradead.org Reported-by: Jan Kara Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: Douglas Gilbert Cc: Eric Biederman Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: James E.J. Bottomley Cc: Jani Nikula Cc: Jani Nikula Cc: Joel Becker Cc: John Ogness Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Martin K. Petersen Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: "Rafael J. Wysocki" Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: Tetsuo Handa Cc: "Theodore Ts'o" Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/notify/fanotify/fanotify_user.c | 10 +++++++++- include/linux/fanotify.h | 2 -- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 559bc1e9926d6..a35693eb1f36c 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -59,7 +59,7 @@ static int fanotify_max_queued_events __read_mostly; static long ft_zero = 0; static long ft_int_max = INT_MAX; -struct ctl_table fanotify_table[] = { +static struct ctl_table fanotify_table[] = { { .procname = "max_user_groups", .data = &init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS], @@ -88,6 +88,13 @@ struct ctl_table fanotify_table[] = { }, { } }; + +static void __init fanotify_sysctls_init(void) +{ + register_sysctl("fs/fanotify", fanotify_table); +} +#else +#define fanotify_sysctls_init() do { } while (0) #endif /* CONFIG_SYSCTL */ /* @@ -1685,6 +1692,7 @@ static int __init fanotify_user_setup(void) init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] = FANOTIFY_DEFAULT_MAX_GROUPS; init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = max_marks; + fanotify_sysctls_init(); return 0; } diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 616af2ea20f30..556cc63c88ee7 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -5,8 +5,6 @@ #include #include -extern struct ctl_table fanotify_table[]; /* for sysctl */ - #define FAN_GROUP_FLAG(group, flag) \ ((group)->fanotify_data.flags & (flag)) From 1971e9e1f69092cc97e514e58a11c329ec65526a Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:35 +1100 Subject: [PATCH 18/58] cdrom: simplify subdirectory registration with register_sysctl() There is no need to user boiler plate code to specify a set of base directories we're going to stuff sysctls under. Simplify this by using register_sysctl() and specifying the directory path directly. // pycocci sysctl-subdir-register-sysctl-simplify.cocci PATH @c1@ expression E1; identifier subdir, sysctls; @@ static struct ctl_table subdir[] = { { .procname = E1, .maxlen = 0, .mode = 0555, .child = sysctls, }, { } }; @c2@ identifier c1.subdir; expression E2; identifier base; @@ static struct ctl_table base[] = { { .procname = E2, .maxlen = 0, .mode = 0555, .child = subdir, }, { } }; @c3@ identifier c2.base; identifier header; @@ header = register_sysctl_table(base); @r1 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.subdir, c1.sysctls; @@ -static struct ctl_table subdir[] = { - { - .procname = E1, - .maxlen = 0, - .mode = 0555, - .child = sysctls, - }, - { } -}; @r2 depends on c1 && c2 && c3@ identifier c1.subdir; expression c2.E2; identifier c2.base; @@ -static struct ctl_table base[] = { - { - .procname = E2, - .maxlen = 0, - .mode = 0555, - .child = subdir, - }, - { } -}; @initialize:python@ @@ def make_my_fresh_expression(s1, s2): return '"' + s1.strip('"') + "/" + s2.strip('"') + '"' @r3 depends on c1 && c2 && c3@ expression c1.E1; identifier c1.sysctls; expression c2.E2; identifier c2.base; identifier c3.header; fresh identifier E3 = script:python(E2, E1) { make_my_fresh_expression(E2, E1) }; @@ header = -register_sysctl_table(base); +register_sysctl(E3, sysctls); Generated-by: Coccinelle SmPL Link: https://lkml.kernel.org/r/20211123202422.819032-8-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Xiaoming Ni Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/cdrom/cdrom.c | 23 +---------------------- 1 file changed, 1 insertion(+), 22 deletions(-) diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index 9877e413fce39..1b57d4666e43c 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -3691,27 +3691,6 @@ static struct ctl_table cdrom_table[] = { }, { } }; - -static struct ctl_table cdrom_cdrom_table[] = { - { - .procname = "cdrom", - .maxlen = 0, - .mode = 0555, - .child = cdrom_table, - }, - { } -}; - -/* Make sure that /proc/sys/dev is there */ -static struct ctl_table cdrom_root_table[] = { - { - .procname = "dev", - .maxlen = 0, - .mode = 0555, - .child = cdrom_cdrom_table, - }, - { } -}; static struct ctl_table_header *cdrom_sysctl_header; static void cdrom_sysctl_register(void) @@ -3721,7 +3700,7 @@ static void cdrom_sysctl_register(void) if (!atomic_add_unless(&initialized, 1, 1)) return; - cdrom_sysctl_header = register_sysctl_table(cdrom_root_table); + cdrom_sysctl_header = register_sysctl("dev/cdrom", cdrom_table); /* set the defaults */ cdrom_sysctl_settings.autoclose = autoclose; From 2e16e22c75a2272eadfd10684e8dba4fc1d39352 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:36 +1100 Subject: [PATCH 19/58] eventpoll: simplify sysctl declaration with register_sysctl() The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the epoll_table sysctl to fs/eventpoll.c and use register_sysctl(). Link: https://lkml.kernel.org/r/20211123202422.819032-9-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: "Eric W. Biederman" Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Tetsuo Handa Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Jani Nikula Cc: John Ogness Cc: Martin K. Petersen Cc: "Rafael J. Wysocki" Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/eventpoll.c | 10 +++++++++- include/linux/poll.h | 2 -- include/linux/sysctl.h | 1 - kernel/sysctl.c | 7 ------- 4 files changed, 9 insertions(+), 11 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 06f4c5ae1451e..e2daa940ebce7 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -307,7 +307,7 @@ static void unlist_file(struct epitems_head *head) static long long_zero; static long long_max = LONG_MAX; -struct ctl_table epoll_table[] = { +static struct ctl_table epoll_table[] = { { .procname = "max_user_watches", .data = &max_user_watches, @@ -319,6 +319,13 @@ struct ctl_table epoll_table[] = { }, { } }; + +static void __init epoll_sysctls_init(void) +{ + register_sysctl("fs/epoll", epoll_table); +} +#else +#define epoll_sysctls_init() do { } while (0) #endif /* CONFIG_SYSCTL */ static const struct file_operations eventpoll_fops; @@ -2378,6 +2385,7 @@ static int __init eventpoll_init(void) /* Allocates slab cache used to allocate "struct eppoll_entry" */ pwq_cache = kmem_cache_create("eventpoll_pwq", sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL); + epoll_sysctls_init(); ephead_cache = kmem_cache_create("ep_head", sizeof(struct epitems_head), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL); diff --git a/include/linux/poll.h b/include/linux/poll.h index 1cdc32b1f1b08..a9e0e1c2d1f2f 100644 --- a/include/linux/poll.h +++ b/include/linux/poll.h @@ -8,12 +8,10 @@ #include #include #include -#include #include #include #include -extern struct ctl_table epoll_table[]; /* for sysctl */ /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating additional memory. */ #ifdef __clang__ diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 47cf70c8eb93c..6dd0f277f8443 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -219,7 +219,6 @@ extern int no_unaligned_warning; extern struct ctl_table sysctl_mount_point[]; extern struct ctl_table random_table[]; extern struct ctl_table firmware_config_table[]; -extern struct ctl_table epoll_table[]; #else /* CONFIG_SYSCTL */ static inline struct ctl_table_header *register_sysctl_table(struct ctl_table * table) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index aac99ba0d5a76..dbc1a7c821bea 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -3092,13 +3092,6 @@ static struct ctl_table fs_table[] = { .proc_handler = proc_dointvec, }, #endif -#ifdef CONFIG_EPOLL - { - .procname = "epoll", - .mode = 0555, - .child = epoll_table, - }, -#endif #endif { .procname = "protected_symlinks", From 7be815fb2dc2e39b52517c9941cf271bd5dcab34 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:36 +1100 Subject: [PATCH 20/58] firmware_loader: move firmware sysctl to its own files Patch series "sysctl: 3rd set of kernel/sysctl cleanups", v2. This is the third set of patches to help address cleaning the kitchen seink in kernel/sysctl.c and to move sysctls away to where they are actually implemented / used. This patch (of 8): kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the firmware configuration sysctl table to the only place where it is used, and make it clear that if sysctls are disabled this is not used. [mcgrof@kernel.org: major commit log update to justify the move] Link: https://lkml.kernel.org/r/20211124231435.1445213-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211124231435.1445213-2-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Kees Cook Cc: Iurii Zaikin Cc: Eric Biederman Cc: Stephen Kitt Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: "Theodore Ts'o" Cc: Al Viro Cc: Petr Mladek Cc: Sergey Senozhatsky Cc: Steven Rostedt (VMware) Cc: John Ogness Cc: Douglas Gilbert Cc: James E.J. Bottomley Cc: Martin K. Petersen Cc: Lukas Middendorf Cc: Antti Palosaari Cc: Mauro Carvalho Chehab Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: Jani Nikula Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Mark Fasheh Cc: Paul Turner Cc: Peter Zijlstra Cc: Phillip Potter Cc: Qing Wang Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Suren Baghdasaryan Cc: Tetsuo Handa Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/base/firmware_loader/fallback.c | 7 ++++++- drivers/base/firmware_loader/fallback.h | 11 ++++++++++ drivers/base/firmware_loader/fallback_table.c | 21 +++++++++++++++++-- include/linux/sysctl.h | 1 - kernel/sysctl.c | 7 ------- 5 files changed, 36 insertions(+), 11 deletions(-) diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c index d7d63c1aa993f..4afb0e9312c09 100644 --- a/drivers/base/firmware_loader/fallback.c +++ b/drivers/base/firmware_loader/fallback.c @@ -199,11 +199,16 @@ static struct class firmware_class = { int register_sysfs_loader(void) { - return class_register(&firmware_class); + int ret = class_register(&firmware_class); + + if (ret != 0) + return ret; + return register_firmware_config_sysctl(); } void unregister_sysfs_loader(void) { + unregister_firmware_config_sysctl(); class_unregister(&firmware_class); } diff --git a/drivers/base/firmware_loader/fallback.h b/drivers/base/firmware_loader/fallback.h index 3af7205b302fe..9f3055d3b4ca8 100644 --- a/drivers/base/firmware_loader/fallback.h +++ b/drivers/base/firmware_loader/fallback.h @@ -42,6 +42,17 @@ void fw_fallback_set_default_timeout(void); int register_sysfs_loader(void); void unregister_sysfs_loader(void); +#ifdef CONFIG_SYSCTL +extern int register_firmware_config_sysctl(void); +extern void unregister_firmware_config_sysctl(void); +#else +static inline int register_firmware_config_sysctl(void) +{ + return 0; +} +static inline void unregister_firmware_config_sysctl(void) { } +#endif /* CONFIG_SYSCTL */ + #else /* CONFIG_FW_LOADER_USER_HELPER */ static inline int firmware_fallback_sysfs(struct firmware *fw, const char *name, struct device *device, diff --git a/drivers/base/firmware_loader/fallback_table.c b/drivers/base/firmware_loader/fallback_table.c index 46a731dede6f7..51751c46cdcf1 100644 --- a/drivers/base/firmware_loader/fallback_table.c +++ b/drivers/base/firmware_loader/fallback_table.c @@ -24,7 +24,7 @@ struct firmware_fallback_config fw_fallback_config = { EXPORT_SYMBOL_NS_GPL(fw_fallback_config, FIRMWARE_LOADER_PRIVATE); #ifdef CONFIG_SYSCTL -struct ctl_table firmware_config_table[] = { +static struct ctl_table firmware_config_table[] = { { .procname = "force_sysfs_fallback", .data = &fw_fallback_config.force_sysfs_fallback, @@ -45,4 +45,21 @@ struct ctl_table firmware_config_table[] = { }, { } }; -#endif + +static struct ctl_table_header *firmware_config_sysct_table_header; +int register_firmware_config_sysctl(void) +{ + firmware_config_sysct_table_header = + register_sysctl("kernel/firmware_config", + firmware_config_table); + if (!firmware_config_sysct_table_header) + return -ENOMEM; + return 0; +} + +void unregister_firmware_config_sysctl(void) +{ + unregister_sysctl_table(firmware_config_sysct_table_header); + firmware_config_sysct_table_header = NULL; +} +#endif /* CONFIG_SYSCTL */ diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 6dd0f277f8443..3985e9c801557 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -218,7 +218,6 @@ extern int no_unaligned_warning; extern struct ctl_table sysctl_mount_point[]; extern struct ctl_table random_table[]; -extern struct ctl_table firmware_config_table[]; #else /* CONFIG_SYSCTL */ static inline struct ctl_table_header *register_sysctl_table(struct ctl_table * table) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index dbc1a7c821bea..59a224b746bf0 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2153,13 +2153,6 @@ static struct ctl_table kern_table[] = { .mode = 0555, .child = usermodehelper_table, }, -#ifdef CONFIG_FW_LOADER_USER_HELPER - { - .procname = "firmware_config", - .mode = 0555, - .child = firmware_config_table, - }, -#endif { .procname = "overflowuid", .data = &overflowuid, From 6b6c18493d78383a87983b7b42773af48aa95978 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Fri, 3 Dec 2021 12:58:37 +1100 Subject: [PATCH 21/58] firmware_loader-move-firmware-sysctl-to-its-own-files-fix export register_firmware_config_sysctl and unregister_firmware_config_sysctl to modules ERROR: modpost: "register_firmware_config_sysctl" [drivers/base/firmware_loader/firmware_class.ko] undefined! ERROR: modpost: "unregister_firmware_config_sysctl" [drivers/base/firmware_loader/firmware_class.ko] undefined! Cc: Luis Chamberlain Reported-by: kernel test robot Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/base/firmware_loader/fallback_table.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/base/firmware_loader/fallback_table.c b/drivers/base/firmware_loader/fallback_table.c index 51751c46cdcf1..e9c24a93b49c6 100644 --- a/drivers/base/firmware_loader/fallback_table.c +++ b/drivers/base/firmware_loader/fallback_table.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #include #include @@ -56,10 +57,13 @@ int register_firmware_config_sysctl(void) return -ENOMEM; return 0; } +EXPORT_SYMBOL(register_firmware_config_sysctl); void unregister_firmware_config_sysctl(void) { unregister_sysctl_table(firmware_config_sysct_table_header); firmware_config_sysct_table_header = NULL; } +EXPORT_SYMBOL(unregister_firmware_config_sysctl); + #endif /* CONFIG_SYSCTL */ From 5606ad84c76bfc3a03c2a1dc8921980300b2cdfd Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Fri, 3 Dec 2021 12:58:38 +1100 Subject: [PATCH 22/58] firmware_loader-move-firmware-sysctl-to-its-own-files-fix-fix Cc: kernel test robot Cc: Luis Chamberlain Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/base/firmware_loader/fallback_table.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/base/firmware_loader/fallback_table.c b/drivers/base/firmware_loader/fallback_table.c index e9c24a93b49c6..c3fb8292acb82 100644 --- a/drivers/base/firmware_loader/fallback_table.c +++ b/drivers/base/firmware_loader/fallback_table.c @@ -4,7 +4,7 @@ #include #include #include -#include +#include #include #include #include @@ -57,13 +57,13 @@ int register_firmware_config_sysctl(void) return -ENOMEM; return 0; } -EXPORT_SYMBOL(register_firmware_config_sysctl); +EXPORT_SYMBOL_NS_GPL(register_firmware_config_sysctl); void unregister_firmware_config_sysctl(void) { unregister_sysctl_table(firmware_config_sysct_table_header); firmware_config_sysct_table_header = NULL; } -EXPORT_SYMBOL(unregister_firmware_config_sysctl); +EXPORT_SYMBOL_NS_GPL(unregister_firmware_config_sysctl); #endif /* CONFIG_SYSCTL */ From 907aaed0ad1fe800c37024e6b478b6988f29d2db Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Fri, 3 Dec 2021 12:58:38 +1100 Subject: [PATCH 23/58] firmware_loader-move-firmware-sysctl-to-its-own-files-fix-3 akpm sucks Link: https://lkml.kernel.org/r/20211201160626.401d828d@canb.auug.org.au Signed-off-by: Stephen Rothwell Cc: Luis Chamberlain Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/base/firmware_loader/fallback_table.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/base/firmware_loader/fallback_table.c b/drivers/base/firmware_loader/fallback_table.c index c3fb8292acb82..e5ac098d0742a 100644 --- a/drivers/base/firmware_loader/fallback_table.c +++ b/drivers/base/firmware_loader/fallback_table.c @@ -57,13 +57,13 @@ int register_firmware_config_sysctl(void) return -ENOMEM; return 0; } -EXPORT_SYMBOL_NS_GPL(register_firmware_config_sysctl); +EXPORT_SYMBOL_NS_GPL(register_firmware_config_sysctl, FIRMWARE_LOADER_PRIVATE); void unregister_firmware_config_sysctl(void) { unregister_sysctl_table(firmware_config_sysct_table_header); firmware_config_sysct_table_header = NULL; } -EXPORT_SYMBOL_NS_GPL(unregister_firmware_config_sysctl); +EXPORT_SYMBOL_NS_GPL(unregister_firmware_config_sysctl, FIRMWARE_LOADER_PRIVATE); #endif /* CONFIG_SYSCTL */ From ddc27f170b1e7fe44d08efed9ae6eaa01cbcc829 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:39 +1100 Subject: [PATCH 24/58] random: move the random sysctl declarations to its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the random sysctls to their own file and use register_sysctl_init(). [mcgrof@kernel.org: commit log update to justify the move] Link: https://lkml.kernel.org/r/20211124231435.1445213-3-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: Douglas Gilbert Cc: Eric Biederman Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: James E.J. Bottomley Cc: Jani Nikula Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: John Ogness Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Martin K. Petersen Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: "Rafael J. Wysocki" Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: Tetsuo Handa Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/char/random.c | 14 ++++++++++++-- include/linux/sysctl.h | 1 - kernel/sysctl.c | 5 ----- 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index c81485e2f1264..000305722115f 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -2088,8 +2088,7 @@ static int proc_do_entropy(struct ctl_table *table, int write, } static int sysctl_poolsize = INPUT_POOL_WORDS * 32; -extern struct ctl_table random_table[]; -struct ctl_table random_table[] = { +static struct ctl_table random_table[] = { { .procname = "poolsize", .data = &sysctl_poolsize, @@ -2151,6 +2150,17 @@ struct ctl_table random_table[] = { #endif { } }; + +/* + * rand_initialize() is called before sysctl_init(), + * so we cannot call register_sysctl_init() in rand_initialize() + */ +static int __init random_sysctls_init(void) +{ + register_sysctl_init("kernel/random", random_table); + return 0; +} +device_initcall(random_sysctls_init); #endif /* CONFIG_SYSCTL */ struct batched_entropy { diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 3985e9c801557..fce05a060bc5e 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -217,7 +217,6 @@ extern int unaligned_dump_stack; extern int no_unaligned_warning; extern struct ctl_table sysctl_mount_point[]; -extern struct ctl_table random_table[]; #else /* CONFIG_SYSCTL */ static inline struct ctl_table_header *register_sysctl_table(struct ctl_table * table) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 59a224b746bf0..fb2ead3045b44 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2143,11 +2143,6 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = sysctl_max_threads, }, - { - .procname = "random", - .mode = 0555, - .child = random_table, - }, { .procname = "usermodehelper", .mode = 0555, From 9e00fdc87cd9b5af8d1ef00e2982e5b69988bef8 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:40 +1100 Subject: [PATCH 25/58] sysctl: add helper to register a sysctl mount point The way to create a subdirectory on top of sysctl_mount_point is a bit obscure, and *why* we do that even so more. Provide a helper which makes it clear why we do this. Link: https://lkml.kernel.org/r/20211124231435.1445213-4-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Suggested-by: "Eric W. Biederman" Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: Douglas Gilbert Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: James E.J. Bottomley Cc: Jani Nikula Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: John Ogness Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Martin K. Petersen Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: "Rafael J. Wysocki" Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: Tetsuo Handa Cc: "Theodore Ts'o" Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/proc/proc_sysctl.c | 13 +++++++++++++ include/linux/sysctl.h | 7 +++++++ 2 files changed, 20 insertions(+) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 0b7a00ed6c49b..a670b4bb42342 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -35,6 +35,19 @@ struct ctl_table sysctl_mount_point[] = { { } }; +/** + * register_sysctl_mount_point() - registers a sysctl mount point + * @path: path for the mount point + * + * Used to create a permanently empty directory to serve as mount point. + * There are some subtle but important permission checks this allows in the + * case of unprivileged mounts. + */ +struct ctl_table_header *register_sysctl_mount_point(const char *path) +{ + return register_sysctl(path, sysctl_mount_point); +} + static bool is_empty_dir(struct ctl_table_header *head) { return head->ctl_table[0].child == sysctl_mount_point; diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index fce05a060bc5e..746c098a6ff57 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -209,6 +209,8 @@ extern int sysctl_init(void); extern void __register_sysctl_init(const char *path, struct ctl_table *table, const char *table_name); #define register_sysctl_init(path, table) __register_sysctl_init(path, table, #table) +extern struct ctl_table_header *register_sysctl_mount_point(const char *path); + void do_sysctl_args(void); extern int pwrsw_enabled; @@ -224,6 +226,11 @@ static inline struct ctl_table_header *register_sysctl_table(struct ctl_table * return NULL; } +static inline struct sysctl_header *register_sysctl_mount_point(const char *path) +{ + return NULL; +} + static inline struct ctl_table_header *register_sysctl_paths( const struct ctl_path *path, struct ctl_table *table) { From 05b7106e65d49a295fcb2a4a6b44dffca792e0db Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Fri, 3 Dec 2021 12:58:41 +1100 Subject: [PATCH 26/58] sysctl-add-helper-to-register-a-sysctl-mount-point-fix export register_sysctl_mount_point() to modules ERROR: modpost: "register_sysctl_mount_point" [fs/binfmt_misc.ko] undefined! Cc: Luis Chamberlain Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/proc/proc_sysctl.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index a670b4bb42342..5a9b9419566da 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -47,6 +47,7 @@ struct ctl_table_header *register_sysctl_mount_point(const char *path) { return register_sysctl(path, sysctl_mount_point); } +EXPORT_SYMBOL(register_sysctl_mount_point); static bool is_empty_dir(struct ctl_table_header *head) { From a7f532a395f6863f0931e06158a715b35bef9818 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:41 +1100 Subject: [PATCH 27/58] fs: move binfmt_misc sysctl to its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. This moves the binfmt_misc sysctl to its own file to help remove clutter from kernel/sysctl.c. Link: https://lkml.kernel.org/r/20211124231435.1445213-5-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: Douglas Gilbert Cc: Eric Biederman Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: James E.J. Bottomley Cc: Jani Nikula Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: John Ogness Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Martin K. Petersen Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: "Rafael J. Wysocki" Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: Tetsuo Handa Cc: "Theodore Ts'o" Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/binfmt_misc.c | 6 +++++- kernel/sysctl.c | 7 ------- 2 files changed, 5 insertions(+), 8 deletions(-) diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index e1eae7ea823ae..ddea6acbddde5 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -822,7 +822,11 @@ static int __init init_misc_binfmt(void) int err = register_filesystem(&bm_fs_type); if (!err) insert_binfmt(&misc_format); - return err; + if (!register_sysctl_mount_point("fs/binfmt_misc")) { + pr_warn("Failed to create fs/binfmt_misc sysctl mount point"); + return -ENOMEM; + } + return 0; } static void __exit exit_misc_binfmt(void) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index fb2ead3045b44..48569afc32179 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -3126,13 +3126,6 @@ static struct ctl_table fs_table[] = { .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_TWO, }, -#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE) - { - .procname = "binfmt_misc", - .mode = 0555, - .child = sysctl_mount_point, - }, -#endif { .procname = "pipe-max-size", .data = &pipe_max_size, From 0595d90cc99a769387133d407d8deda043767ca1 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:42 +1100 Subject: [PATCH 28/58] printk: move printk sysctl to printk/sysctl.c kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move printk sysctl from kernel/sysctl.c to kernel/printk/sysctl.c. Use register_sysctl() to register the sysctl interface. [mcgrof@kernel.org: fixed compile issues when PRINTK is not set, commit log update] Link: https://lkml.kernel.org/r/20211124231435.1445213-6-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: Douglas Gilbert Cc: Eric Biederman Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: James E.J. Bottomley Cc: Jani Nikula Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: John Ogness Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Martin K. Petersen Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: "Rafael J. Wysocki" Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: Tetsuo Handa Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- kernel/printk/Makefile | 5 ++- kernel/printk/internal.h | 6 +++ kernel/printk/printk.c | 1 + kernel/printk/sysctl.c | 85 ++++++++++++++++++++++++++++++++++++++++ kernel/sysctl.c | 68 -------------------------------- 5 files changed, 96 insertions(+), 69 deletions(-) create mode 100644 kernel/printk/sysctl.c diff --git a/kernel/printk/Makefile b/kernel/printk/Makefile index d118739874c0d..f5b388e810b9f 100644 --- a/kernel/printk/Makefile +++ b/kernel/printk/Makefile @@ -2,5 +2,8 @@ obj-y = printk.o obj-$(CONFIG_PRINTK) += printk_safe.o obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o -obj-$(CONFIG_PRINTK) += printk_ringbuffer.o obj-$(CONFIG_PRINTK_INDEX) += index.o + +obj-$(CONFIG_PRINTK) += printk_support.o +printk_support-y := printk_ringbuffer.o +printk_support-$(CONFIG_SYSCTL) += sysctl.o diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 9f3ed2fdb7211..6b1c4b399845b 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -4,6 +4,12 @@ */ #include +#if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL) +void __init printk_sysctl_init(void); +#else +#define printk_sysctl_init() do { } while (0) +#endif + #ifdef CONFIG_PRINTK /* Flags for a single printk record. */ diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index cbc35d586afb6..dbb44086ba652 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -3202,6 +3202,7 @@ static int __init printk_late_init(void) ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "printk:online", console_cpu_notify, NULL); WARN_ON(ret < 0); + printk_sysctl_init(); return 0; } late_initcall(printk_late_init); diff --git a/kernel/printk/sysctl.c b/kernel/printk/sysctl.c new file mode 100644 index 0000000000000..653ae04aab7f0 --- /dev/null +++ b/kernel/printk/sysctl.c @@ -0,0 +1,85 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * sysctl.c: General linux system control interface + */ + +#include +#include +#include +#include +#include "internal.h" + +static const int ten_thousand = 10000; + +static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) +{ + if (write && !capable(CAP_SYS_ADMIN)) + return -EPERM; + + return proc_dointvec_minmax(table, write, buffer, lenp, ppos); +} + +static struct ctl_table printk_sysctls[] = { + { + .procname = "printk", + .data = &console_loglevel, + .maxlen = 4*sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .procname = "printk_ratelimit", + .data = &printk_ratelimit_state.interval, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_jiffies, + }, + { + .procname = "printk_ratelimit_burst", + .data = &printk_ratelimit_state.burst, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .procname = "printk_delay", + .data = &printk_delay_msec, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = (void *)&ten_thousand, + }, + { + .procname = "printk_devkmsg", + .data = devkmsg_log_str, + .maxlen = DEVKMSG_STR_MAX_SIZE, + .mode = 0644, + .proc_handler = devkmsg_sysctl_set_loglvl, + }, + { + .procname = "dmesg_restrict", + .data = &dmesg_restrict, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax_sysadmin, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + { + .procname = "kptr_restrict", + .data = &kptr_restrict, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax_sysadmin, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_TWO, + }, + {} +}; + +void __init printk_sysctl_init(void) +{ + register_sysctl_init("kernel", printk_sysctls); +} diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 48569afc32179..60a14d6484aa4 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -907,17 +907,6 @@ static int proc_taint(struct ctl_table *table, int write, return err; } -#ifdef CONFIG_PRINTK -static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) -{ - if (write && !capable(CAP_SYS_ADMIN)) - return -EPERM; - - return proc_dointvec_minmax(table, write, buffer, lenp, ppos); -} -#endif - /** * struct do_proc_dointvec_minmax_conv_param - proc_dointvec_minmax() range checking structure * @min: pointer to minimum allowable value @@ -2209,63 +2198,6 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_doulongvec_minmax, }, -#if defined CONFIG_PRINTK - { - .procname = "printk", - .data = &console_loglevel, - .maxlen = 4*sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, - { - .procname = "printk_ratelimit", - .data = &printk_ratelimit_state.interval, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_jiffies, - }, - { - .procname = "printk_ratelimit_burst", - .data = &printk_ratelimit_state.burst, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, - { - .procname = "printk_delay", - .data = &printk_delay_msec, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = (void *)&ten_thousand, - }, - { - .procname = "printk_devkmsg", - .data = devkmsg_log_str, - .maxlen = DEVKMSG_STR_MAX_SIZE, - .mode = 0644, - .proc_handler = devkmsg_sysctl_set_loglvl, - }, - { - .procname = "dmesg_restrict", - .data = &dmesg_restrict, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax_sysadmin, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, - { - .procname = "kptr_restrict", - .data = &kptr_restrict, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax_sysadmin, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_TWO, - }, -#endif { .procname = "ngroups_max", .data = (void *)&ngroups_max, From d630c4c97bd75592ba67e0493534a959a4c085af Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:43 +1100 Subject: [PATCH 29/58] scsi/sg: move sg-big-buff sysctl to scsi/sg.c kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the sg-big-buff sysctl from kernel/sysctl.c to drivers/scsi/sg.c and use register_sysctl() to register the sysctl interface. [mcgrof@kernel.org: commit log update] Link: https://lkml.kernel.org/r/20211124231435.1445213-7-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: Douglas Gilbert Cc: Eric Biederman Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: James E.J. Bottomley Cc: Jani Nikula Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: John Ogness Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Martin K. Petersen Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: "Rafael J. Wysocki" Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: Tetsuo Handa Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/scsi/sg.c | 35 ++++++++++++++++++++++++++++++++++- include/scsi/sg.h | 4 ---- kernel/sysctl.c | 12 ------------ 3 files changed, 34 insertions(+), 17 deletions(-) diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index ad12b3261845b..6b43e97bd417a 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -77,7 +77,7 @@ static int sg_proc_init(void); #define SG_DEFAULT_TIMEOUT mult_frac(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ) -int sg_big_buff = SG_DEF_RESERVED_SIZE; +static int sg_big_buff = SG_DEF_RESERVED_SIZE; /* N.B. This variable is readable and writeable via /proc/scsi/sg/def_reserved_size . Each time sg_open() is called a buffer of this size (or less if there is not enough memory) will be reserved @@ -1634,6 +1634,37 @@ MODULE_PARM_DESC(scatter_elem_sz, "scatter gather element " MODULE_PARM_DESC(def_reserved_size, "size of buffer reserved for each fd"); MODULE_PARM_DESC(allow_dio, "allow direct I/O (default: 0 (disallow))"); +#ifdef CONFIG_SYSCTL +#include + +static struct ctl_table sg_sysctls[] = { + { + .procname = "sg-big-buff", + .data = &sg_big_buff, + .maxlen = sizeof(int), + .mode = 0444, + .proc_handler = proc_dointvec, + }, + {} +}; + +static struct ctl_table_header *hdr; +static void register_sg_sysctls(void) +{ + if (!hdr) + hdr = register_sysctl("kernel", sg_sysctls); +} + +static void unregister_sg_sysctls(void) +{ + if (hdr) + unregister_sysctl_table(hdr); +} +#else +#define register_sg_sysctls() do { } while (0) +#define unregister_sg_sysctls() do { } while (0) +#endif /* CONFIG_SYSCTL */ + static int __init init_sg(void) { @@ -1666,6 +1697,7 @@ init_sg(void) return 0; } class_destroy(sg_sysfs_class); + register_sg_sysctls(); err_out: unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS); return rc; @@ -1674,6 +1706,7 @@ init_sg(void) static void __exit exit_sg(void) { + unregister_sg_sysctls(); #ifdef CONFIG_SCSI_PROC_FS remove_proc_subtree("scsi/sg", NULL); #endif /* CONFIG_SCSI_PROC_FS */ diff --git a/include/scsi/sg.h b/include/scsi/sg.h index 843cefb8efce0..068e35d365575 100644 --- a/include/scsi/sg.h +++ b/include/scsi/sg.h @@ -29,10 +29,6 @@ * For utility and test programs see: http://sg.danny.cz/sg/sg3_utils.html */ -#ifdef __KERNEL__ -extern int sg_big_buff; /* for sysctl */ -#endif - typedef struct sg_iovec /* same structure as used by readv() Linux system */ { /* call. It defines one scatter-gather element. */ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 60a14d6484aa4..325672a27f021 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -94,9 +94,6 @@ #if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_LOCK_STAT) #include #endif -#ifdef CONFIG_CHR_DEV_SG -#include -#endif #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE #include #endif @@ -2089,15 +2086,6 @@ static struct ctl_table kern_table[] = { .proc_handler = proc_dostring, }, #endif -#ifdef CONFIG_CHR_DEV_SG - { - .procname = "sg-big-buff", - .data = &sg_big_buff, - .maxlen = sizeof (int), - .mode = 0444, - .proc_handler = proc_dointvec, - }, -#endif #ifdef CONFIG_BSD_PROCESS_ACCT { .procname = "acct", From 9512283771d043339472a2a4f5c18dca03dcd942 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:43 +1100 Subject: [PATCH 30/58] stackleak: move stack_erasing sysctl to stackleak.c kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the stack_erasing sysctl from kernel/sysctl.c to kernel/stackleak.c and use register_sysctl() to register the sysctl interface. [mcgrof@kernel.org: commit log update] Link: https://lkml.kernel.org/r/20211124231435.1445213-8-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: Douglas Gilbert Cc: Eric Biederman Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: James E.J. Bottomley Cc: Jani Nikula Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: John Ogness Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Martin K. Petersen Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: "Rafael J. Wysocki" Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: Tetsuo Handa Cc: "Theodore Ts'o" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- include/linux/stackleak.h | 5 ----- kernel/stackleak.c | 26 ++++++++++++++++++++++++-- kernel/sysctl.c | 14 -------------- 3 files changed, 24 insertions(+), 21 deletions(-) diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h index a59db2f08e76b..ccaab2043fcd5 100644 --- a/include/linux/stackleak.h +++ b/include/linux/stackleak.h @@ -23,11 +23,6 @@ static inline void stackleak_task_init(struct task_struct *t) # endif } -#ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE -int stack_erasing_sysctl(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); -#endif - #else /* !CONFIG_GCC_PLUGIN_STACKLEAK */ static inline void stackleak_task_init(struct task_struct *t) { } #endif diff --git a/kernel/stackleak.c b/kernel/stackleak.c index ce161a8e8d975..66b8af394e589 100644 --- a/kernel/stackleak.c +++ b/kernel/stackleak.c @@ -16,11 +16,13 @@ #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE #include #include +#include static DEFINE_STATIC_KEY_FALSE(stack_erasing_bypass); -int stack_erasing_sysctl(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) +#ifdef CONFIG_SYSCTL +static int stack_erasing_sysctl(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, loff_t *ppos) { int ret = 0; int state = !static_branch_unlikely(&stack_erasing_bypass); @@ -42,6 +44,26 @@ int stack_erasing_sysctl(struct ctl_table *table, int write, state ? "enabled" : "disabled"); return ret; } +static struct ctl_table stackleak_sysctls[] = { + { + .procname = "stack_erasing", + .data = NULL, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = stack_erasing_sysctl, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + {} +}; + +static int __init stackleak_sysctls_init(void) +{ + register_sysctl_init("kernel", stackleak_sysctls); + return 0; +} +late_initcall(stackleak_sysctls_init); +#endif /* CONFIG_SYSCTL */ #define skip_erasing() static_branch_unlikely(&stack_erasing_bypass) #else diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 325672a27f021..546756894fc9e 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -94,9 +94,6 @@ #if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_LOCK_STAT) #include #endif -#ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE -#include -#endif #if defined(CONFIG_SYSCTL) @@ -2441,17 +2438,6 @@ static struct ctl_table kern_table[] = { .extra1 = SYSCTL_ONE, .extra2 = SYSCTL_INT_MAX, }, -#endif -#ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE - { - .procname = "stack_erasing", - .data = NULL, - .maxlen = sizeof(int), - .mode = 0600, - .proc_handler = stack_erasing_sysctl, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, #endif { } }; From 66692b7a201e29f00bba57282aa6de6a8020b2f4 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:44 +1100 Subject: [PATCH 31/58] sysctl: share unsigned long const values Provide a way to share unsigned long values. This will allow others to not have to re-invent these values. Link: https://lkml.kernel.org/r/20211124231435.1445213-9-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Amir Goldstein Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Benjamin LaHaise Cc: Clemens Ladisch Cc: David Airlie Cc: Douglas Gilbert Cc: Eric Biederman Cc: Greg Kroah-Hartman Cc: Iurii Zaikin Cc: James E.J. Bottomley Cc: Jani Nikula Cc: Jani Nikula Cc: Jan Kara Cc: Joel Becker Cc: John Ogness Cc: Joonas Lahtinen Cc: Joseph Qi Cc: Julia Lawall Cc: Kees Cook Cc: Lukas Middendorf Cc: Mark Fasheh Cc: Martin K. Petersen Cc: Mauro Carvalho Chehab Cc: Paul Turner Cc: Peter Zijlstra Cc: Petr Mladek Cc: Phillip Potter Cc: Qing Wang Cc: "Rafael J. Wysocki" Cc: Rodrigo Vivi Cc: Sebastian Reichel Cc: Sergey Senozhatsky Cc: Stephen Kitt Cc: Steven Rostedt (VMware) Cc: Suren Baghdasaryan Cc: Tetsuo Handa Cc: "Theodore Ts'o" Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/proc/proc_sysctl.c | 3 +++ include/linux/sysctl.h | 6 ++++++ kernel/sysctl.c | 9 +++------ 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index 5a9b9419566da..c21bbf926be2a 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -29,6 +29,9 @@ static const struct inode_operations proc_sys_dir_operations; const int sysctl_vals[] = { -1, 0, 1, 2, 4, 100, 200, 1000, 3000, INT_MAX }; EXPORT_SYMBOL(sysctl_vals); +const unsigned long sysctl_long_vals[] = { 0, 1, LONG_MAX }; +EXPORT_SYMBOL_GPL(sysctl_long_vals); + /* Support for permanently empty directories */ struct ctl_table sysctl_mount_point[] = { diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 746c098a6ff57..2de6d20d191ba 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -51,6 +51,12 @@ struct ctl_dir; extern const int sysctl_vals[]; +#define SYSCTL_LONG_ZERO ((void *)&sysctl_long_vals[0]) +#define SYSCTL_LONG_ONE ((void *)&sysctl_long_vals[1]) +#define SYSCTL_LONG_MAX ((void *)&sysctl_long_vals[2]) + +extern const unsigned long sysctl_long_vals[]; + typedef int proc_handler(struct ctl_table *ctl, int write, void *buffer, size_t *lenp, loff_t *ppos); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 546756894fc9e..8f9c275e02423 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -99,9 +99,6 @@ /* Constants used for minimum and maximum */ -static const unsigned long zero_ul; -static const unsigned long one_ul = 1; -static const unsigned long long_max = LONG_MAX; #ifdef CONFIG_PRINTK static const int ten_thousand = 10000; #endif @@ -2512,7 +2509,7 @@ static struct ctl_table vm_table[] = { .maxlen = sizeof(dirty_background_bytes), .mode = 0644, .proc_handler = dirty_background_bytes_handler, - .extra1 = (void *)&one_ul, + .extra1 = SYSCTL_LONG_ONE, }, { .procname = "dirty_ratio", @@ -2930,8 +2927,8 @@ static struct ctl_table fs_table[] = { .maxlen = sizeof(files_stat.max_files), .mode = 0644, .proc_handler = proc_doulongvec_minmax, - .extra1 = (void *)&zero_ul, - .extra2 = (void *)&long_max, + .extra1 = SYSCTL_LONG_ZERO, + .extra2 = SYSCTL_LONG_MAX, }, { .procname = "nr_open", From fc41db4c306545434694917b80ec1202ad7e82d5 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:44 +1100 Subject: [PATCH 32/58] fs: move inode sysctls to its own file Patch series "sysctl: 4th set of kernel/sysctl cleanups". This is slimming down the fs uses of kernel/sysctl.c to the point that the next step is to just get rid of the fs base directory for it and move that elsehwere, so that next patch series starts dealing with that to demo how we can end up cleaning up a full base directory from kernel/sysctl.c, one at a time. This patch (of 9): kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the inode sysctls to its own file. Since we are no longer using this outside of fs/ remove the extern declaration of its respective proc helper. We use early_initcall() as it is the earliest we can use. Link: https://lkml.kernel.org/r/20211129205548.605569-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211129205548.605569-2-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Kees Cook Cc: Iurii Zaikin Cc: Xiaoming Ni Cc: Eric Biederman Cc: Stephen Kitt Cc: Lukas Middendorf Cc: Antti Palosaari Cc: Mauro Carvalho Chehab Cc: Andy Shevchenko Cc: Jeff Layton Cc: "J. Bruce Fields" Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/inode.c | 31 ++++++++++++++++++++++++++++--- include/linux/fs.h | 3 --- kernel/sysctl.c | 14 -------------- 3 files changed, 28 insertions(+), 20 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 980e7b7a54607..bef6ba9b8eb42 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -70,7 +70,7 @@ EXPORT_SYMBOL(empty_aops); /* * Statistics gathering.. */ -struct inodes_stat_t inodes_stat; +static struct inodes_stat_t inodes_stat; static DEFINE_PER_CPU(unsigned long, nr_inodes); static DEFINE_PER_CPU(unsigned long, nr_unused); @@ -106,13 +106,38 @@ long get_nr_dirty_inodes(void) * Handle nr_inode sysctl */ #ifdef CONFIG_SYSCTL -int proc_nr_inodes(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) +static int proc_nr_inodes(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos) { inodes_stat.nr_inodes = get_nr_inodes(); inodes_stat.nr_unused = get_nr_inodes_unused(); return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); } + +static struct ctl_table inodes_sysctls[] = { + { + .procname = "inode-nr", + .data = &inodes_stat, + .maxlen = 2*sizeof(long), + .mode = 0444, + .proc_handler = proc_nr_inodes, + }, + { + .procname = "inode-state", + .data = &inodes_stat, + .maxlen = 7*sizeof(long), + .mode = 0444, + .proc_handler = proc_nr_inodes, + }, + { } +}; + +static int __init init_fs_inode_sysctls(void) +{ + register_sysctl_init("fs", inodes_sysctls); + return 0; +} +early_initcall(init_fs_inode_sysctls); #endif static int no_open(struct inode *inode, struct file *file) diff --git a/include/linux/fs.h b/include/linux/fs.h index c8510da6cc6dc..044de67c81672 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -82,7 +82,6 @@ extern void __init files_maxfiles_init(void); extern struct files_stat_struct files_stat; extern unsigned long get_max_files(void); extern unsigned int sysctl_nr_open; -extern struct inodes_stat_t inodes_stat; extern int leases_enable, lease_break_time; extern int sysctl_protected_symlinks; extern int sysctl_protected_hardlinks; @@ -3537,8 +3536,6 @@ int proc_nr_files(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); int proc_nr_dentry(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); -int proc_nr_inodes(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); int __init list_bdev_fs_names(char *buf, size_t size); #define __FMODE_EXEC ((__force int) FMODE_EXEC) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 8f9c275e02423..3eb71bf91b71b 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2900,20 +2900,6 @@ static struct ctl_table vm_table[] = { }; static struct ctl_table fs_table[] = { - { - .procname = "inode-nr", - .data = &inodes_stat, - .maxlen = 2*sizeof(long), - .mode = 0444, - .proc_handler = proc_nr_inodes, - }, - { - .procname = "inode-state", - .data = &inodes_stat, - .maxlen = 7*sizeof(long), - .mode = 0444, - .proc_handler = proc_nr_inodes, - }, { .procname = "file-nr", .data = &files_stat, From 977c61a363331b4a7b538e360c926b26d6e91caa Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 3 Dec 2021 12:14:28 -0800 Subject: [PATCH 33/58] fs/inode: avoid unused-variable warning Now that 'inodes_stat' is marked 'static', it causes a harmless warning whenever it is unused: fs/inode.c:73:29: error: 'inodes_stat' defined but not used [-Werror=unused-variable] 73 | static struct inodes_stat_t inodes_stat; Move it into the #ifdef that guards its only references. Link: https://lkml.kernel.org/r/20211203190123.874239-1-arnd@kernel.org Fixes: 245314851782 ("fs: move inode sysctls to its own file") Signed-off-by: Arnd Bergmann Cc: Luis Chamberlain Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/inode.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index bef6ba9b8eb42..63324df6fa273 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -67,11 +67,6 @@ const struct address_space_operations empty_aops = { }; EXPORT_SYMBOL(empty_aops); -/* - * Statistics gathering.. - */ -static struct inodes_stat_t inodes_stat; - static DEFINE_PER_CPU(unsigned long, nr_inodes); static DEFINE_PER_CPU(unsigned long, nr_unused); @@ -106,6 +101,11 @@ long get_nr_dirty_inodes(void) * Handle nr_inode sysctl */ #ifdef CONFIG_SYSCTL +/* + * Statistics gathering.. + */ +static struct inodes_stat_t inodes_stat; + static int proc_nr_inodes(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { From 6e6599cc5032c47e56d76a783f11557327420765 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:45 +1100 Subject: [PATCH 34/58] fs: move fs stat sysctls to file_table.c kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. We can create the sysctl dynamically on early init for fs stat to help with this clutter. This dusts off the fs stat syctls knobs and puts them into where they are declared. Link: https://lkml.kernel.org/r/20211129205548.605569-3-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Eric Biederman Cc: Iurii Zaikin Cc: "J. Bruce Fields" Cc: Jeff Layton Cc: Kees Cook Cc: Lukas Middendorf Cc: Mauro Carvalho Chehab Cc: Stephen Kitt Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/file_table.c | 47 ++++++++++++++++++++++++++++++++++++++-------- include/linux/fs.h | 3 --- kernel/sysctl.c | 25 ------------------------ 3 files changed, 39 insertions(+), 36 deletions(-) diff --git a/fs/file_table.c b/fs/file_table.c index 45437f8e1003e..57edef16dce46 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -33,7 +33,7 @@ #include "internal.h" /* sysctl tunables... */ -struct files_stat_struct files_stat = { +static struct files_stat_struct files_stat = { .max_files = NR_FILE }; @@ -75,22 +75,53 @@ unsigned long get_max_files(void) } EXPORT_SYMBOL_GPL(get_max_files); +#if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS) + /* * Handle nr_files sysctl */ -#if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS) -int proc_nr_files(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) +static int proc_nr_files(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos) { files_stat.nr_files = get_nr_files(); return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); } -#else -int proc_nr_files(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) + +static struct ctl_table fs_stat_sysctls[] = { + { + .procname = "file-nr", + .data = &files_stat, + .maxlen = sizeof(files_stat), + .mode = 0444, + .proc_handler = proc_nr_files, + }, + { + .procname = "file-max", + .data = &files_stat.max_files, + .maxlen = sizeof(files_stat.max_files), + .mode = 0644, + .proc_handler = proc_doulongvec_minmax, + .extra1 = SYSCTL_LONG_ZERO, + .extra2 = SYSCTL_LONG_MAX, + }, + { + .procname = "nr_open", + .data = &sysctl_nr_open, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = &sysctl_nr_open_min, + .extra2 = &sysctl_nr_open_max, + }, + { } +}; + +static int __init init_fs_stat_sysctls(void) { - return -ENOSYS; + register_sysctl_init("fs", fs_stat_sysctls); + return 0; } +fs_initcall(init_fs_stat_sysctls); #endif static struct file *__alloc_file(int flags, const struct cred *cred) diff --git a/include/linux/fs.h b/include/linux/fs.h index 044de67c81672..1e67619661203 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -79,7 +79,6 @@ extern void __init inode_init_early(void); extern void __init files_init(void); extern void __init files_maxfiles_init(void); -extern struct files_stat_struct files_stat; extern unsigned long get_max_files(void); extern unsigned int sysctl_nr_open; extern int leases_enable, lease_break_time; @@ -3532,8 +3531,6 @@ ssize_t simple_attr_write(struct file *file, const char __user *buf, size_t len, loff_t *ppos); struct ctl_table; -int proc_nr_files(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); int proc_nr_dentry(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); int __init list_bdev_fs_names(char *buf, size_t size); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 3eb71bf91b71b..a94d24595fa37 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2900,31 +2900,6 @@ static struct ctl_table vm_table[] = { }; static struct ctl_table fs_table[] = { - { - .procname = "file-nr", - .data = &files_stat, - .maxlen = sizeof(files_stat), - .mode = 0444, - .proc_handler = proc_nr_files, - }, - { - .procname = "file-max", - .data = &files_stat.max_files, - .maxlen = sizeof(files_stat.max_files), - .mode = 0644, - .proc_handler = proc_doulongvec_minmax, - .extra1 = SYSCTL_LONG_ZERO, - .extra2 = SYSCTL_LONG_MAX, - }, - { - .procname = "nr_open", - .data = &sysctl_nr_open, - .maxlen = sizeof(unsigned int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = &sysctl_nr_open_min, - .extra2 = &sysctl_nr_open_max, - }, { .procname = "dentry-state", .data = &dentry_stat, From ab2b3b935eea94427ff30f55557de87df8bfc99c Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:46 +1100 Subject: [PATCH 35/58] fs: move dcache sysctls to its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the dcache sysctl clutter out of kernel/sysctl.c. This is a small one-off entry, perhaps later we can simplify this representation, but for now we use the helpers we have. We won't know how we can simplify this further untl we're fully done with the cleanup. Link: https://lkml.kernel.org/r/20211129205548.605569-4-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Eric Biederman Cc: Iurii Zaikin Cc: "J. Bruce Fields" Cc: Jeff Layton Cc: Kees Cook Cc: Lukas Middendorf Cc: Mauro Carvalho Chehab Cc: Stephen Kitt Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/dcache.c | 32 +++++++++++++++++++++++++++++--- include/linux/dcache.h | 10 ---------- include/linux/fs.h | 2 -- kernel/sysctl.c | 7 ------- 4 files changed, 29 insertions(+), 22 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index cf871a81f4fdc..0eef1102f460b 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -115,9 +115,17 @@ static inline struct hlist_bl_head *in_lookup_hash(const struct dentry *parent, return in_lookup_hashtable + hash_32(hash, IN_LOOKUP_SHIFT); } +struct dentry_stat_t { + long nr_dentry; + long nr_unused; + long age_limit; /* age in seconds */ + long want_pages; /* pages requested by system */ + long nr_negative; /* # of unused negative dentries */ + long dummy; /* Reserved for future use */ +}; /* Statistics gathering. */ -struct dentry_stat_t dentry_stat = { +static struct dentry_stat_t dentry_stat = { .age_limit = 45, }; @@ -167,14 +175,32 @@ static long get_nr_dentry_negative(void) return sum < 0 ? 0 : sum; } -int proc_nr_dentry(struct ctl_table *table, int write, void *buffer, - size_t *lenp, loff_t *ppos) +static int proc_nr_dentry(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos) { dentry_stat.nr_dentry = get_nr_dentry(); dentry_stat.nr_unused = get_nr_dentry_unused(); dentry_stat.nr_negative = get_nr_dentry_negative(); return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); } + +static struct ctl_table fs_dcache_sysctls[] = { + { + .procname = "dentry-state", + .data = &dentry_stat, + .maxlen = 6*sizeof(long), + .mode = 0444, + .proc_handler = proc_nr_dentry, + }, + { } +}; + +static int __init init_fs_dcache_sysctls(void) +{ + register_sysctl_init("fs", fs_dcache_sysctls); + return 0; +} +fs_initcall(init_fs_dcache_sysctls); #endif /* diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 9e23d33bb6f1c..f5bba51480b2f 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -61,16 +61,6 @@ extern const struct qstr empty_name; extern const struct qstr slash_name; extern const struct qstr dotdot_name; -struct dentry_stat_t { - long nr_dentry; - long nr_unused; - long age_limit; /* age in seconds */ - long want_pages; /* pages requested by system */ - long nr_negative; /* # of unused negative dentries */ - long dummy; /* Reserved for future use */ -}; -extern struct dentry_stat_t dentry_stat; - /* * Try to keep struct dentry aligned on 64 byte cachelines (this will * give reasonable cacheline footprint with larger lines without the diff --git a/include/linux/fs.h b/include/linux/fs.h index 1e67619661203..9b856d5da9e26 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3531,8 +3531,6 @@ ssize_t simple_attr_write(struct file *file, const char __user *buf, size_t len, loff_t *ppos); struct ctl_table; -int proc_nr_dentry(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); int __init list_bdev_fs_names(char *buf, size_t size); #define __FMODE_EXEC ((__force int) FMODE_EXEC) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index a94d24595fa37..dbd267d0f0143 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2900,13 +2900,6 @@ static struct ctl_table vm_table[] = { }; static struct ctl_table fs_table[] = { - { - .procname = "dentry-state", - .data = &dentry_stat, - .maxlen = 6*sizeof(long), - .mode = 0444, - .proc_handler = proc_nr_dentry, - }, { .procname = "overflowuid", .data = &fs_overflowuid, From 60ac49e7ad4c0e141629eb0f2f264d029ab828dd Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 3 Dec 2021 12:14:31 -0800 Subject: [PATCH 36/58] fs/dcache: avoid unused-function warning Now that 'dentry_stat' is marked 'static', we can run into this warning: fs/dcache.c:128:29: error: 'dentry_stat' defined but not used [-Werror=unused-variable] 128 | static struct dentry_stat_t dentry_stat = { Hide it in the same #ifdef as its only references. Link: https://lkml.kernel.org/r/20211203190123.874239-2-arnd@kernel.org Fixes: f0eea17ca8da ("fs: move dcache sysctls to its own file") Signed-off-by: Arnd Bergmann Cc: Luis Chamberlain Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/dcache.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/fs/dcache.c b/fs/dcache.c index 0eef1102f460b..c84269c6e8bf6 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -124,16 +124,15 @@ struct dentry_stat_t { long dummy; /* Reserved for future use */ }; -/* Statistics gathering. */ -static struct dentry_stat_t dentry_stat = { - .age_limit = 45, -}; - static DEFINE_PER_CPU(long, nr_dentry); static DEFINE_PER_CPU(long, nr_dentry_unused); static DEFINE_PER_CPU(long, nr_dentry_negative); #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS) +/* Statistics gathering. */ +static struct dentry_stat_t dentry_stat = { + .age_limit = 45, +}; /* * Here we resort to our own counters instead of using generic per-cpu counters From f15a7f670dc5b5ff85e451b2e4bbef6bf2682a25 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:46 +1100 Subject: [PATCH 37/58] sysctl: move maxolduid as a sysctl specific const The maxolduid value is only shared for sysctl purposes for use on a max range. Just stuff this into our shared const array. Link: https://lkml.kernel.org/r/20211129205548.605569-5-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Eric Biederman Cc: Iurii Zaikin Cc: "J. Bruce Fields" Cc: Jeff Layton Cc: Kees Cook Cc: Lukas Middendorf Cc: Mauro Carvalho Chehab Cc: Stephen Kitt Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/proc/proc_sysctl.c | 2 +- include/linux/sysctl.h | 3 +++ kernel/sysctl.c | 12 ++++-------- 3 files changed, 8 insertions(+), 9 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index c21bbf926be2a..eb16c2bf5b725 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -26,7 +26,7 @@ static const struct file_operations proc_sys_dir_file_operations; static const struct inode_operations proc_sys_dir_operations; /* shared constants to be used in various sysctls */ -const int sysctl_vals[] = { -1, 0, 1, 2, 4, 100, 200, 1000, 3000, INT_MAX }; +const int sysctl_vals[] = { -1, 0, 1, 2, 4, 100, 200, 1000, 65535, INT_MAX }; EXPORT_SYMBOL(sysctl_vals); const unsigned long sysctl_long_vals[] = { 0, 1, LONG_MAX }; diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 2de6d20d191ba..bb921eb8a02d4 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -49,6 +49,9 @@ struct ctl_dir; #define SYSCTL_THREE_THOUSAND ((void *)&sysctl_vals[8]) #define SYSCTL_INT_MAX ((void *)&sysctl_vals[9]) +/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ +#define SYSCTL_MAXOLDUID ((void *)&sysctl_vals[10]) + extern const int sysctl_vals[]; #define SYSCTL_LONG_ZERO ((void *)&sysctl_long_vals[0]) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index dbd267d0f0143..05d9dd85e17f3 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -109,10 +109,6 @@ static const int six_hundred_forty_kb = 640 * 1024; /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */ static const unsigned long dirty_bytes_min = 2 * PAGE_SIZE; -/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ -static const int maxolduid = 65535; -/* minolduid is SYSCTL_ZERO */ - static const int ngroups_max = NGROUPS_MAX; static const int cap_last_cap = CAP_LAST_CAP; @@ -2126,7 +2122,7 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, - .extra2 = (void *)&maxolduid, + .extra2 = SYSCTL_MAXOLDUID, }, { .procname = "overflowgid", @@ -2135,7 +2131,7 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, - .extra2 = (void *)&maxolduid, + .extra2 = SYSCTL_MAXOLDUID, }, #ifdef CONFIG_S390 { @@ -2907,7 +2903,7 @@ static struct ctl_table fs_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, - .extra2 = (void *)&maxolduid, + .extra2 = SYSCTL_MAXOLDUID, }, { .procname = "overflowgid", @@ -2916,7 +2912,7 @@ static struct ctl_table fs_table[] = { .mode = 0644, .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, - .extra2 = (void *)&maxolduid, + .extra2 = SYSCTL_MAXOLDUID, }, #ifdef CONFIG_FILE_LOCKING { From 812330b42651ffc53a695dda82b7a81d0ef790fa Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:47 +1100 Subject: [PATCH 38/58] fs: move shared sysctls to fs/sysctls.c To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move sysctls which are shared between filesystems into a common file outside of kernel/sysctl.c. Link: https://lkml.kernel.org/r/20211129205548.605569-6-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Eric Biederman Cc: Iurii Zaikin Cc: "J. Bruce Fields" Cc: Jeff Layton Cc: Kees Cook Cc: Lukas Middendorf Cc: Mauro Carvalho Chehab Cc: Stephen Kitt Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/Makefile | 1 + fs/sysctls.c | 38 ++++++++++++++++++++++++++++++++++++++ kernel/sysctl.c | 18 ------------------ 3 files changed, 39 insertions(+), 18 deletions(-) create mode 100644 fs/sysctls.c diff --git a/fs/Makefile b/fs/Makefile index 84c5e4cdfee5a..ea8770d124da2 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -28,6 +28,7 @@ obj-y += notify/ obj-$(CONFIG_EPOLL) += eventpoll.o obj-y += anon_inodes.o obj-$(CONFIG_SIGNALFD) += signalfd.o +obj-$(CONFIG_SYSCTL) += sysctls.o obj-$(CONFIG_TIMERFD) += timerfd.o obj-$(CONFIG_EVENTFD) += eventfd.o obj-$(CONFIG_USERFAULTFD) += userfaultfd.o diff --git a/fs/sysctls.c b/fs/sysctls.c new file mode 100644 index 0000000000000..54216cd1ecd75 --- /dev/null +++ b/fs/sysctls.c @@ -0,0 +1,38 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * /proc/sys/fs shared sysctls + * + * These sysctls are shared between different filesystems. + */ +#include +#include + +static struct ctl_table fs_shared_sysctls[] = { + { + .procname = "overflowuid", + .data = &fs_overflowuid, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_MAXOLDUID, + }, + { + .procname = "overflowgid", + .data = &fs_overflowgid, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_MAXOLDUID, + }, + { } +}; + +static int __init init_fs_shared_sysctls(void) +{ + register_sysctl_init("fs", fs_shared_sysctls); + return 0; +} + +early_initcall(init_fs_shared_sysctls); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 05d9dd85e17f3..865173cefcef8 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2896,24 +2896,6 @@ static struct ctl_table vm_table[] = { }; static struct ctl_table fs_table[] = { - { - .procname = "overflowuid", - .data = &fs_overflowuid, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_MAXOLDUID, - }, - { - .procname = "overflowgid", - .data = &fs_overflowgid, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_MAXOLDUID, - }, #ifdef CONFIG_FILE_LOCKING { .procname = "leases-enable", From 16708fbe8899c88f40f9ac73734fbc4b16cadd3a Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:48 +1100 Subject: [PATCH 39/58] fs: move locking sysctls where they are used kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. The locking fs sysctls are only used on fs/locks.c, so move them there. Link: https://lkml.kernel.org/r/20211129205548.605569-7-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Eric Biederman Cc: Iurii Zaikin Cc: "J. Bruce Fields" Cc: Jeff Layton Cc: Kees Cook Cc: Lukas Middendorf Cc: Mauro Carvalho Chehab Cc: Stephen Kitt Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/locks.c | 34 ++++++++++++++++++++++++++++++++-- include/linux/fs.h | 4 ---- kernel/sysctl.c | 20 -------------------- 3 files changed, 32 insertions(+), 26 deletions(-) diff --git a/fs/locks.c b/fs/locks.c index 0fca9d6809781..8c6df10cd9ed2 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -62,6 +62,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS #include @@ -88,8 +89,37 @@ static int target_leasetype(struct file_lock *fl) return fl->fl_type; } -int leases_enable = 1; -int lease_break_time = 45; +static int leases_enable = 1; +static int lease_break_time = 45; + +#ifdef CONFIG_SYSCTL +static struct ctl_table locks_sysctls[] = { + { + .procname = "leases-enable", + .data = &leases_enable, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, +#ifdef CONFIG_MMU + { + .procname = "lease-break-time", + .data = &lease_break_time, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, +#endif /* CONFIG_MMU */ + {} +}; + +static int __init init_fs_locks_sysctls(void) +{ + register_sysctl_init("fs", locks_sysctls); + return 0; +} +early_initcall(init_fs_locks_sysctls); +#endif /* CONFIG_SYSCTL */ /* * The global file_lock_list is only used for displaying /proc/locks, so we diff --git a/include/linux/fs.h b/include/linux/fs.h index 9b856d5da9e26..0e08c3dd8f756 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -82,10 +82,6 @@ extern void __init files_maxfiles_init(void); extern unsigned long get_max_files(void); extern unsigned int sysctl_nr_open; extern int leases_enable, lease_break_time; -extern int sysctl_protected_symlinks; -extern int sysctl_protected_hardlinks; -extern int sysctl_protected_fifos; -extern int sysctl_protected_regular; typedef __kernel_rwf_t rwf_t; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 865173cefcef8..52a86746ac9d9 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2896,26 +2896,6 @@ static struct ctl_table vm_table[] = { }; static struct ctl_table fs_table[] = { -#ifdef CONFIG_FILE_LOCKING - { - .procname = "leases-enable", - .data = &leases_enable, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, -#endif -#ifdef CONFIG_MMU -#ifdef CONFIG_FILE_LOCKING - { - .procname = "lease-break-time", - .data = &lease_break_time, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, -#endif -#endif { .procname = "protected_symlinks", .data = &sysctl_protected_symlinks, From 6a767d8f8525081902e97c88e2fc256091589564 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:48 +1100 Subject: [PATCH 40/58] fs: move namei sysctls to its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move namei's own sysctl knobs to its own file. Other than the move we also avoid initializing two static variables to 0 as this is not needed: * sysctl_protected_symlinks * sysctl_protected_hardlinks Link: https://lkml.kernel.org/r/20211129205548.605569-8-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Eric Biederman Cc: Iurii Zaikin Cc: "J. Bruce Fields" Cc: Jeff Layton Cc: Kees Cook Cc: Lukas Middendorf Cc: Mauro Carvalho Chehab Cc: Stephen Kitt Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/namei.c | 58 ++++++++++++++++++++++++++++++++++++++++++---- include/linux/fs.h | 1 - kernel/sysctl.c | 36 ---------------------------- 3 files changed, 54 insertions(+), 41 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index d81f04f8d8188..b867a92c078ec 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1020,10 +1020,60 @@ static inline void put_link(struct nameidata *nd) path_put(&last->link); } -int sysctl_protected_symlinks __read_mostly = 0; -int sysctl_protected_hardlinks __read_mostly = 0; -int sysctl_protected_fifos __read_mostly; -int sysctl_protected_regular __read_mostly; +static int sysctl_protected_symlinks __read_mostly; +static int sysctl_protected_hardlinks __read_mostly; +static int sysctl_protected_fifos __read_mostly; +static int sysctl_protected_regular __read_mostly; + +#ifdef CONFIG_SYSCTL +static struct ctl_table namei_sysctls[] = { + { + .procname = "protected_symlinks", + .data = &sysctl_protected_symlinks, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + { + .procname = "protected_hardlinks", + .data = &sysctl_protected_hardlinks, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + { + .procname = "protected_fifos", + .data = &sysctl_protected_fifos, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_TWO, + }, + { + .procname = "protected_regular", + .data = &sysctl_protected_regular, + .maxlen = sizeof(int), + .mode = 0600, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_TWO, + }, + { } +}; + +static int __init init_fs_namei_sysctls(void) +{ + register_sysctl_init("fs", namei_sysctls); + return 0; +} +fs_initcall(init_fs_namei_sysctls); + +#endif /* CONFIG_SYSCTL */ /** * may_follow_link - Check symlink following for unsafe situations diff --git a/include/linux/fs.h b/include/linux/fs.h index 0e08c3dd8f756..9617dea249782 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -81,7 +81,6 @@ extern void __init files_maxfiles_init(void); extern unsigned long get_max_files(void); extern unsigned int sysctl_nr_open; -extern int leases_enable, lease_break_time; typedef __kernel_rwf_t rwf_t; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 52a86746ac9d9..39ba8e09ce318 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2896,42 +2896,6 @@ static struct ctl_table vm_table[] = { }; static struct ctl_table fs_table[] = { - { - .procname = "protected_symlinks", - .data = &sysctl_protected_symlinks, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, - { - .procname = "protected_hardlinks", - .data = &sysctl_protected_hardlinks, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, - { - .procname = "protected_fifos", - .data = &sysctl_protected_fifos, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_TWO, - }, - { - .procname = "protected_regular", - .data = &sysctl_protected_regular, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_TWO, - }, { .procname = "suid_dumpable", .data = &suid_dumpable, From aafc9c6cc5e4117888b13878fc16d9d98a2a1e3c Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:49 +1100 Subject: [PATCH 41/58] fs: move fs/exec.c sysctls into its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the fs/exec.c respective sysctls to its own file. Since checkpatch complains about style issues with the old code, this move also fixes a few of those minor style issues: * Use pr_warn() instead of prink(WARNING * New empty lines are wanted at the beginning of routines Link: https://lkml.kernel.org/r/20211129205548.605569-9-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Eric Biederman Cc: Iurii Zaikin Cc: "J. Bruce Fields" Cc: Jeff Layton Cc: Kees Cook Cc: Lukas Middendorf Cc: Mauro Carvalho Chehab Cc: Stephen Kitt Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/exec.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sysctl.c | 66 ------------------------------------ 2 files changed, 90 insertions(+), 66 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index fa142638b191c..3fd2866edec38 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -65,6 +65,7 @@ #include #include #include +#include #include #include @@ -2097,3 +2098,92 @@ COMPAT_SYSCALL_DEFINE5(execveat, int, fd, argv, envp, flags); } #endif + +#ifdef CONFIG_SYSCTL + +static void validate_coredump_safety(void) +{ +#ifdef CONFIG_COREDUMP + if (suid_dumpable == SUID_DUMP_ROOT && + core_pattern[0] != '/' && core_pattern[0] != '|') { + pr_warn( +"Unsafe core_pattern used with fs.suid_dumpable=2.\n" +"Pipe handler or fully qualified core dump path required.\n" +"Set kernel.core_pattern before fs.suid_dumpable.\n" + ); + } +#endif +} + +static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + int error = proc_dointvec_minmax(table, write, buffer, lenp, ppos); + + if (!error) + validate_coredump_safety(); + return error; +} + +static struct ctl_table fs_exec_sysctls[] = { + { + .procname = "suid_dumpable", + .data = &suid_dumpable, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax_coredump, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_TWO, + }, + { } +}; + +#ifdef CONFIG_COREDUMP + +static int proc_dostring_coredump(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + int error = proc_dostring(table, write, buffer, lenp, ppos); + + if (!error) + validate_coredump_safety(); + return error; +} + +static struct ctl_table kernel_exec_sysctls[] = { + { + .procname = "core_uses_pid", + .data = &core_uses_pid, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .procname = "core_pattern", + .data = core_pattern, + .maxlen = CORENAME_MAX_SIZE, + .mode = 0644, + .proc_handler = proc_dostring_coredump, + }, + { + .procname = "core_pipe_limit", + .data = &core_pipe_limit, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { } +}; +#endif + +static int __init init_fs_exec_sysctls(void) +{ + register_sysctl_init("fs", fs_exec_sysctls); +#ifdef CONFIG_COREDUMP + register_sysctl_init("kernel", kernel_exec_sysctls); +#endif + return 0; +} + +fs_initcall(init_fs_exec_sysctls); +#endif /* CONFIG_SYSCTL */ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 39ba8e09ce318..0146fc5499780 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1116,40 +1116,6 @@ static int proc_dopipe_max_size(struct ctl_table *table, int write, do_proc_dopipe_max_size_conv, NULL); } -static void validate_coredump_safety(void) -{ -#ifdef CONFIG_COREDUMP - if (suid_dumpable == SUID_DUMP_ROOT && - core_pattern[0] != '/' && core_pattern[0] != '|') { - printk(KERN_WARNING -"Unsafe core_pattern used with fs.suid_dumpable=2.\n" -"Pipe handler or fully qualified core dump path required.\n" -"Set kernel.core_pattern before fs.suid_dumpable.\n" - ); - } -#endif -} - -static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) -{ - int error = proc_dointvec_minmax(table, write, buffer, lenp, ppos); - if (!error) - validate_coredump_safety(); - return error; -} - -#ifdef CONFIG_COREDUMP -static int proc_dostring_coredump(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) -{ - int error = proc_dostring(table, write, buffer, lenp, ppos); - if (!error) - validate_coredump_safety(); - return error; -} -#endif - #ifdef CONFIG_MAGIC_SYSRQ static int sysrq_sysctl_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) @@ -1873,29 +1839,6 @@ static struct ctl_table kern_table[] = { .mode = 0644, .proc_handler = proc_dointvec, }, -#ifdef CONFIG_COREDUMP - { - .procname = "core_uses_pid", - .data = &core_uses_pid, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, - { - .procname = "core_pattern", - .data = core_pattern, - .maxlen = CORENAME_MAX_SIZE, - .mode = 0644, - .proc_handler = proc_dostring_coredump, - }, - { - .procname = "core_pipe_limit", - .data = &core_pipe_limit, - .maxlen = sizeof(unsigned int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, -#endif #ifdef CONFIG_PROC_SYSCTL { .procname = "tainted", @@ -2896,15 +2839,6 @@ static struct ctl_table vm_table[] = { }; static struct ctl_table fs_table[] = { - { - .procname = "suid_dumpable", - .data = &suid_dumpable, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax_coredump, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_TWO, - }, { .procname = "pipe-max-size", .data = &pipe_max_size, From 493e70a5f20f8d623847b61241d409b28e0d49d8 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:49 +1100 Subject: [PATCH 42/58] fs: move pipe sysctls to is own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the pipe sysctls to its own file. Link: https://lkml.kernel.org/r/20211129205548.605569-10-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Andy Shevchenko Cc: Antti Palosaari Cc: Eric Biederman Cc: Iurii Zaikin Cc: "J. Bruce Fields" Cc: Jeff Layton Cc: Kees Cook Cc: Lukas Middendorf Cc: Mauro Carvalho Chehab Cc: Stephen Kitt Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/pipe.c | 64 +++++++++++++++++++++++++++++++++++++-- include/linux/pipe_fs_i.h | 4 --- include/linux/sysctl.h | 6 ++++ kernel/sysctl.c | 61 ++++--------------------------------- 4 files changed, 73 insertions(+), 62 deletions(-) diff --git a/fs/pipe.c b/fs/pipe.c index 6d4342bad9f15..cc28623a67b61 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -50,13 +51,13 @@ * The max size that a non-root user is allowed to grow the pipe. Can * be set by root in /proc/sys/fs/pipe-max-size */ -unsigned int pipe_max_size = 1048576; +static unsigned int pipe_max_size = 1048576; /* Maximum allocatable pages per user. Hard limit is unset by default, soft * matches default values. */ -unsigned long pipe_user_pages_hard; -unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR; +static unsigned long pipe_user_pages_hard; +static unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR; /* * We use head and tail indices that aren't masked off, except at the point of @@ -1428,6 +1429,60 @@ static struct file_system_type pipe_fs_type = { .kill_sb = kill_anon_super, }; +#ifdef CONFIG_SYSCTL +static int do_proc_dopipe_max_size_conv(unsigned long *lvalp, + unsigned int *valp, + int write, void *data) +{ + if (write) { + unsigned int val; + + val = round_pipe_size(*lvalp); + if (val == 0) + return -EINVAL; + + *valp = val; + } else { + unsigned int val = *valp; + *lvalp = (unsigned long) val; + } + + return 0; +} + +static int proc_dopipe_max_size(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + return do_proc_douintvec(table, write, buffer, lenp, ppos, + do_proc_dopipe_max_size_conv, NULL); +} + +static struct ctl_table fs_pipe_sysctls[] = { + { + .procname = "pipe-max-size", + .data = &pipe_max_size, + .maxlen = sizeof(pipe_max_size), + .mode = 0644, + .proc_handler = proc_dopipe_max_size, + }, + { + .procname = "pipe-user-pages-hard", + .data = &pipe_user_pages_hard, + .maxlen = sizeof(pipe_user_pages_hard), + .mode = 0644, + .proc_handler = proc_doulongvec_minmax, + }, + { + .procname = "pipe-user-pages-soft", + .data = &pipe_user_pages_soft, + .maxlen = sizeof(pipe_user_pages_soft), + .mode = 0644, + .proc_handler = proc_doulongvec_minmax, + }, + { } +}; +#endif + static int __init init_pipe_fs(void) { int err = register_filesystem(&pipe_fs_type); @@ -1439,6 +1494,9 @@ static int __init init_pipe_fs(void) unregister_filesystem(&pipe_fs_type); } } +#ifdef CONFIG_SYSCTL + register_sysctl_init("fs", fs_pipe_sysctls); +#endif return err; } diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index fc5642431b923..c00c618ef290d 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -238,10 +238,6 @@ void pipe_lock(struct pipe_inode_info *); void pipe_unlock(struct pipe_inode_info *); void pipe_double_lock(struct pipe_inode_info *, struct pipe_inode_info *); -extern unsigned int pipe_max_size; -extern unsigned long pipe_user_pages_hard; -extern unsigned long pipe_user_pages_soft; - /* Wait for a pipe to be readable/writable while dropping the pipe lock */ void pipe_wait_readable(struct pipe_inode_info *); void pipe_wait_writable(struct pipe_inode_info *); diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index bb921eb8a02d4..4294e9668bd51 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -221,6 +221,12 @@ extern void __register_sysctl_init(const char *path, struct ctl_table *table, extern struct ctl_table_header *register_sysctl_mount_point(const char *path); void do_sysctl_args(void); +int do_proc_douintvec(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos, + int (*conv)(unsigned long *lvalp, + unsigned int *valp, + int write, void *data), + void *data); extern int pwrsw_enabled; extern int unaligned_enabled; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 0146fc5499780..a4cde441635d6 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -56,7 +56,6 @@ #include #include #include -#include #include #include #include @@ -760,12 +759,12 @@ static int __do_proc_douintvec(void *tbl_data, struct ctl_table *table, return do_proc_douintvec_r(i, buffer, lenp, ppos, conv, data); } -static int do_proc_douintvec(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos, - int (*conv)(unsigned long *lvalp, - unsigned int *valp, - int write, void *data), - void *data) +int do_proc_douintvec(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos, + int (*conv)(unsigned long *lvalp, + unsigned int *valp, + int write, void *data), + void *data) { return __do_proc_douintvec(table->data, table, write, buffer, lenp, ppos, conv, data); @@ -1089,33 +1088,6 @@ int proc_dou8vec_minmax(struct ctl_table *table, int write, } EXPORT_SYMBOL_GPL(proc_dou8vec_minmax); -static int do_proc_dopipe_max_size_conv(unsigned long *lvalp, - unsigned int *valp, - int write, void *data) -{ - if (write) { - unsigned int val; - - val = round_pipe_size(*lvalp); - if (val == 0) - return -EINVAL; - - *valp = val; - } else { - unsigned int val = *valp; - *lvalp = (unsigned long) val; - } - - return 0; -} - -static int proc_dopipe_max_size(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) -{ - return do_proc_douintvec(table, write, buffer, lenp, ppos, - do_proc_dopipe_max_size_conv, NULL); -} - #ifdef CONFIG_MAGIC_SYSRQ static int sysrq_sysctl_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) @@ -2839,27 +2811,6 @@ static struct ctl_table vm_table[] = { }; static struct ctl_table fs_table[] = { - { - .procname = "pipe-max-size", - .data = &pipe_max_size, - .maxlen = sizeof(pipe_max_size), - .mode = 0644, - .proc_handler = proc_dopipe_max_size, - }, - { - .procname = "pipe-user-pages-hard", - .data = &pipe_user_pages_hard, - .maxlen = sizeof(pipe_user_pages_hard), - .mode = 0644, - .proc_handler = proc_doulongvec_minmax, - }, - { - .procname = "pipe-user-pages-soft", - .data = &pipe_user_pages_soft, - .maxlen = sizeof(pipe_user_pages_soft), - .mode = 0644, - .proc_handler = proc_doulongvec_minmax, - }, { .procname = "mount-max", .data = &sysctl_mount_max, From b4ee6ffd02c202b8b7399b9f4f875c0e0a28cdbc Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:50 +1100 Subject: [PATCH 43/58] sysctl: add and use base directory declarer and registration helper Patch series "sysctl: add and use base directory declarer and registration helper". In this patch series we start addressing base directories, and so we start with the "fs" sysctls. The end goal is we end up completely moving all "fs" sysctl knobs out from kernel/sysctl. This patch (of 6): Add a set of helpers which can be used to declare and register base directory sysctls on their own. We do this so we can later move each of the base sysctl directories like "fs", "kernel", etc, to their own respective files instead of shoving the declarations and registrations all on kernel/sysctl.c. The lazy approach has caught up and with this, we just end up extending the list of base directories / sysctls on one file and this makes maintenance difficult due to merge conflicts from many developers. The declarations are used first by kernel/sysctl.c for registration its own base which over time we'll try to clean up. It will be used in the next patch to demonstrate how to cleanly deal with base sysctl directories. Link: https://lkml.kernel.org/r/20211129211943.640266-1-mcgrof@kernel.org Link: https://lkml.kernel.org/r/20211129211943.640266-2-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Kees Cook Cc: Iurii Zaikin Cc: Xiaoming Ni Cc: Eric Biederman Cc: Stephen Kitt Cc: Lukas Middendorf Cc: Antti Palosaari Cc: Mauro Carvalho Chehab Cc: Christian Brauner Cc: Eric Biggers Cc: "Naveen N. Rao" Cc: "David S. Miller" Cc: Masami Hiramatsu Cc: Anil S Keshavamurthy Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/proc/proc_sysctl.c | 9 +++++++++ include/linux/sysctl.h | 23 +++++++++++++++++++++++ kernel/sysctl.c | 41 ++++++++++------------------------------- 3 files changed, 42 insertions(+), 31 deletions(-) diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index eb16c2bf5b725..e4ea0b3e60e08 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1647,6 +1647,15 @@ struct ctl_table_header *register_sysctl_table(struct ctl_table *table) } EXPORT_SYMBOL(register_sysctl_table); +int __register_sysctl_base(struct ctl_table *base_table) +{ + struct ctl_table_header *hdr; + + hdr = register_sysctl_table(base_table); + kmemleak_not_leak(hdr); + return 0; +} + static void put_links(struct ctl_table_header *header) { struct ctl_table_set *root_set = &sysctl_table_root.default_set; diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 4294e9668bd51..b4b280e7b6c1b 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -194,6 +194,19 @@ struct ctl_path { #ifdef CONFIG_SYSCTL +#define DECLARE_SYSCTL_BASE(_name, _table) \ +static struct ctl_table _name##_base_table[] = { \ + { \ + .procname = #_name, \ + .mode = 0555, \ + .child = _table, \ + }, \ +} + +extern int __register_sysctl_base(struct ctl_table *base_table); + +#define register_sysctl_base(_name) __register_sysctl_base(_name##_base_table) + void proc_sys_poll_notify(struct ctl_table_poll *poll); extern void setup_sysctl_set(struct ctl_table_set *p, @@ -236,6 +249,16 @@ extern int no_unaligned_warning; extern struct ctl_table sysctl_mount_point[]; #else /* CONFIG_SYSCTL */ + +#define DECLARE_SYSCTL_BASE(_name, _table) + +static inline int __register_sysctl_base(struct ctl_table *base_table) +{ + return 0; +} + +#define register_sysctl_base(table) __register_sysctl_base(table) + static inline struct ctl_table_header *register_sysctl_table(struct ctl_table * table) { return NULL; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index a4cde441635d6..bbbafe545723b 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2850,41 +2850,20 @@ static struct ctl_table dev_table[] = { { } }; -static struct ctl_table sysctl_base_table[] = { - { - .procname = "kernel", - .mode = 0555, - .child = kern_table, - }, - { - .procname = "vm", - .mode = 0555, - .child = vm_table, - }, - { - .procname = "fs", - .mode = 0555, - .child = fs_table, - }, - { - .procname = "debug", - .mode = 0555, - .child = debug_table, - }, - { - .procname = "dev", - .mode = 0555, - .child = dev_table, - }, - { } -}; +DECLARE_SYSCTL_BASE(kernel, kern_table); +DECLARE_SYSCTL_BASE(vm, vm_table); +DECLARE_SYSCTL_BASE(fs, fs_table); +DECLARE_SYSCTL_BASE(debug, debug_table); +DECLARE_SYSCTL_BASE(dev, dev_table); int __init sysctl_init(void) { - struct ctl_table_header *hdr; + register_sysctl_base(kernel); + register_sysctl_base(vm); + register_sysctl_base(fs); + register_sysctl_base(debug); + register_sysctl_base(dev); - hdr = register_sysctl_table(sysctl_base_table); - kmemleak_not_leak(hdr); return 0; } #endif /* CONFIG_SYSCTL */ From 71f97d355612b3825f6122509fd298f9f1d295f9 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:50 +1100 Subject: [PATCH 44/58] sysctl-add-and-use-base-directory-declarer-and-registration-helper-fix null-terminate the ctl_table arrays Link: https://lkml.kernel.org/r/YafJY3rXDYnjK/gs@bombadil.infradead.org Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- include/linux/sysctl.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index b4b280e7b6c1b..02134a8abad7a 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -201,6 +201,7 @@ static struct ctl_table _name##_base_table[] = { \ .mode = 0555, \ .child = _table, \ }, \ + { }, \ } extern int __register_sysctl_base(struct ctl_table *base_table); From 88d802b5db7179cf6245949fdf991f3a37fb02c1 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:51 +1100 Subject: [PATCH 45/58] fs: move namespace sysctls and declare fs base directory This moves the namespace sysctls to its own file as part of the kernel/sysctl.c spring cleaning Since we have now removed all sysctls for "fs", we now have to declare it on the filesystem code, we do that using the new helper, which reduces boiler plate code. We rename init_fs_shared_sysctls() to init_fs_sysctls() to reflect that now fs/sysctls.c is taking on the burden of being the first to register the base directory as well. Lastly, since init code will load in the order in which we link it we have to move the sysctl code to be linked in early, so that its early init routine runs prior to other fs code. This way, other filesystem code can register their own sysctls using the helpers after this: * register_sysctl_init() * register_sysctl() Link: https://lkml.kernel.org/r/20211129211943.640266-3-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Anil S Keshavamurthy Cc: Antti Palosaari Cc: Christian Brauner Cc: "David S. Miller" Cc: Eric Biederman Cc: Eric Biggers Cc: Iurii Zaikin Cc: Kees Cook Cc: Lukas Middendorf Cc: Masami Hiramatsu Cc: Mauro Carvalho Chehab Cc: "Naveen N. Rao" Cc: Stephen Kitt Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/Makefile | 3 ++- fs/namespace.c | 24 +++++++++++++++++++++++- fs/sysctls.c | 9 +++++---- include/linux/mount.h | 3 --- kernel/sysctl.c | 14 -------------- 5 files changed, 30 insertions(+), 23 deletions(-) diff --git a/fs/Makefile b/fs/Makefile index ea8770d124da2..dab324aea08f2 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -6,6 +6,8 @@ # Rewritten to use lists instead of if-statements. # +obj-$(CONFIG_SYSCTL) += sysctls.o + obj-y := open.o read_write.o file_table.o super.o \ char_dev.o stat.o exec.o pipe.o namei.o fcntl.o \ ioctl.o readdir.o select.o dcache.o inode.o \ @@ -28,7 +30,6 @@ obj-y += notify/ obj-$(CONFIG_EPOLL) += eventpoll.o obj-y += anon_inodes.o obj-$(CONFIG_SIGNALFD) += signalfd.o -obj-$(CONFIG_SYSCTL) += sysctls.o obj-$(CONFIG_TIMERFD) += timerfd.o obj-$(CONFIG_EVENTFD) += eventfd.o obj-$(CONFIG_USERFAULTFD) += userfaultfd.o diff --git a/fs/namespace.c b/fs/namespace.c index d1bb7a90454ac..4e026e366d734 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -37,7 +37,7 @@ #include "internal.h" /* Maximum number of mounts in a mount namespace */ -unsigned int sysctl_mount_max __read_mostly = 100000; +static unsigned int sysctl_mount_max __read_mostly = 100000; static unsigned int m_hash_mask __read_mostly; static unsigned int m_hash_shift __read_mostly; @@ -4637,3 +4637,25 @@ const struct proc_ns_operations mntns_operations = { .install = mntns_install, .owner = mntns_owner, }; + +#ifdef CONFIG_SYSCTL +static struct ctl_table fs_namespace_sysctls[] = { + { + .procname = "mount-max", + .data = &sysctl_mount_max, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ONE, + }, + { } +}; + +static int __init init_fs_namespace_sysctls(void) +{ + register_sysctl_init("fs", fs_namespace_sysctls); + return 0; +} +fs_initcall(init_fs_namespace_sysctls); + +#endif /* CONFIG_SYSCTL */ diff --git a/fs/sysctls.c b/fs/sysctls.c index 54216cd1ecd75..c701273c9432b 100644 --- a/fs/sysctls.c +++ b/fs/sysctls.c @@ -29,10 +29,11 @@ static struct ctl_table fs_shared_sysctls[] = { { } }; -static int __init init_fs_shared_sysctls(void) +DECLARE_SYSCTL_BASE(fs, fs_shared_sysctls); + +static int __init init_fs_sysctls(void) { - register_sysctl_init("fs", fs_shared_sysctls); - return 0; + return register_sysctl_base(fs); } -early_initcall(init_fs_shared_sysctls); +early_initcall(init_fs_sysctls); diff --git a/include/linux/mount.h b/include/linux/mount.h index 5d92a7e1a742d..7f18a7555dff7 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -113,9 +113,6 @@ extern void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list); extern void mark_mounts_for_expiry(struct list_head *mounts); extern dev_t name_to_dev_t(const char *name); - -extern unsigned int sysctl_mount_max; - extern bool path_is_mountpoint(const struct path *path); extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index bbbafe545723b..8d4cab1fbe9fa 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2810,18 +2810,6 @@ static struct ctl_table vm_table[] = { { } }; -static struct ctl_table fs_table[] = { - { - .procname = "mount-max", - .data = &sysctl_mount_max, - .maxlen = sizeof(unsigned int), - .mode = 0644, - .proc_handler = proc_dointvec_minmax, - .extra1 = SYSCTL_ONE, - }, - { } -}; - static struct ctl_table debug_table[] = { #ifdef CONFIG_SYSCTL_EXCEPTION_TRACE { @@ -2852,7 +2840,6 @@ static struct ctl_table dev_table[] = { DECLARE_SYSCTL_BASE(kernel, kern_table); DECLARE_SYSCTL_BASE(vm, vm_table); -DECLARE_SYSCTL_BASE(fs, fs_table); DECLARE_SYSCTL_BASE(debug, debug_table); DECLARE_SYSCTL_BASE(dev, dev_table); @@ -2860,7 +2847,6 @@ int __init sysctl_init(void) { register_sysctl_base(kernel); register_sysctl_base(vm); - register_sysctl_base(fs); register_sysctl_base(debug); register_sysctl_base(dev); From 2742439948f475fbc39445eaab996784c4a6866e Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 3 Dec 2021 12:58:51 +1100 Subject: [PATCH 46/58] kernel/sysctl.c: rename sysctl_init() to sysctl_init_bases() Rename sysctl_init() to sysctl_init_bases() so to reflect exactly what this is doing. Link: https://lkml.kernel.org/r/20211129211943.640266-4-mcgrof@kernel.org Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Anil S Keshavamurthy Cc: Antti Palosaari Cc: Christian Brauner Cc: "David S. Miller" Cc: Eric Biederman Cc: Eric Biggers Cc: Iurii Zaikin Cc: Kees Cook Cc: Lukas Middendorf Cc: Masami Hiramatsu Cc: Mauro Carvalho Chehab Cc: "Naveen N. Rao" Cc: Stephen Kitt Cc: Xiaoming Ni Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- arch/arm/mm/alignment.c | 2 +- arch/sh/mm/alignment.c | 2 +- fs/proc/proc_sysctl.c | 4 ++-- include/linux/sysctl.h | 2 +- kernel/sysctl.c | 2 +- 5 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c index adbb3817d0be7..6f499559d1936 100644 --- a/arch/arm/mm/alignment.c +++ b/arch/arm/mm/alignment.c @@ -1005,7 +1005,7 @@ static int __init noalign_setup(char *__unused) __setup("noalign", noalign_setup); /* - * This needs to be done after sysctl_init, otherwise sys/ will be + * This needs to be done after sysctl_init_bases(), otherwise sys/ will be * overwritten. Actually, this shouldn't be in sys/ at all since * it isn't a sysctl, and it doesn't contain sysctl information. * We now locate it in /proc/cpu/alignment instead. diff --git a/arch/sh/mm/alignment.c b/arch/sh/mm/alignment.c index fb517b82a87b1..d802801ceb939 100644 --- a/arch/sh/mm/alignment.c +++ b/arch/sh/mm/alignment.c @@ -161,7 +161,7 @@ static const struct proc_ops alignment_proc_ops = { }; /* - * This needs to be done after sysctl_init, otherwise sys/ will be + * This needs to be done after sysctl_init_bases(), otherwise sys/ will be * overwritten. Actually, this shouldn't be in sys/ at all since * it isn't a sysctl, and it doesn't contain sysctl information. * We now locate it in /proc/cpu/alignment instead. diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index e4ea0b3e60e08..2391ea3cd8a02 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1420,7 +1420,7 @@ EXPORT_SYMBOL(register_sysctl); * Context: Can only be called after your respective sysctl base path has been * registered. So for instance, most base directories are registered early on * init before init levels are processed through proc_sys_init() and - * sysctl_init(). + * sysctl_init_bases(). */ void __init __register_sysctl_init(const char *path, struct ctl_table *table, const char *table_name) @@ -1769,7 +1769,7 @@ int __init proc_sys_init(void) proc_sys_root->proc_dir_ops = &proc_sys_dir_file_operations; proc_sys_root->nlink = 0; - return sysctl_init(); + return sysctl_init_bases(); } struct sysctl_alias { diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 02134a8abad7a..180adf7da785c 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -228,7 +228,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path, void unregister_sysctl_table(struct ctl_table_header * table); -extern int sysctl_init(void); +extern int sysctl_init_bases(void); extern void __register_sysctl_init(const char *path, struct ctl_table *table, const char *table_name); #define register_sysctl_init(path, table) __register_sysctl_init(path, table, #table) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 8d4cab1fbe9fa..421d29a86c73e 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2843,7 +2843,7 @@ DECLARE_SYSCTL_BASE(vm, vm_table); DECLARE_SYSCTL_BASE(debug, debug_table); DECLARE_SYSCTL_BASE(dev, dev_table); -int __init sysctl_init(void) +int __init sysctl_init_bases(void) { register_sysctl_base(kernel); register_sysctl_base(vm); From 4fe17b0a9a6fc09b075a2e234b12b0476f4a8880 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:52 +1100 Subject: [PATCH 47/58] printk: fix build warning when CONFIG_PRINTK=n build warning when CONFIG_PRINTK=n kernel/printk/printk.c:175:5: warning: no previous prototype for 'devkmsg_sysctl_set_loglvl' [-Wmissing-prototypes] devkmsg_sysctl_set_loglvl() is only used in sysctl.c when CONFIG_PRINTK=y, but it participates in the build when CONFIG_PRINTK=n. So add compile dependency CONFIG_PRINTK=y && CONFIG_SYSCTL=y to fix the build warning. Link: https://lkml.kernel.org/r/20211129211943.640266-5-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Anil S Keshavamurthy Cc: Antti Palosaari Cc: Christian Brauner Cc: "David S. Miller" Cc: Eric Biederman Cc: Eric Biggers Cc: Iurii Zaikin Cc: Kees Cook Cc: Lukas Middendorf Cc: Masami Hiramatsu Cc: Mauro Carvalho Chehab Cc: "Naveen N. Rao" Cc: Stephen Kitt Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- include/linux/printk.h | 4 ---- kernel/printk/internal.h | 2 ++ kernel/printk/printk.c | 3 ++- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/include/linux/printk.h b/include/linux/printk.h index 9497f6b983399..1522df223c0f7 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -183,10 +183,6 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies, extern int printk_delay_msec; extern int dmesg_restrict; -extern int -devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void *buf, - size_t *lenp, loff_t *ppos); - extern void wake_up_klogd(void); char *log_buf_addr_get(void); diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 6b1c4b399845b..d947ca6c84f99 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -6,6 +6,8 @@ #if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL) void __init printk_sysctl_init(void); +int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos); #else #define printk_sysctl_init() do { } while (0) #endif diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index dbb44086ba652..55722b94909b1 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -171,7 +171,7 @@ static int __init control_devkmsg(char *str) __setup("printk.devkmsg=", control_devkmsg); char devkmsg_log_str[DEVKMSG_STR_MAX_SIZE] = "ratelimit"; - +#if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL) int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { @@ -210,6 +210,7 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, return 0; } +#endif /* CONFIG_PRINTK && CONFIG_SYSCTL */ /* Number of registered extended console drivers. */ static int nr_ext_console_drivers; From dfe9f3089392d0cdc3122079c8cd97893c1ab4b8 Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:52 +1100 Subject: [PATCH 48/58] fs/coredump: move coredump sysctls into its own file This moves the fs/coredump.c respective sysctls to its own file. Link: https://lkml.kernel.org/r/20211129211943.640266-6-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Anil S Keshavamurthy Cc: Antti Palosaari Cc: Christian Brauner Cc: "David S. Miller" Cc: Eric Biederman Cc: Eric Biggers Cc: Iurii Zaikin Cc: Kees Cook Cc: Lukas Middendorf Cc: Masami Hiramatsu Cc: Mauro Carvalho Chehab Cc: "Naveen N. Rao" Cc: Stephen Kitt Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/coredump.c | 66 +++++++++++++++++++++++++++++++++++++--- fs/exec.c | 55 --------------------------------- include/linux/coredump.h | 10 +++--- kernel/sysctl.c | 2 -- 4 files changed, 67 insertions(+), 66 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index a6b3c196cdef5..570d983986688 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -52,9 +53,9 @@ #include -int core_uses_pid; -unsigned int core_pipe_limit; -char core_pattern[CORENAME_MAX_SIZE] = "core"; +static int core_uses_pid; +static unsigned int core_pipe_limit; +static char core_pattern[CORENAME_MAX_SIZE] = "core"; static int core_name_size = CORENAME_MAX_SIZE; struct core_name { @@ -62,8 +63,6 @@ struct core_name { int used, size; }; -/* The maximal length of core_pattern is also specified in sysctl.c */ - static int expand_corename(struct core_name *cn, int size) { char *corename = krealloc(cn->corename, size, GFP_KERNEL); @@ -895,6 +894,63 @@ int dump_align(struct coredump_params *cprm, int align) } EXPORT_SYMBOL(dump_align); +#ifdef CONFIG_SYSCTL + +void validate_coredump_safety(void) +{ + if (suid_dumpable == SUID_DUMP_ROOT && + core_pattern[0] != '/' && core_pattern[0] != '|') { + pr_warn( +"Unsafe core_pattern used with fs.suid_dumpable=2.\n" +"Pipe handler or fully qualified core dump path required.\n" +"Set kernel.core_pattern before fs.suid_dumpable.\n" + ); + } +} + +static int proc_dostring_coredump(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + int error = proc_dostring(table, write, buffer, lenp, ppos); + + if (!error) + validate_coredump_safety(); + return error; +} + +static struct ctl_table coredump_sysctls[] = { + { + .procname = "core_uses_pid", + .data = &core_uses_pid, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { + .procname = "core_pattern", + .data = core_pattern, + .maxlen = CORENAME_MAX_SIZE, + .mode = 0644, + .proc_handler = proc_dostring_coredump, + }, + { + .procname = "core_pipe_limit", + .data = &core_pipe_limit, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_dointvec, + }, + { } +}; + +static int __init init_fs_coredump_sysctls(void) +{ + register_sysctl_init("kernel", coredump_sysctls); + return 0; +} +fs_initcall(init_fs_coredump_sysctls); +#endif /* CONFIG_SYSCTL */ + /* * The purpose of always_dump_vma() is to make sure that special kernel mappings * that are useful for post-mortem analysis are included in every core dump. diff --git a/fs/exec.c b/fs/exec.c index 3fd2866edec38..cc5ec43df028f 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -2101,20 +2101,6 @@ COMPAT_SYSCALL_DEFINE5(execveat, int, fd, #ifdef CONFIG_SYSCTL -static void validate_coredump_safety(void) -{ -#ifdef CONFIG_COREDUMP - if (suid_dumpable == SUID_DUMP_ROOT && - core_pattern[0] != '/' && core_pattern[0] != '|') { - pr_warn( -"Unsafe core_pattern used with fs.suid_dumpable=2.\n" -"Pipe handler or fully qualified core dump path required.\n" -"Set kernel.core_pattern before fs.suid_dumpable.\n" - ); - } -#endif -} - static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { @@ -2138,50 +2124,9 @@ static struct ctl_table fs_exec_sysctls[] = { { } }; -#ifdef CONFIG_COREDUMP - -static int proc_dostring_coredump(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos) -{ - int error = proc_dostring(table, write, buffer, lenp, ppos); - - if (!error) - validate_coredump_safety(); - return error; -} - -static struct ctl_table kernel_exec_sysctls[] = { - { - .procname = "core_uses_pid", - .data = &core_uses_pid, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, - { - .procname = "core_pattern", - .data = core_pattern, - .maxlen = CORENAME_MAX_SIZE, - .mode = 0644, - .proc_handler = proc_dostring_coredump, - }, - { - .procname = "core_pipe_limit", - .data = &core_pipe_limit, - .maxlen = sizeof(unsigned int), - .mode = 0644, - .proc_handler = proc_dointvec, - }, - { } -}; -#endif - static int __init init_fs_exec_sysctls(void) { register_sysctl_init("fs", fs_exec_sysctls); -#ifdef CONFIG_COREDUMP - register_sysctl_init("kernel", kernel_exec_sysctls); -#endif return 0; } diff --git a/include/linux/coredump.h b/include/linux/coredump.h index 78fcd776b185a..248a68c668b45 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -14,10 +14,6 @@ struct core_vma_metadata { unsigned long dump_size; }; -extern int core_uses_pid; -extern char core_pattern[]; -extern unsigned int core_pipe_limit; - /* * These are the only things you should do on a core-file: use only these * functions to write out all the necessary info. @@ -37,4 +33,10 @@ extern void do_coredump(const kernel_siginfo_t *siginfo); static inline void do_coredump(const kernel_siginfo_t *siginfo) {} #endif +#if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL) +extern void validate_coredump_safety(void); +#else +static inline void validate_coredump_safety(void) {} +#endif + #endif /* _LINUX_COREDUMP_H */ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 421d29a86c73e..a4c352f0a5143 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -61,12 +61,10 @@ #include #include #include -#include #include #include #include #include -#include #include #include #include From 08402082523e9bd5a412819128d1ed518702c3ef Mon Sep 17 00:00:00 2001 From: Xiaoming Ni Date: Fri, 3 Dec 2021 12:58:53 +1100 Subject: [PATCH 49/58] kprobe: move sysctl_kprobes_optimization to kprobes.c kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. Move sysctl_kprobes_optimization from kernel/sysctl.c to kernel/kprobes.c. Use register_sysctl() to register the sysctl interface. [mcgrof@kernel.org: fix compile issue when CONFIG_OPTPROBES is disabled] Link: https://lkml.kernel.org/r/20211129211943.640266-7-mcgrof@kernel.org Signed-off-by: Xiaoming Ni Signed-off-by: Luis Chamberlain Cc: Al Viro Cc: Anil S Keshavamurthy Cc: Antti Palosaari Cc: Christian Brauner Cc: "David S. Miller" Cc: Eric Biederman Cc: Eric Biggers Cc: Iurii Zaikin Cc: Kees Cook Cc: Lukas Middendorf Cc: Masami Hiramatsu Cc: Mauro Carvalho Chehab Cc: "Naveen N. Rao" Cc: Stephen Kitt Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- include/linux/kprobes.h | 6 ------ kernel/kprobes.c | 30 ++++++++++++++++++++++++++---- kernel/sysctl.c | 12 ------------ 3 files changed, 26 insertions(+), 22 deletions(-) diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 8c8f7a4d93afb..19b884353b159 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -348,12 +348,6 @@ extern void opt_pre_handler(struct kprobe *p, struct pt_regs *regs); DEFINE_INSN_CACHE_OPS(optinsn); -#ifdef CONFIG_SYSCTL -extern int sysctl_kprobes_optimization; -extern int proc_kprobes_optimization_handler(struct ctl_table *table, - int write, void *buffer, - size_t *length, loff_t *ppos); -#endif /* CONFIG_SYSCTL */ extern void wait_for_kprobe_optimizer(void); #else /* !CONFIG_OPTPROBES */ static inline void wait_for_kprobe_optimizer(void) { } diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 21eccc961bba3..94cab8c9ce56c 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -48,6 +48,9 @@ #define KPROBE_HASH_BITS 6 #define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS) +#if !defined(CONFIG_OPTPROBES) || !defined(CONFIG_SYSCTL) +#define kprobe_sysctls_init() do { } while (0) +#endif static int kprobes_initialized; /* kprobe_table can be accessed by @@ -938,10 +941,10 @@ static void unoptimize_all_kprobes(void) } static DEFINE_MUTEX(kprobe_sysctl_mutex); -int sysctl_kprobes_optimization; -int proc_kprobes_optimization_handler(struct ctl_table *table, int write, - void *buffer, size_t *length, - loff_t *ppos) +static int sysctl_kprobes_optimization; +static int proc_kprobes_optimization_handler(struct ctl_table *table, + int write, void *buffer, + size_t *length, loff_t *ppos) { int ret; @@ -957,6 +960,24 @@ int proc_kprobes_optimization_handler(struct ctl_table *table, int write, return ret; } + +static struct ctl_table kprobe_sysctls[] = { + { + .procname = "kprobes-optimization", + .data = &sysctl_kprobes_optimization, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_kprobes_optimization_handler, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + {} +}; + +static void __init kprobe_sysctls_init(void) +{ + register_sysctl_init("debug", kprobe_sysctls); +} #endif /* CONFIG_SYSCTL */ /* Put a breakpoint for a probe. */ @@ -2584,6 +2605,7 @@ static int __init init_kprobes(void) err = register_module_notifier(&kprobe_module_nb); kprobes_initialized = (err == 0); + kprobe_sysctls_init(); return err; } early_initcall(init_kprobes); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index a4c352f0a5143..7f07b058b1803 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -55,7 +55,6 @@ #include #include #include -#include #include #include #include @@ -2817,17 +2816,6 @@ static struct ctl_table debug_table[] = { .mode = 0644, .proc_handler = proc_dointvec }, -#endif -#if defined(CONFIG_OPTPROBES) - { - .procname = "kprobes-optimization", - .data = &sysctl_kprobes_optimization, - .maxlen = sizeof(int), - .mode = 0644, - .proc_handler = proc_kprobes_optimization_handler, - .extra1 = SYSCTL_ZERO, - .extra2 = SYSCTL_ONE, - }, #endif { } }; From a26f1abcc8edc3bb1cc58aa564c5954e9080cfab Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Fri, 3 Dec 2021 12:58:54 +1100 Subject: [PATCH 50/58] fs: proc: store PDE()->data into inode->i_private PDE_DATA(inode) is introduced to get user private data and hide the layout of struct proc_dir_entry. The inode->i_private is used to do the same thing as well. Save a copy of user private data to inode-> i_private when proc inode is allocated. This means the user also can get their private data by inode->i_private. Introduce pde_data() to wrap inode->i_private so that we can remove PDE_DATA() from fs/proc/generic.c and make PTE_DATE() as a wrapper of pde_data(). It will be easier if we decide to remove PDE_DATE() in the future. Link: https://lkml.kernel.org/r/20211124081956.87711-1-songmuchun@bytedance.com Signed-off-by: Muchun Song Acked-by: Christian Brauner Cc: Alexey Dobriyan Cc: Alexey Gladkov Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- fs/proc/generic.c | 6 ------ fs/proc/inode.c | 1 + fs/proc/internal.h | 5 ----- include/linux/proc_fs.h | 13 ++++++++++++- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/fs/proc/generic.c b/fs/proc/generic.c index 5b78739e60e40..f2132407e1335 100644 --- a/fs/proc/generic.c +++ b/fs/proc/generic.c @@ -791,12 +791,6 @@ void proc_remove(struct proc_dir_entry *de) } EXPORT_SYMBOL(proc_remove); -void *PDE_DATA(const struct inode *inode) -{ - return __PDE_DATA(inode); -} -EXPORT_SYMBOL(PDE_DATA); - /* * Pull a user buffer into memory and pass it to the file's write handler if * one is supplied. The ->write() method is permitted to modify the diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 599eb724ff2d5..f84355c5a36d3 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -650,6 +650,7 @@ struct inode *proc_get_inode(struct super_block *sb, struct proc_dir_entry *de) return NULL; } + inode->i_private = de->data; inode->i_ino = de->low_ino; inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); PROC_I(inode)->pde = de; diff --git a/fs/proc/internal.h b/fs/proc/internal.h index 03415f3fb3a81..06a80f78433d8 100644 --- a/fs/proc/internal.h +++ b/fs/proc/internal.h @@ -115,11 +115,6 @@ static inline struct proc_dir_entry *PDE(const struct inode *inode) return PROC_I(inode)->pde; } -static inline void *__PDE_DATA(const struct inode *inode) -{ - return PDE(inode)->data; -} - static inline struct pid *proc_pid(const struct inode *inode) { return PROC_I(inode)->pid; diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index 01b9268451a8e..b6e7005cc1b27 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -110,7 +110,18 @@ extern struct proc_dir_entry *proc_create_data(const char *, umode_t, struct proc_dir_entry *proc_create(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct proc_ops *proc_ops); extern void proc_set_size(struct proc_dir_entry *, loff_t); extern void proc_set_user(struct proc_dir_entry *, kuid_t, kgid_t); -extern void *PDE_DATA(const struct inode *); + +/* + * Obtain the private data passed by user through proc_create_data() or + * related. + */ +static inline void *pde_data(const struct inode *inode) +{ + return inode->i_private; +} + +#define PDE_DATA(i) pde_data(i) + extern void *proc_get_parent_data(const struct inode *); extern void proc_remove(struct proc_dir_entry *); extern void remove_proc_entry(const char *, struct proc_dir_entry *); From d5b52a99b3aa4dfe30b6e19bc3f08ec10ba932f1 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Fri, 3 Dec 2021 12:58:54 +1100 Subject: [PATCH 51/58] proc: remove PDE_DATA() completely Remove PDE_DATA() completely and replace it with pde_data(). Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com Signed-off-by: Muchun Song Acked-by: Christian Brauner Cc: Alexey Dobriyan Cc: Alexey Gladkov Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- arch/alpha/kernel/srm_env.c | 4 ++-- arch/arm/kernel/atags_proc.c | 2 +- arch/ia64/kernel/salinfo.c | 10 ++++---- arch/powerpc/kernel/proc_powerpc.c | 4 ++-- arch/sh/mm/alignment.c | 2 +- arch/xtensa/platforms/iss/simdisk.c | 4 ++-- drivers/acpi/proc.c | 2 +- drivers/hwmon/dell-smm-hwmon.c | 4 ++-- drivers/net/bonding/bond_procfs.c | 8 +++---- drivers/net/wireless/cisco/airo.c | 22 ++++++++--------- .../net/wireless/intersil/hostap/hostap_ap.c | 16 ++++++------- .../intersil/hostap/hostap_download.c | 2 +- .../wireless/intersil/hostap/hostap_proc.c | 24 +++++++++---------- drivers/net/wireless/ray_cs.c | 2 +- drivers/nubus/proc.c | 2 +- drivers/parisc/led.c | 4 ++-- drivers/pci/proc.c | 10 ++++---- drivers/platform/x86/thinkpad_acpi.c | 4 ++-- drivers/platform/x86/toshiba_acpi.c | 16 ++++++------- drivers/pnp/isapnp/proc.c | 2 +- drivers/pnp/pnpbios/proc.c | 4 ++-- drivers/scsi/scsi_proc.c | 4 ++-- drivers/usb/gadget/function/rndis.c | 4 ++-- drivers/zorro/proc.c | 2 +- fs/afs/proc.c | 6 ++--- fs/ext4/mballoc.c | 14 +++++------ fs/jbd2/journal.c | 2 +- fs/proc/proc_net.c | 8 +++---- include/linux/proc_fs.h | 4 +--- include/linux/seq_file.h | 2 +- ipc/util.c | 2 +- kernel/irq/proc.c | 8 +++---- kernel/resource.c | 4 ++-- net/atm/proc.c | 4 ++-- net/bluetooth/af_bluetooth.c | 8 +++---- net/can/bcm.c | 2 +- net/can/proc.c | 2 +- net/core/neighbour.c | 6 ++--- net/core/pktgen.c | 6 ++--- net/ipv4/netfilter/ipt_CLUSTERIP.c | 6 ++--- net/ipv4/raw.c | 8 +++---- net/ipv4/tcp_ipv4.c | 2 +- net/ipv4/udp.c | 6 ++--- net/netfilter/x_tables.c | 10 ++++---- net/netfilter/xt_hashlimit.c | 18 +++++++------- net/netfilter/xt_recent.c | 4 ++-- net/sunrpc/auth_gss/svcauth_gss.c | 4 ++-- net/sunrpc/cache.c | 24 +++++++++---------- net/sunrpc/stats.c | 2 +- sound/core/info.c | 4 ++-- 50 files changed, 161 insertions(+), 163 deletions(-) diff --git a/arch/alpha/kernel/srm_env.c b/arch/alpha/kernel/srm_env.c index 528d2be581829..217b4dca51dd9 100644 --- a/arch/alpha/kernel/srm_env.c +++ b/arch/alpha/kernel/srm_env.c @@ -83,14 +83,14 @@ static int srm_env_proc_show(struct seq_file *m, void *v) static int srm_env_proc_open(struct inode *inode, struct file *file) { - return single_open(file, srm_env_proc_show, PDE_DATA(inode)); + return single_open(file, srm_env_proc_show, pde_data(inode)); } static ssize_t srm_env_proc_write(struct file *file, const char __user *buffer, size_t count, loff_t *pos) { int res; - unsigned long id = (unsigned long)PDE_DATA(file_inode(file)); + unsigned long id = (unsigned long)pde_data(file_inode(file)); char *buf = (char *) __get_free_page(GFP_USER); unsigned long ret1, ret2; diff --git a/arch/arm/kernel/atags_proc.c b/arch/arm/kernel/atags_proc.c index 3c2faf2bd124e..3ec2afe784232 100644 --- a/arch/arm/kernel/atags_proc.c +++ b/arch/arm/kernel/atags_proc.c @@ -13,7 +13,7 @@ struct buffer { static ssize_t atags_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { - struct buffer *b = PDE_DATA(file_inode(file)); + struct buffer *b = pde_data(file_inode(file)); return simple_read_from_buffer(buf, count, ppos, b->data, b->size); } diff --git a/arch/ia64/kernel/salinfo.c b/arch/ia64/kernel/salinfo.c index a25ab9b37953e..bd3ba276e69c3 100644 --- a/arch/ia64/kernel/salinfo.c +++ b/arch/ia64/kernel/salinfo.c @@ -282,7 +282,7 @@ salinfo_event_open(struct inode *inode, struct file *file) static ssize_t salinfo_event_read(struct file *file, char __user *buffer, size_t count, loff_t *ppos) { - struct salinfo_data *data = PDE_DATA(file_inode(file)); + struct salinfo_data *data = pde_data(file_inode(file)); char cmd[32]; size_t size; int i, n, cpu = -1; @@ -340,7 +340,7 @@ static const struct proc_ops salinfo_event_proc_ops = { static int salinfo_log_open(struct inode *inode, struct file *file) { - struct salinfo_data *data = PDE_DATA(inode); + struct salinfo_data *data = pde_data(inode); if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -365,7 +365,7 @@ salinfo_log_open(struct inode *inode, struct file *file) static int salinfo_log_release(struct inode *inode, struct file *file) { - struct salinfo_data *data = PDE_DATA(inode); + struct salinfo_data *data = pde_data(inode); if (data->state == STATE_NO_DATA) { vfree(data->log_buffer); @@ -433,7 +433,7 @@ salinfo_log_new_read(int cpu, struct salinfo_data *data) static ssize_t salinfo_log_read(struct file *file, char __user *buffer, size_t count, loff_t *ppos) { - struct salinfo_data *data = PDE_DATA(file_inode(file)); + struct salinfo_data *data = pde_data(file_inode(file)); u8 *buf; u64 bufsize; @@ -494,7 +494,7 @@ salinfo_log_clear(struct salinfo_data *data, int cpu) static ssize_t salinfo_log_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos) { - struct salinfo_data *data = PDE_DATA(file_inode(file)); + struct salinfo_data *data = pde_data(file_inode(file)); char cmd[32]; size_t size; u32 offset; diff --git a/arch/powerpc/kernel/proc_powerpc.c b/arch/powerpc/kernel/proc_powerpc.c index 877817471e3c1..6a029f2378e18 100644 --- a/arch/powerpc/kernel/proc_powerpc.c +++ b/arch/powerpc/kernel/proc_powerpc.c @@ -25,7 +25,7 @@ static ssize_t page_map_read( struct file *file, char __user *buf, size_t nbytes loff_t *ppos) { return simple_read_from_buffer(buf, nbytes, ppos, - PDE_DATA(file_inode(file)), PAGE_SIZE); + pde_data(file_inode(file)), PAGE_SIZE); } static int page_map_mmap( struct file *file, struct vm_area_struct *vma ) @@ -34,7 +34,7 @@ static int page_map_mmap( struct file *file, struct vm_area_struct *vma ) return -EINVAL; remap_pfn_range(vma, vma->vm_start, - __pa(PDE_DATA(file_inode(file))) >> PAGE_SHIFT, + __pa(pde_data(file_inode(file))) >> PAGE_SHIFT, PAGE_SIZE, vma->vm_page_prot); return 0; } diff --git a/arch/sh/mm/alignment.c b/arch/sh/mm/alignment.c index d802801ceb939..3a76a766f4232 100644 --- a/arch/sh/mm/alignment.c +++ b/arch/sh/mm/alignment.c @@ -140,7 +140,7 @@ static int alignment_proc_open(struct inode *inode, struct file *file) static ssize_t alignment_proc_write(struct file *file, const char __user *buffer, size_t count, loff_t *pos) { - int *data = PDE_DATA(file_inode(file)); + int *data = pde_data(file_inode(file)); char mode; if (count > 0) { diff --git a/arch/xtensa/platforms/iss/simdisk.c b/arch/xtensa/platforms/iss/simdisk.c index 07b642c1916a8..8eb6ad1a3a1de 100644 --- a/arch/xtensa/platforms/iss/simdisk.c +++ b/arch/xtensa/platforms/iss/simdisk.c @@ -208,7 +208,7 @@ static int simdisk_detach(struct simdisk *dev) static ssize_t proc_read_simdisk(struct file *file, char __user *buf, size_t size, loff_t *ppos) { - struct simdisk *dev = PDE_DATA(file_inode(file)); + struct simdisk *dev = pde_data(file_inode(file)); const char *s = dev->filename; if (s) { ssize_t n = simple_read_from_buffer(buf, size, ppos, @@ -225,7 +225,7 @@ static ssize_t proc_write_simdisk(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { char *tmp = memdup_user_nul(buf, count); - struct simdisk *dev = PDE_DATA(file_inode(file)); + struct simdisk *dev = pde_data(file_inode(file)); int err; if (IS_ERR(tmp)) diff --git a/drivers/acpi/proc.c b/drivers/acpi/proc.c index 0cca7991f1864..4322f2da6d10d 100644 --- a/drivers/acpi/proc.c +++ b/drivers/acpi/proc.c @@ -127,7 +127,7 @@ static int acpi_system_wakeup_device_open_fs(struct inode *inode, struct file *file) { return single_open(file, acpi_system_wakeup_device_seq_show, - PDE_DATA(inode)); + pde_data(inode)); } static const struct proc_ops acpi_system_wakeup_device_proc_ops = { diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c index 5596c211f38d9..4b2a73804db8e 100644 --- a/drivers/hwmon/dell-smm-hwmon.c +++ b/drivers/hwmon/dell-smm-hwmon.c @@ -550,7 +550,7 @@ i8k_ioctl_unlocked(struct file *fp, struct dell_smm_data *data, unsigned int cmd static long i8k_ioctl(struct file *fp, unsigned int cmd, unsigned long arg) { - struct dell_smm_data *data = PDE_DATA(file_inode(fp)); + struct dell_smm_data *data = pde_data(file_inode(fp)); long ret; mutex_lock(&data->i8k_mutex); @@ -607,7 +607,7 @@ static int i8k_proc_show(struct seq_file *seq, void *offset) static int i8k_open_fs(struct inode *inode, struct file *file) { - return single_open(file, i8k_proc_show, PDE_DATA(inode)); + return single_open(file, i8k_proc_show, pde_data(inode)); } static const struct proc_ops i8k_proc_ops = { diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c index 2ec11af5f0cce..46b150e6289ef 100644 --- a/drivers/net/bonding/bond_procfs.c +++ b/drivers/net/bonding/bond_procfs.c @@ -11,7 +11,7 @@ static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos) __acquires(RCU) { - struct bonding *bond = PDE_DATA(file_inode(seq->file)); + struct bonding *bond = pde_data(file_inode(seq->file)); struct list_head *iter; struct slave *slave; loff_t off = 0; @@ -30,7 +30,7 @@ static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos) static void *bond_info_seq_next(struct seq_file *seq, void *v, loff_t *pos) { - struct bonding *bond = PDE_DATA(file_inode(seq->file)); + struct bonding *bond = pde_data(file_inode(seq->file)); struct list_head *iter; struct slave *slave; bool found = false; @@ -57,7 +57,7 @@ static void bond_info_seq_stop(struct seq_file *seq, void *v) static void bond_info_show_master(struct seq_file *seq) { - struct bonding *bond = PDE_DATA(file_inode(seq->file)); + struct bonding *bond = pde_data(file_inode(seq->file)); const struct bond_opt_value *optval; struct slave *curr, *primary; int i; @@ -175,7 +175,7 @@ static void bond_info_show_master(struct seq_file *seq) static void bond_info_show_slave(struct seq_file *seq, const struct slave *slave) { - struct bonding *bond = PDE_DATA(file_inode(seq->file)); + struct bonding *bond = pde_data(file_inode(seq->file)); seq_printf(seq, "\nSlave Interface: %s\n", slave->dev->name); seq_printf(seq, "MII Status: %s\n", bond_slave_link_status(slave->link)); diff --git a/drivers/net/wireless/cisco/airo.c b/drivers/net/wireless/cisco/airo.c index 45594f003ef7b..452d08545d31a 100644 --- a/drivers/net/wireless/cisco/airo.c +++ b/drivers/net/wireless/cisco/airo.c @@ -4672,7 +4672,7 @@ static ssize_t proc_write(struct file *file, static int proc_status_open(struct inode *inode, struct file *file) { struct proc_data *data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *apriv = dev->ml_priv; CapabilityRid cap_rid; StatusRid status_rid; @@ -4756,7 +4756,7 @@ static int proc_stats_rid_open(struct inode *inode, u16 rid) { struct proc_data *data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *apriv = dev->ml_priv; StatsRid stats; int i, j; @@ -4819,7 +4819,7 @@ static inline int sniffing_mode(struct airo_info *ai) static void proc_config_on_close(struct inode *inode, struct file *file) { struct proc_data *data = file->private_data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *ai = dev->ml_priv; char *line; @@ -5030,7 +5030,7 @@ static const char *get_rmode(__le16 mode) static int proc_config_open(struct inode *inode, struct file *file) { struct proc_data *data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *ai = dev->ml_priv; int i; __le16 mode; @@ -5120,7 +5120,7 @@ static int proc_config_open(struct inode *inode, struct file *file) static void proc_SSID_on_close(struct inode *inode, struct file *file) { struct proc_data *data = file->private_data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *ai = dev->ml_priv; SsidRid SSID_rid; int i; @@ -5156,7 +5156,7 @@ static void proc_SSID_on_close(struct inode *inode, struct file *file) static void proc_APList_on_close(struct inode *inode, struct file *file) { struct proc_data *data = file->private_data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *ai = dev->ml_priv; APListRid *APList_rid = &ai->APList; int i; @@ -5280,7 +5280,7 @@ static int set_wep_tx_idx(struct airo_info *ai, u16 index, int perm, int lock) static void proc_wepkey_on_close(struct inode *inode, struct file *file) { struct proc_data *data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *ai = dev->ml_priv; int i, rc; char key[16]; @@ -5331,7 +5331,7 @@ static void proc_wepkey_on_close(struct inode *inode, struct file *file) static int proc_wepkey_open(struct inode *inode, struct file *file) { struct proc_data *data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *ai = dev->ml_priv; char *ptr; WepKeyRid wkr; @@ -5379,7 +5379,7 @@ static int proc_wepkey_open(struct inode *inode, struct file *file) static int proc_SSID_open(struct inode *inode, struct file *file) { struct proc_data *data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *ai = dev->ml_priv; int i; char *ptr; @@ -5423,7 +5423,7 @@ static int proc_SSID_open(struct inode *inode, struct file *file) static int proc_APList_open(struct inode *inode, struct file *file) { struct proc_data *data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *ai = dev->ml_priv; int i; char *ptr; @@ -5462,7 +5462,7 @@ static int proc_APList_open(struct inode *inode, struct file *file) static int proc_BSSList_open(struct inode *inode, struct file *file) { struct proc_data *data; - struct net_device *dev = PDE_DATA(inode); + struct net_device *dev = pde_data(inode); struct airo_info *ai = dev->ml_priv; char *ptr; BSSListRid BSSList_rid; diff --git a/drivers/net/wireless/intersil/hostap/hostap_ap.c b/drivers/net/wireless/intersil/hostap/hostap_ap.c index 8bcc1cdcb75b0..462ccc7d7d1ac 100644 --- a/drivers/net/wireless/intersil/hostap/hostap_ap.c +++ b/drivers/net/wireless/intersil/hostap/hostap_ap.c @@ -69,7 +69,7 @@ static void prism2_send_mgmt(struct net_device *dev, #if !defined(PRISM2_NO_PROCFS_DEBUG) && defined(CONFIG_PROC_FS) static int ap_debug_proc_show(struct seq_file *m, void *v) { - struct ap_data *ap = PDE_DATA(file_inode(m->file)); + struct ap_data *ap = pde_data(file_inode(m->file)); seq_printf(m, "BridgedUnicastFrames=%u\n", ap->bridged_unicast); seq_printf(m, "BridgedMulticastFrames=%u\n", ap->bridged_multicast); @@ -320,7 +320,7 @@ void hostap_deauth_all_stas(struct net_device *dev, struct ap_data *ap, static int ap_control_proc_show(struct seq_file *m, void *v) { - struct ap_data *ap = PDE_DATA(file_inode(m->file)); + struct ap_data *ap = pde_data(file_inode(m->file)); char *policy_txt; struct mac_entry *entry; @@ -352,20 +352,20 @@ static int ap_control_proc_show(struct seq_file *m, void *v) static void *ap_control_proc_start(struct seq_file *m, loff_t *_pos) { - struct ap_data *ap = PDE_DATA(file_inode(m->file)); + struct ap_data *ap = pde_data(file_inode(m->file)); spin_lock_bh(&ap->mac_restrictions.lock); return seq_list_start_head(&ap->mac_restrictions.mac_list, *_pos); } static void *ap_control_proc_next(struct seq_file *m, void *v, loff_t *_pos) { - struct ap_data *ap = PDE_DATA(file_inode(m->file)); + struct ap_data *ap = pde_data(file_inode(m->file)); return seq_list_next(v, &ap->mac_restrictions.mac_list, _pos); } static void ap_control_proc_stop(struct seq_file *m, void *v) { - struct ap_data *ap = PDE_DATA(file_inode(m->file)); + struct ap_data *ap = pde_data(file_inode(m->file)); spin_unlock_bh(&ap->mac_restrictions.lock); } @@ -554,20 +554,20 @@ static int prism2_ap_proc_show(struct seq_file *m, void *v) static void *prism2_ap_proc_start(struct seq_file *m, loff_t *_pos) { - struct ap_data *ap = PDE_DATA(file_inode(m->file)); + struct ap_data *ap = pde_data(file_inode(m->file)); spin_lock_bh(&ap->sta_table_lock); return seq_list_start_head(&ap->sta_list, *_pos); } static void *prism2_ap_proc_next(struct seq_file *m, void *v, loff_t *_pos) { - struct ap_data *ap = PDE_DATA(file_inode(m->file)); + struct ap_data *ap = pde_data(file_inode(m->file)); return seq_list_next(v, &ap->sta_list, _pos); } static void prism2_ap_proc_stop(struct seq_file *m, void *v) { - struct ap_data *ap = PDE_DATA(file_inode(m->file)); + struct ap_data *ap = pde_data(file_inode(m->file)); spin_unlock_bh(&ap->sta_table_lock); } diff --git a/drivers/net/wireless/intersil/hostap/hostap_download.c b/drivers/net/wireless/intersil/hostap/hostap_download.c index 7c6a5a6d1d45d..3672291ced5c8 100644 --- a/drivers/net/wireless/intersil/hostap/hostap_download.c +++ b/drivers/net/wireless/intersil/hostap/hostap_download.c @@ -227,7 +227,7 @@ static int prism2_download_aux_dump_proc_open(struct inode *inode, struct file * sizeof(struct prism2_download_aux_dump)); if (ret == 0) { struct seq_file *m = file->private_data; - m->private = PDE_DATA(inode); + m->private = pde_data(inode); } return ret; } diff --git a/drivers/net/wireless/intersil/hostap/hostap_proc.c b/drivers/net/wireless/intersil/hostap/hostap_proc.c index 51c847d98755d..61f68786056f4 100644 --- a/drivers/net/wireless/intersil/hostap/hostap_proc.c +++ b/drivers/net/wireless/intersil/hostap/hostap_proc.c @@ -97,20 +97,20 @@ static int prism2_wds_proc_show(struct seq_file *m, void *v) static void *prism2_wds_proc_start(struct seq_file *m, loff_t *_pos) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); read_lock_bh(&local->iface_lock); return seq_list_start(&local->hostap_interfaces, *_pos); } static void *prism2_wds_proc_next(struct seq_file *m, void *v, loff_t *_pos) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); return seq_list_next(v, &local->hostap_interfaces, _pos); } static void prism2_wds_proc_stop(struct seq_file *m, void *v) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); read_unlock_bh(&local->iface_lock); } @@ -123,7 +123,7 @@ static const struct seq_operations prism2_wds_proc_seqops = { static int prism2_bss_list_proc_show(struct seq_file *m, void *v) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); struct list_head *ptr = v; struct hostap_bss_info *bss; @@ -151,21 +151,21 @@ static int prism2_bss_list_proc_show(struct seq_file *m, void *v) static void *prism2_bss_list_proc_start(struct seq_file *m, loff_t *_pos) __acquires(&local->lock) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); spin_lock_bh(&local->lock); return seq_list_start_head(&local->bss_list, *_pos); } static void *prism2_bss_list_proc_next(struct seq_file *m, void *v, loff_t *_pos) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); return seq_list_next(v, &local->bss_list, _pos); } static void prism2_bss_list_proc_stop(struct seq_file *m, void *v) __releases(&local->lock) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); spin_unlock_bh(&local->lock); } @@ -198,7 +198,7 @@ static int prism2_crypt_proc_show(struct seq_file *m, void *v) static ssize_t prism2_pda_proc_read(struct file *file, char __user *buf, size_t count, loff_t *_pos) { - local_info_t *local = PDE_DATA(file_inode(file)); + local_info_t *local = pde_data(file_inode(file)); size_t off; if (local->pda == NULL || *_pos >= PRISM2_PDA_SIZE) @@ -272,7 +272,7 @@ static int prism2_io_debug_proc_read(char *page, char **start, off_t off, #ifndef PRISM2_NO_STATION_MODES static int prism2_scan_results_proc_show(struct seq_file *m, void *v) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); unsigned long entry; int i, len; struct hfa384x_hostscan_result *scanres; @@ -322,7 +322,7 @@ static int prism2_scan_results_proc_show(struct seq_file *m, void *v) static void *prism2_scan_results_proc_start(struct seq_file *m, loff_t *_pos) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); spin_lock_bh(&local->lock); /* We have a header (pos 0) + N results to show (pos 1...N) */ @@ -333,7 +333,7 @@ static void *prism2_scan_results_proc_start(struct seq_file *m, loff_t *_pos) static void *prism2_scan_results_proc_next(struct seq_file *m, void *v, loff_t *_pos) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); ++*_pos; if (*_pos > local->last_scan_results_count) @@ -343,7 +343,7 @@ static void *prism2_scan_results_proc_next(struct seq_file *m, void *v, loff_t * static void prism2_scan_results_proc_stop(struct seq_file *m, void *v) { - local_info_t *local = PDE_DATA(file_inode(m->file)); + local_info_t *local = pde_data(file_inode(m->file)); spin_unlock_bh(&local->lock); } diff --git a/drivers/net/wireless/ray_cs.c b/drivers/net/wireless/ray_cs.c index e3a3dc3e45b43..2987ad9271f64 100644 --- a/drivers/net/wireless/ray_cs.c +++ b/drivers/net/wireless/ray_cs.c @@ -2746,7 +2746,7 @@ static ssize_t int_proc_write(struct file *file, const char __user *buffer, nr = nr * 10 + c; p++; } while (--len); - *(int *)PDE_DATA(file_inode(file)) = nr; + *(int *)pde_data(file_inode(file)) = nr; return count; } diff --git a/drivers/nubus/proc.c b/drivers/nubus/proc.c index 88e1f9a0faafd..22fb11da519bf 100644 --- a/drivers/nubus/proc.c +++ b/drivers/nubus/proc.c @@ -109,7 +109,7 @@ static int nubus_proc_rsrc_show(struct seq_file *m, void *v) struct inode *inode = m->private; struct nubus_proc_pde_data *pde_data; - pde_data = PDE_DATA(inode); + pde_data = pde_data(inode); if (!pde_data) return 0; diff --git a/drivers/parisc/led.c b/drivers/parisc/led.c index cf91cb024be30..1e4a5663d0112 100644 --- a/drivers/parisc/led.c +++ b/drivers/parisc/led.c @@ -168,14 +168,14 @@ static int led_proc_show(struct seq_file *m, void *v) static int led_proc_open(struct inode *inode, struct file *file) { - return single_open(file, led_proc_show, PDE_DATA(inode)); + return single_open(file, led_proc_show, pde_data(inode)); } static ssize_t led_proc_write(struct file *file, const char __user *buf, size_t count, loff_t *pos) { - void *data = PDE_DATA(file_inode(file)); + void *data = pde_data(file_inode(file)); char *cur, lbuf[32]; int d; diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c index cb18f8a13ab66..9c7edec64f7e2 100644 --- a/drivers/pci/proc.c +++ b/drivers/pci/proc.c @@ -21,14 +21,14 @@ static int proc_initialized; /* = 0 */ static loff_t proc_bus_pci_lseek(struct file *file, loff_t off, int whence) { - struct pci_dev *dev = PDE_DATA(file_inode(file)); + struct pci_dev *dev = pde_data(file_inode(file)); return fixed_size_llseek(file, off, whence, dev->cfg_size); } static ssize_t proc_bus_pci_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) { - struct pci_dev *dev = PDE_DATA(file_inode(file)); + struct pci_dev *dev = pde_data(file_inode(file)); unsigned int pos = *ppos; unsigned int cnt, size; @@ -114,7 +114,7 @@ static ssize_t proc_bus_pci_write(struct file *file, const char __user *buf, size_t nbytes, loff_t *ppos) { struct inode *ino = file_inode(file); - struct pci_dev *dev = PDE_DATA(ino); + struct pci_dev *dev = pde_data(ino); int pos = *ppos; int size = dev->cfg_size; int cnt, ret; @@ -196,7 +196,7 @@ struct pci_filp_private { static long proc_bus_pci_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { - struct pci_dev *dev = PDE_DATA(file_inode(file)); + struct pci_dev *dev = pde_data(file_inode(file)); #ifdef HAVE_PCI_MMAP struct pci_filp_private *fpriv = file->private_data; #endif /* HAVE_PCI_MMAP */ @@ -244,7 +244,7 @@ static long proc_bus_pci_ioctl(struct file *file, unsigned int cmd, #ifdef HAVE_PCI_MMAP static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma) { - struct pci_dev *dev = PDE_DATA(file_inode(file)); + struct pci_dev *dev = pde_data(file_inode(file)); struct pci_filp_private *fpriv = file->private_data; int i, ret, write_combine = 0, res_bit = IORESOURCE_MEM; diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c index 679bc86804deb..981ee51986364 100644 --- a/drivers/platform/x86/thinkpad_acpi.c +++ b/drivers/platform/x86/thinkpad_acpi.c @@ -880,14 +880,14 @@ static int dispatch_proc_show(struct seq_file *m, void *v) static int dispatch_proc_open(struct inode *inode, struct file *file) { - return single_open(file, dispatch_proc_show, PDE_DATA(inode)); + return single_open(file, dispatch_proc_show, pde_data(inode)); } static ssize_t dispatch_proc_write(struct file *file, const char __user *userbuf, size_t count, loff_t *pos) { - struct ibm_struct *ibm = PDE_DATA(file_inode(file)); + struct ibm_struct *ibm = pde_data(file_inode(file)); char *kernbuf; int ret; diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c index 352508d304675..f113dec98e21d 100644 --- a/drivers/platform/x86/toshiba_acpi.c +++ b/drivers/platform/x86/toshiba_acpi.c @@ -1368,7 +1368,7 @@ static int lcd_proc_show(struct seq_file *m, void *v) static int lcd_proc_open(struct inode *inode, struct file *file) { - return single_open(file, lcd_proc_show, PDE_DATA(inode)); + return single_open(file, lcd_proc_show, pde_data(inode)); } static int set_lcd_brightness(struct toshiba_acpi_dev *dev, int value) @@ -1404,7 +1404,7 @@ static int set_lcd_status(struct backlight_device *bd) static ssize_t lcd_proc_write(struct file *file, const char __user *buf, size_t count, loff_t *pos) { - struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file)); + struct toshiba_acpi_dev *dev = pde_data(file_inode(file)); char cmd[42]; size_t len; int levels; @@ -1469,13 +1469,13 @@ static int video_proc_show(struct seq_file *m, void *v) static int video_proc_open(struct inode *inode, struct file *file) { - return single_open(file, video_proc_show, PDE_DATA(inode)); + return single_open(file, video_proc_show, pde_data(inode)); } static ssize_t video_proc_write(struct file *file, const char __user *buf, size_t count, loff_t *pos) { - struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file)); + struct toshiba_acpi_dev *dev = pde_data(file_inode(file)); char *buffer; char *cmd; int lcd_out = -1, crt_out = -1, tv_out = -1; @@ -1580,13 +1580,13 @@ static int fan_proc_show(struct seq_file *m, void *v) static int fan_proc_open(struct inode *inode, struct file *file) { - return single_open(file, fan_proc_show, PDE_DATA(inode)); + return single_open(file, fan_proc_show, pde_data(inode)); } static ssize_t fan_proc_write(struct file *file, const char __user *buf, size_t count, loff_t *pos) { - struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file)); + struct toshiba_acpi_dev *dev = pde_data(file_inode(file)); char cmd[42]; size_t len; int value; @@ -1628,13 +1628,13 @@ static int keys_proc_show(struct seq_file *m, void *v) static int keys_proc_open(struct inode *inode, struct file *file) { - return single_open(file, keys_proc_show, PDE_DATA(inode)); + return single_open(file, keys_proc_show, pde_data(inode)); } static ssize_t keys_proc_write(struct file *file, const char __user *buf, size_t count, loff_t *pos) { - struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file)); + struct toshiba_acpi_dev *dev = pde_data(file_inode(file)); char cmd[42]; size_t len; int value; diff --git a/drivers/pnp/isapnp/proc.c b/drivers/pnp/isapnp/proc.c index 1ae458c02656c..55ae72a2818bb 100644 --- a/drivers/pnp/isapnp/proc.c +++ b/drivers/pnp/isapnp/proc.c @@ -22,7 +22,7 @@ static loff_t isapnp_proc_bus_lseek(struct file *file, loff_t off, int whence) static ssize_t isapnp_proc_bus_read(struct file *file, char __user * buf, size_t nbytes, loff_t * ppos) { - struct pnp_dev *dev = PDE_DATA(file_inode(file)); + struct pnp_dev *dev = pde_data(file_inode(file)); int pos = *ppos; int cnt, size = 256; diff --git a/drivers/pnp/pnpbios/proc.c b/drivers/pnp/pnpbios/proc.c index a806830e3a407..0f0d819b157fd 100644 --- a/drivers/pnp/pnpbios/proc.c +++ b/drivers/pnp/pnpbios/proc.c @@ -173,13 +173,13 @@ static int pnpbios_proc_show(struct seq_file *m, void *v) static int pnpbios_proc_open(struct inode *inode, struct file *file) { - return single_open(file, pnpbios_proc_show, PDE_DATA(inode)); + return single_open(file, pnpbios_proc_show, pde_data(inode)); } static ssize_t pnpbios_proc_write(struct file *file, const char __user *buf, size_t count, loff_t *pos) { - void *data = PDE_DATA(file_inode(file)); + void *data = pde_data(file_inode(file)); struct pnp_bios_node *node; int boot = (long)data >> 8; u8 nodenum = (long)data; diff --git a/drivers/scsi/scsi_proc.c b/drivers/scsi/scsi_proc.c index d6982d3557396..95aee1ad13834 100644 --- a/drivers/scsi/scsi_proc.c +++ b/drivers/scsi/scsi_proc.c @@ -49,7 +49,7 @@ static DEFINE_MUTEX(global_host_template_mutex); static ssize_t proc_scsi_host_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { - struct Scsi_Host *shost = PDE_DATA(file_inode(file)); + struct Scsi_Host *shost = pde_data(file_inode(file)); ssize_t ret = -ENOMEM; char *page; @@ -79,7 +79,7 @@ static int proc_scsi_show(struct seq_file *m, void *v) static int proc_scsi_host_open(struct inode *inode, struct file *file) { - return single_open_size(file, proc_scsi_show, PDE_DATA(inode), + return single_open_size(file, proc_scsi_show, pde_data(inode), 4 * PAGE_SIZE); } diff --git a/drivers/usb/gadget/function/rndis.c b/drivers/usb/gadget/function/rndis.c index 64de9f1b874c5..431d5a7d737e2 100644 --- a/drivers/usb/gadget/function/rndis.c +++ b/drivers/usb/gadget/function/rndis.c @@ -1117,7 +1117,7 @@ static int rndis_proc_show(struct seq_file *m, void *v) static ssize_t rndis_proc_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos) { - rndis_params *p = PDE_DATA(file_inode(file)); + rndis_params *p = pde_data(file_inode(file)); u32 speed = 0; int i, fl_speed = 0; @@ -1161,7 +1161,7 @@ static ssize_t rndis_proc_write(struct file *file, const char __user *buffer, static int rndis_proc_open(struct inode *inode, struct file *file) { - return single_open(file, rndis_proc_show, PDE_DATA(inode)); + return single_open(file, rndis_proc_show, pde_data(inode)); } static const struct proc_ops rndis_proc_ops = { diff --git a/drivers/zorro/proc.c b/drivers/zorro/proc.c index 1c9ae08225d8d..f916bf60b312f 100644 --- a/drivers/zorro/proc.c +++ b/drivers/zorro/proc.c @@ -30,7 +30,7 @@ proc_bus_zorro_lseek(struct file *file, loff_t off, int whence) static ssize_t proc_bus_zorro_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) { - struct zorro_dev *z = PDE_DATA(file_inode(file)); + struct zorro_dev *z = pde_data(file_inode(file)); struct ConfigDev cd; loff_t pos = *ppos; diff --git a/fs/afs/proc.c b/fs/afs/proc.c index 065a28bfa3f18..e1b863449296c 100644 --- a/fs/afs/proc.c +++ b/fs/afs/proc.c @@ -227,7 +227,7 @@ static int afs_proc_cell_volumes_show(struct seq_file *m, void *v) static void *afs_proc_cell_volumes_start(struct seq_file *m, loff_t *_pos) __acquires(cell->proc_lock) { - struct afs_cell *cell = PDE_DATA(file_inode(m->file)); + struct afs_cell *cell = pde_data(file_inode(m->file)); rcu_read_lock(); return seq_hlist_start_head_rcu(&cell->proc_volumes, *_pos); @@ -236,7 +236,7 @@ static void *afs_proc_cell_volumes_start(struct seq_file *m, loff_t *_pos) static void *afs_proc_cell_volumes_next(struct seq_file *m, void *v, loff_t *_pos) { - struct afs_cell *cell = PDE_DATA(file_inode(m->file)); + struct afs_cell *cell = pde_data(file_inode(m->file)); return seq_hlist_next_rcu(v, &cell->proc_volumes, _pos); } @@ -322,7 +322,7 @@ static void *afs_proc_cell_vlservers_start(struct seq_file *m, loff_t *_pos) { struct afs_vl_seq_net_private *priv = m->private; struct afs_vlserver_list *vllist; - struct afs_cell *cell = PDE_DATA(file_inode(m->file)); + struct afs_cell *cell = pde_data(file_inode(m->file)); loff_t pos = *_pos; rcu_read_lock(); diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 215b7068f548a..a5d129cb932ad 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -2834,7 +2834,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac) static void *ext4_mb_seq_groups_start(struct seq_file *seq, loff_t *pos) { - struct super_block *sb = PDE_DATA(file_inode(seq->file)); + struct super_block *sb = pde_data(file_inode(seq->file)); ext4_group_t group; if (*pos < 0 || *pos >= ext4_get_groups_count(sb)) @@ -2845,7 +2845,7 @@ static void *ext4_mb_seq_groups_start(struct seq_file *seq, loff_t *pos) static void *ext4_mb_seq_groups_next(struct seq_file *seq, void *v, loff_t *pos) { - struct super_block *sb = PDE_DATA(file_inode(seq->file)); + struct super_block *sb = pde_data(file_inode(seq->file)); ext4_group_t group; ++*pos; @@ -2857,7 +2857,7 @@ static void *ext4_mb_seq_groups_next(struct seq_file *seq, void *v, loff_t *pos) static int ext4_mb_seq_groups_show(struct seq_file *seq, void *v) { - struct super_block *sb = PDE_DATA(file_inode(seq->file)); + struct super_block *sb = pde_data(file_inode(seq->file)); ext4_group_t group = (ext4_group_t) ((unsigned long) v); int i; int err, buddy_loaded = 0; @@ -2985,7 +2985,7 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset) static void *ext4_mb_seq_structs_summary_start(struct seq_file *seq, loff_t *pos) __acquires(&EXT4_SB(sb)->s_mb_rb_lock) { - struct super_block *sb = PDE_DATA(file_inode(seq->file)); + struct super_block *sb = pde_data(file_inode(seq->file)); unsigned long position; read_lock(&EXT4_SB(sb)->s_mb_rb_lock); @@ -2998,7 +2998,7 @@ __acquires(&EXT4_SB(sb)->s_mb_rb_lock) static void *ext4_mb_seq_structs_summary_next(struct seq_file *seq, void *v, loff_t *pos) { - struct super_block *sb = PDE_DATA(file_inode(seq->file)); + struct super_block *sb = pde_data(file_inode(seq->file)); unsigned long position; ++*pos; @@ -3010,7 +3010,7 @@ static void *ext4_mb_seq_structs_summary_next(struct seq_file *seq, void *v, lof static int ext4_mb_seq_structs_summary_show(struct seq_file *seq, void *v) { - struct super_block *sb = PDE_DATA(file_inode(seq->file)); + struct super_block *sb = pde_data(file_inode(seq->file)); struct ext4_sb_info *sbi = EXT4_SB(sb); unsigned long position = ((unsigned long) v); struct ext4_group_info *grp; @@ -3058,7 +3058,7 @@ static int ext4_mb_seq_structs_summary_show(struct seq_file *seq, void *v) static void ext4_mb_seq_structs_summary_stop(struct seq_file *seq, void *v) __releases(&EXT4_SB(sb)->s_mb_rb_lock) { - struct super_block *sb = PDE_DATA(file_inode(seq->file)); + struct super_block *sb = pde_data(file_inode(seq->file)); read_unlock(&EXT4_SB(sb)->s_mb_rb_lock); } diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 35302bc192eb9..2726db7a139ae 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1210,7 +1210,7 @@ static const struct seq_operations jbd2_seq_info_ops = { static int jbd2_seq_info_open(struct inode *inode, struct file *file) { - journal_t *journal = PDE_DATA(inode); + journal_t *journal = pde_data(inode); struct jbd2_stats_proc_session *s; int rc, size; diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c index 15c2e55d2ed2c..4df857acb3944 100644 --- a/fs/proc/proc_net.c +++ b/fs/proc/proc_net.c @@ -125,7 +125,7 @@ EXPORT_SYMBOL_GPL(proc_create_net_data); * @parent: The parent directory in which to create. * @ops: The seq_file ops with which to read the file. * @write: The write method with which to 'modify' the file. - * @data: Data for retrieval by PDE_DATA(). + * @data: Data for retrieval by pde_data(). * * Create a network namespaced proc file in the @parent directory with the * specified @name and @mode that allows reading of a file that displays a @@ -140,7 +140,7 @@ EXPORT_SYMBOL_GPL(proc_create_net_data); * modified by the @write function. @write should return 0 on success. * * The @data value is accessible from the @show and @write functions by calling - * PDE_DATA() on the file inode. The network namespace must be accessed by + * pde_data() on the file inode. The network namespace must be accessed by * calling seq_file_net() on the seq_file struct. */ struct proc_dir_entry *proc_create_net_data_write(const char *name, umode_t mode, @@ -217,7 +217,7 @@ EXPORT_SYMBOL_GPL(proc_create_net_single); * @parent: The parent directory in which to create. * @show: The seqfile show method with which to read the file. * @write: The write method with which to 'modify' the file. - * @data: Data for retrieval by PDE_DATA(). + * @data: Data for retrieval by pde_data(). * * Create a network-namespaced proc file in the @parent directory with the * specified @name and @mode that allows reading of a file that displays a @@ -232,7 +232,7 @@ EXPORT_SYMBOL_GPL(proc_create_net_single); * modified by the @write function. @write should return 0 on success. * * The @data value is accessible from the @show and @write functions by calling - * PDE_DATA() on the file inode. The network namespace must be accessed by + * pde_data() on the file inode. The network namespace must be accessed by * calling seq_file_single_net() on the seq_file struct. */ struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mode, diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index b6e7005cc1b27..81d6e4ec2294b 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -120,8 +120,6 @@ static inline void *pde_data(const struct inode *inode) return inode->i_private; } -#define PDE_DATA(i) pde_data(i) - extern void *proc_get_parent_data(const struct inode *); extern void proc_remove(struct proc_dir_entry *); extern void remove_proc_entry(const char *, struct proc_dir_entry *); @@ -202,7 +200,7 @@ proc_create_data(const char *name, umode_t mode, struct proc_dir_entry *parent, static inline void proc_set_size(struct proc_dir_entry *de, loff_t size) {} static inline void proc_set_user(struct proc_dir_entry *de, kuid_t uid, kgid_t gid) {} -static inline void *PDE_DATA(const struct inode *inode) {BUG(); return NULL;} +static inline void *pde_data(const struct inode *inode) {BUG(); return NULL;} static inline void *proc_get_parent_data(const struct inode *inode) { BUG(); return NULL; } static inline void proc_remove(struct proc_dir_entry *de) {} diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h index 72dbb44a4573b..88cc16444b431 100644 --- a/include/linux/seq_file.h +++ b/include/linux/seq_file.h @@ -209,7 +209,7 @@ static const struct file_operations __name ## _fops = { \ #define DEFINE_PROC_SHOW_ATTRIBUTE(__name) \ static int __name ## _open(struct inode *inode, struct file *file) \ { \ - return single_open(file, __name ## _show, PDE_DATA(inode)); \ + return single_open(file, __name ## _show, pde_data(inode)); \ } \ \ static const struct proc_ops __name ## _proc_ops = { \ diff --git a/ipc/util.c b/ipc/util.c index fa2d86ef3fb80..a2208d0f26b2d 100644 --- a/ipc/util.c +++ b/ipc/util.c @@ -894,7 +894,7 @@ static int sysvipc_proc_open(struct inode *inode, struct file *file) if (!iter) return -ENOMEM; - iter->iface = PDE_DATA(inode); + iter->iface = pde_data(inode); iter->ns = get_ipc_ns(current->nsproxy->ipc_ns); iter->pid_ns = get_pid_ns(task_active_pid_ns(current)); diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c index ee595ec09778d..623b8136e9af3 100644 --- a/kernel/irq/proc.c +++ b/kernel/irq/proc.c @@ -137,7 +137,7 @@ static inline int irq_select_affinity_usr(unsigned int irq) static ssize_t write_irq_affinity(int type, struct file *file, const char __user *buffer, size_t count, loff_t *pos) { - unsigned int irq = (int)(long)PDE_DATA(file_inode(file)); + unsigned int irq = (int)(long)pde_data(file_inode(file)); cpumask_var_t new_value; int err; @@ -190,12 +190,12 @@ static ssize_t irq_affinity_list_proc_write(struct file *file, static int irq_affinity_proc_open(struct inode *inode, struct file *file) { - return single_open(file, irq_affinity_proc_show, PDE_DATA(inode)); + return single_open(file, irq_affinity_proc_show, pde_data(inode)); } static int irq_affinity_list_proc_open(struct inode *inode, struct file *file) { - return single_open(file, irq_affinity_list_proc_show, PDE_DATA(inode)); + return single_open(file, irq_affinity_list_proc_show, pde_data(inode)); } static const struct proc_ops irq_affinity_proc_ops = { @@ -265,7 +265,7 @@ static ssize_t default_affinity_write(struct file *file, static int default_affinity_open(struct inode *inode, struct file *file) { - return single_open(file, default_affinity_show, PDE_DATA(inode)); + return single_open(file, default_affinity_show, pde_data(inode)); } static const struct proc_ops default_affinity_proc_ops = { diff --git a/kernel/resource.c b/kernel/resource.c index 5ad3eba619ba7..9c08d6e9eef27 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -99,7 +99,7 @@ enum { MAX_IORES_LEVEL = 5 }; static void *r_start(struct seq_file *m, loff_t *pos) __acquires(resource_lock) { - struct resource *p = PDE_DATA(file_inode(m->file)); + struct resource *p = pde_data(file_inode(m->file)); loff_t l = 0; read_lock(&resource_lock); for (p = p->child; p && l < *pos; p = r_next(m, p, &l)) @@ -115,7 +115,7 @@ static void r_stop(struct seq_file *m, void *v) static int r_show(struct seq_file *m, void *v) { - struct resource *root = PDE_DATA(file_inode(m->file)); + struct resource *root = pde_data(file_inode(m->file)); struct resource *r = v, *p; unsigned long long start, end; int width = root->end < 0x10000 ? 4 : 8; diff --git a/net/atm/proc.c b/net/atm/proc.c index 4369ffa3302a2..9bf736290e48c 100644 --- a/net/atm/proc.c +++ b/net/atm/proc.c @@ -108,7 +108,7 @@ static int __vcc_walk(struct sock **sock, int family, int *bucket, loff_t l) static inline void *vcc_walk(struct seq_file *seq, loff_t l) { struct vcc_state *state = seq->private; - int family = (uintptr_t)(PDE_DATA(file_inode(seq->file))); + int family = (uintptr_t)(pde_data(file_inode(seq->file))); return __vcc_walk(&state->sk, family, &state->bucket, l) ? state : NULL; @@ -324,7 +324,7 @@ static ssize_t proc_dev_atm_read(struct file *file, char __user *buf, page = get_zeroed_page(GFP_KERNEL); if (!page) return -ENOMEM; - dev = PDE_DATA(file_inode(file)); + dev = pde_data(file_inode(file)); if (!dev->ops->proc_read) length = -EINVAL; else { diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c index 1661979b6a6e8..ee319779781e6 100644 --- a/net/bluetooth/af_bluetooth.c +++ b/net/bluetooth/af_bluetooth.c @@ -611,7 +611,7 @@ EXPORT_SYMBOL(bt_sock_wait_ready); static void *bt_seq_start(struct seq_file *seq, loff_t *pos) __acquires(seq->private->l->lock) { - struct bt_sock_list *l = PDE_DATA(file_inode(seq->file)); + struct bt_sock_list *l = pde_data(file_inode(seq->file)); read_lock(&l->lock); return seq_hlist_start_head(&l->head, *pos); @@ -619,7 +619,7 @@ static void *bt_seq_start(struct seq_file *seq, loff_t *pos) static void *bt_seq_next(struct seq_file *seq, void *v, loff_t *pos) { - struct bt_sock_list *l = PDE_DATA(file_inode(seq->file)); + struct bt_sock_list *l = pde_data(file_inode(seq->file)); return seq_hlist_next(v, &l->head, pos); } @@ -627,14 +627,14 @@ static void *bt_seq_next(struct seq_file *seq, void *v, loff_t *pos) static void bt_seq_stop(struct seq_file *seq, void *v) __releases(seq->private->l->lock) { - struct bt_sock_list *l = PDE_DATA(file_inode(seq->file)); + struct bt_sock_list *l = pde_data(file_inode(seq->file)); read_unlock(&l->lock); } static int bt_seq_show(struct seq_file *seq, void *v) { - struct bt_sock_list *l = PDE_DATA(file_inode(seq->file)); + struct bt_sock_list *l = pde_data(file_inode(seq->file)); if (v == SEQ_START_TOKEN) { seq_puts(seq, "sk RefCnt Rmem Wmem User Inode Parent"); diff --git a/net/can/bcm.c b/net/can/bcm.c index bc88d901a1c0c..95d209b52e6a6 100644 --- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -193,7 +193,7 @@ static int bcm_proc_show(struct seq_file *m, void *v) { char ifname[IFNAMSIZ]; struct net *net = m->private; - struct sock *sk = (struct sock *)PDE_DATA(m->file->f_inode); + struct sock *sk = (struct sock *)pde_data(m->file->f_inode); struct bcm_sock *bo = bcm_sk(sk); struct bcm_op *op; diff --git a/net/can/proc.c b/net/can/proc.c index b3099f0a3cb82..bbce97825f13f 100644 --- a/net/can/proc.c +++ b/net/can/proc.c @@ -305,7 +305,7 @@ static inline void can_rcvlist_proc_show_one(struct seq_file *m, int idx, static int can_rcvlist_proc_show(struct seq_file *m, void *v) { /* double cast to prevent GCC warning */ - int idx = (int)(long)PDE_DATA(m->file->f_inode); + int idx = (int)(long)pde_data(m->file->f_inode); struct net_device *dev; struct can_dev_rcv_lists *dev_rcv_lists; struct net *net = m->private; diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 72ba027c34cfe..0cdd4d9ad9424 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -3365,7 +3365,7 @@ EXPORT_SYMBOL(neigh_seq_stop); static void *neigh_stat_seq_start(struct seq_file *seq, loff_t *pos) { - struct neigh_table *tbl = PDE_DATA(file_inode(seq->file)); + struct neigh_table *tbl = pde_data(file_inode(seq->file)); int cpu; if (*pos == 0) @@ -3382,7 +3382,7 @@ static void *neigh_stat_seq_start(struct seq_file *seq, loff_t *pos) static void *neigh_stat_seq_next(struct seq_file *seq, void *v, loff_t *pos) { - struct neigh_table *tbl = PDE_DATA(file_inode(seq->file)); + struct neigh_table *tbl = pde_data(file_inode(seq->file)); int cpu; for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) { @@ -3402,7 +3402,7 @@ static void neigh_stat_seq_stop(struct seq_file *seq, void *v) static int neigh_stat_seq_show(struct seq_file *seq, void *v) { - struct neigh_table *tbl = PDE_DATA(file_inode(seq->file)); + struct neigh_table *tbl = pde_data(file_inode(seq->file)); struct neigh_statistics *st = v; if (v == SEQ_START_TOKEN) { diff --git a/net/core/pktgen.c b/net/core/pktgen.c index a3d74e2704c42..68eb848fcef3e 100644 --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -545,7 +545,7 @@ static ssize_t pgctrl_write(struct file *file, const char __user *buf, static int pgctrl_open(struct inode *inode, struct file *file) { - return single_open(file, pgctrl_show, PDE_DATA(inode)); + return single_open(file, pgctrl_show, pde_data(inode)); } static const struct proc_ops pktgen_proc_ops = { @@ -1810,7 +1810,7 @@ static ssize_t pktgen_if_write(struct file *file, static int pktgen_if_open(struct inode *inode, struct file *file) { - return single_open(file, pktgen_if_show, PDE_DATA(inode)); + return single_open(file, pktgen_if_show, pde_data(inode)); } static const struct proc_ops pktgen_if_proc_ops = { @@ -1947,7 +1947,7 @@ static ssize_t pktgen_thread_write(struct file *file, static int pktgen_thread_open(struct inode *inode, struct file *file) { - return single_open(file, pktgen_thread_show, PDE_DATA(inode)); + return single_open(file, pktgen_thread_show, pde_data(inode)); } static const struct proc_ops pktgen_thread_proc_ops = { diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c index 8fd1aba8af31c..96deecc729d1b 100644 --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c @@ -773,7 +773,7 @@ static int clusterip_proc_open(struct inode *inode, struct file *file) if (!ret) { struct seq_file *sf = file->private_data; - struct clusterip_config *c = PDE_DATA(inode); + struct clusterip_config *c = pde_data(inode); sf->private = c; @@ -785,7 +785,7 @@ static int clusterip_proc_open(struct inode *inode, struct file *file) static int clusterip_proc_release(struct inode *inode, struct file *file) { - struct clusterip_config *c = PDE_DATA(inode); + struct clusterip_config *c = pde_data(inode); int ret; ret = seq_release(inode, file); @@ -799,7 +799,7 @@ static int clusterip_proc_release(struct inode *inode, struct file *file) static ssize_t clusterip_proc_write(struct file *file, const char __user *input, size_t size, loff_t *ofs) { - struct clusterip_config *c = PDE_DATA(file_inode(file)); + struct clusterip_config *c = pde_data(file_inode(file)); #define PROC_WRITELEN 10 char buffer[PROC_WRITELEN+1]; unsigned long nodenum; diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index a53f256bf9d39..9eb5fc2478687 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -971,7 +971,7 @@ struct proto raw_prot = { static struct sock *raw_get_first(struct seq_file *seq) { struct sock *sk; - struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file)); + struct raw_hashinfo *h = pde_data(file_inode(seq->file)); struct raw_iter_state *state = raw_seq_private(seq); for (state->bucket = 0; state->bucket < RAW_HTABLE_SIZE; @@ -987,7 +987,7 @@ static struct sock *raw_get_first(struct seq_file *seq) static struct sock *raw_get_next(struct seq_file *seq, struct sock *sk) { - struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file)); + struct raw_hashinfo *h = pde_data(file_inode(seq->file)); struct raw_iter_state *state = raw_seq_private(seq); do { @@ -1016,7 +1016,7 @@ static struct sock *raw_get_idx(struct seq_file *seq, loff_t pos) void *raw_seq_start(struct seq_file *seq, loff_t *pos) __acquires(&h->lock) { - struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file)); + struct raw_hashinfo *h = pde_data(file_inode(seq->file)); read_lock(&h->lock); return *pos ? raw_get_idx(seq, *pos - 1) : SEQ_START_TOKEN; @@ -1039,7 +1039,7 @@ EXPORT_SYMBOL_GPL(raw_seq_next); void raw_seq_stop(struct seq_file *seq, void *v) __releases(&h->lock) { - struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file)); + struct raw_hashinfo *h = pde_data(file_inode(seq->file)); read_unlock(&h->lock); } diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 3dd19a2bf06c4..20eb6ad73dcd7 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2991,7 +2991,7 @@ static unsigned short seq_file_family(const struct seq_file *seq) #endif /* Iterated from proc fs */ - afinfo = PDE_DATA(file_inode(seq->file)); + afinfo = pde_data(file_inode(seq->file)); return afinfo->family; } diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 799b663daf878..6080fa10bb7c2 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2952,7 +2952,7 @@ static struct sock *udp_get_first(struct seq_file *seq, int start) if (state->bpf_seq_afinfo) afinfo = state->bpf_seq_afinfo; else - afinfo = PDE_DATA(file_inode(seq->file)); + afinfo = pde_data(file_inode(seq->file)); for (state->bucket = start; state->bucket <= afinfo->udp_table->mask; ++state->bucket) { @@ -2985,7 +2985,7 @@ static struct sock *udp_get_next(struct seq_file *seq, struct sock *sk) if (state->bpf_seq_afinfo) afinfo = state->bpf_seq_afinfo; else - afinfo = PDE_DATA(file_inode(seq->file)); + afinfo = pde_data(file_inode(seq->file)); do { sk = sk_next(sk); @@ -3042,7 +3042,7 @@ void udp_seq_stop(struct seq_file *seq, void *v) if (state->bpf_seq_afinfo) afinfo = state->bpf_seq_afinfo; else - afinfo = PDE_DATA(file_inode(seq->file)); + afinfo = pde_data(file_inode(seq->file)); if (state->bucket <= afinfo->udp_table->mask) spin_unlock_bh(&afinfo->udp_table->hash[state->bucket].lock); diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 25524e3933496..54a489f16b171 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -1517,7 +1517,7 @@ EXPORT_SYMBOL_GPL(xt_unregister_table); #ifdef CONFIG_PROC_FS static void *xt_table_seq_start(struct seq_file *seq, loff_t *pos) { - u8 af = (unsigned long)PDE_DATA(file_inode(seq->file)); + u8 af = (unsigned long)pde_data(file_inode(seq->file)); struct net *net = seq_file_net(seq); struct xt_pernet *xt_net; @@ -1529,7 +1529,7 @@ static void *xt_table_seq_start(struct seq_file *seq, loff_t *pos) static void *xt_table_seq_next(struct seq_file *seq, void *v, loff_t *pos) { - u8 af = (unsigned long)PDE_DATA(file_inode(seq->file)); + u8 af = (unsigned long)pde_data(file_inode(seq->file)); struct net *net = seq_file_net(seq); struct xt_pernet *xt_net; @@ -1540,7 +1540,7 @@ static void *xt_table_seq_next(struct seq_file *seq, void *v, loff_t *pos) static void xt_table_seq_stop(struct seq_file *seq, void *v) { - u_int8_t af = (unsigned long)PDE_DATA(file_inode(seq->file)); + u_int8_t af = (unsigned long)pde_data(file_inode(seq->file)); mutex_unlock(&xt[af].mutex); } @@ -1584,7 +1584,7 @@ static void *xt_mttg_seq_next(struct seq_file *seq, void *v, loff_t *ppos, [MTTG_TRAV_NFP_UNSPEC] = MTTG_TRAV_NFP_SPEC, [MTTG_TRAV_NFP_SPEC] = MTTG_TRAV_DONE, }; - uint8_t nfproto = (unsigned long)PDE_DATA(file_inode(seq->file)); + uint8_t nfproto = (unsigned long)pde_data(file_inode(seq->file)); struct nf_mttg_trav *trav = seq->private; if (ppos != NULL) @@ -1633,7 +1633,7 @@ static void *xt_mttg_seq_start(struct seq_file *seq, loff_t *pos, static void xt_mttg_seq_stop(struct seq_file *seq, void *v) { - uint8_t nfproto = (unsigned long)PDE_DATA(file_inode(seq->file)); + uint8_t nfproto = (unsigned long)pde_data(file_inode(seq->file)); struct nf_mttg_trav *trav = seq->private; switch (trav->class) { diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c index 9c5cfd74a0ee4..0859b8f767645 100644 --- a/net/netfilter/xt_hashlimit.c +++ b/net/netfilter/xt_hashlimit.c @@ -1052,7 +1052,7 @@ static struct xt_match hashlimit_mt_reg[] __read_mostly = { static void *dl_seq_start(struct seq_file *s, loff_t *pos) __acquires(htable->lock) { - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket; spin_lock_bh(&htable->lock); @@ -1069,7 +1069,7 @@ static void *dl_seq_start(struct seq_file *s, loff_t *pos) static void *dl_seq_next(struct seq_file *s, void *v, loff_t *pos) { - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket = v; *pos = ++(*bucket); @@ -1083,7 +1083,7 @@ static void *dl_seq_next(struct seq_file *s, void *v, loff_t *pos) static void dl_seq_stop(struct seq_file *s, void *v) __releases(htable->lock) { - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket = v; if (!IS_ERR(bucket)) @@ -1125,7 +1125,7 @@ static void dl_seq_print(struct dsthash_ent *ent, u_int8_t family, static int dl_seq_real_show_v2(struct dsthash_ent *ent, u_int8_t family, struct seq_file *s) { - struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file)); + struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file)); spin_lock(&ent->lock); /* recalculate to show accurate numbers */ @@ -1140,7 +1140,7 @@ static int dl_seq_real_show_v2(struct dsthash_ent *ent, u_int8_t family, static int dl_seq_real_show_v1(struct dsthash_ent *ent, u_int8_t family, struct seq_file *s) { - struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file)); + struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file)); spin_lock(&ent->lock); /* recalculate to show accurate numbers */ @@ -1155,7 +1155,7 @@ static int dl_seq_real_show_v1(struct dsthash_ent *ent, u_int8_t family, static int dl_seq_real_show(struct dsthash_ent *ent, u_int8_t family, struct seq_file *s) { - struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file)); + struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file)); spin_lock(&ent->lock); /* recalculate to show accurate numbers */ @@ -1169,7 +1169,7 @@ static int dl_seq_real_show(struct dsthash_ent *ent, u_int8_t family, static int dl_seq_show_v2(struct seq_file *s, void *v) { - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket = (unsigned int *)v; struct dsthash_ent *ent; @@ -1183,7 +1183,7 @@ static int dl_seq_show_v2(struct seq_file *s, void *v) static int dl_seq_show_v1(struct seq_file *s, void *v) { - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket = v; struct dsthash_ent *ent; @@ -1197,7 +1197,7 @@ static int dl_seq_show_v1(struct seq_file *s, void *v) static int dl_seq_show(struct seq_file *s, void *v) { - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); unsigned int *bucket = v; struct dsthash_ent *ent; diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c index 0446307516cdf..7ddb9a78e3fc8 100644 --- a/net/netfilter/xt_recent.c +++ b/net/netfilter/xt_recent.c @@ -551,7 +551,7 @@ static int recent_seq_open(struct inode *inode, struct file *file) if (st == NULL) return -ENOMEM; - st->table = PDE_DATA(inode); + st->table = pde_data(inode); return 0; } @@ -559,7 +559,7 @@ static ssize_t recent_mt_proc_write(struct file *file, const char __user *input, size_t size, loff_t *loff) { - struct recent_table *t = PDE_DATA(file_inode(file)); + struct recent_table *t = pde_data(file_inode(file)); struct recent_entry *e; char buf[sizeof("+b335:1d35:1e55:dead:c0de:1715:5afe:c0de")]; const char *c = buf; diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index b87565b64928d..c2ba9d4cd2c77 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -1433,7 +1433,7 @@ static bool use_gss_proxy(struct net *net) static ssize_t write_gssp(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { - struct net *net = PDE_DATA(file_inode(file)); + struct net *net = pde_data(file_inode(file)); char tbuf[20]; unsigned long i; int res; @@ -1461,7 +1461,7 @@ static ssize_t write_gssp(struct file *file, const char __user *buf, static ssize_t read_gssp(struct file *file, char __user *buf, size_t count, loff_t *ppos) { - struct net *net = PDE_DATA(file_inode(file)); + struct net *net = pde_data(file_inode(file)); struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); unsigned long p = *ppos; char tbuf[10]; diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c index 59641803472c6..bb1177395b99d 100644 --- a/net/sunrpc/cache.c +++ b/net/sunrpc/cache.c @@ -1536,7 +1536,7 @@ static ssize_t write_flush(struct file *file, const char __user *buf, static ssize_t cache_read_procfs(struct file *filp, char __user *buf, size_t count, loff_t *ppos) { - struct cache_detail *cd = PDE_DATA(file_inode(filp)); + struct cache_detail *cd = pde_data(file_inode(filp)); return cache_read(filp, buf, count, ppos, cd); } @@ -1544,14 +1544,14 @@ static ssize_t cache_read_procfs(struct file *filp, char __user *buf, static ssize_t cache_write_procfs(struct file *filp, const char __user *buf, size_t count, loff_t *ppos) { - struct cache_detail *cd = PDE_DATA(file_inode(filp)); + struct cache_detail *cd = pde_data(file_inode(filp)); return cache_write(filp, buf, count, ppos, cd); } static __poll_t cache_poll_procfs(struct file *filp, poll_table *wait) { - struct cache_detail *cd = PDE_DATA(file_inode(filp)); + struct cache_detail *cd = pde_data(file_inode(filp)); return cache_poll(filp, wait, cd); } @@ -1560,21 +1560,21 @@ static long cache_ioctl_procfs(struct file *filp, unsigned int cmd, unsigned long arg) { struct inode *inode = file_inode(filp); - struct cache_detail *cd = PDE_DATA(inode); + struct cache_detail *cd = pde_data(inode); return cache_ioctl(inode, filp, cmd, arg, cd); } static int cache_open_procfs(struct inode *inode, struct file *filp) { - struct cache_detail *cd = PDE_DATA(inode); + struct cache_detail *cd = pde_data(inode); return cache_open(inode, filp, cd); } static int cache_release_procfs(struct inode *inode, struct file *filp) { - struct cache_detail *cd = PDE_DATA(inode); + struct cache_detail *cd = pde_data(inode); return cache_release(inode, filp, cd); } @@ -1591,14 +1591,14 @@ static const struct proc_ops cache_channel_proc_ops = { static int content_open_procfs(struct inode *inode, struct file *filp) { - struct cache_detail *cd = PDE_DATA(inode); + struct cache_detail *cd = pde_data(inode); return content_open(inode, filp, cd); } static int content_release_procfs(struct inode *inode, struct file *filp) { - struct cache_detail *cd = PDE_DATA(inode); + struct cache_detail *cd = pde_data(inode); return content_release(inode, filp, cd); } @@ -1612,14 +1612,14 @@ static const struct proc_ops content_proc_ops = { static int open_flush_procfs(struct inode *inode, struct file *filp) { - struct cache_detail *cd = PDE_DATA(inode); + struct cache_detail *cd = pde_data(inode); return open_flush(inode, filp, cd); } static int release_flush_procfs(struct inode *inode, struct file *filp) { - struct cache_detail *cd = PDE_DATA(inode); + struct cache_detail *cd = pde_data(inode); return release_flush(inode, filp, cd); } @@ -1627,7 +1627,7 @@ static int release_flush_procfs(struct inode *inode, struct file *filp) static ssize_t read_flush_procfs(struct file *filp, char __user *buf, size_t count, loff_t *ppos) { - struct cache_detail *cd = PDE_DATA(file_inode(filp)); + struct cache_detail *cd = pde_data(file_inode(filp)); return read_flush(filp, buf, count, ppos, cd); } @@ -1636,7 +1636,7 @@ static ssize_t write_flush_procfs(struct file *filp, const char __user *buf, size_t count, loff_t *ppos) { - struct cache_detail *cd = PDE_DATA(file_inode(filp)); + struct cache_detail *cd = pde_data(file_inode(filp)); return write_flush(filp, buf, count, ppos, cd); } diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c index c964b48eaabae..52908f9e6eab5 100644 --- a/net/sunrpc/stats.c +++ b/net/sunrpc/stats.c @@ -66,7 +66,7 @@ static int rpc_proc_show(struct seq_file *seq, void *v) { static int rpc_proc_open(struct inode *inode, struct file *file) { - return single_open(file, rpc_proc_show, PDE_DATA(inode)); + return single_open(file, rpc_proc_show, pde_data(inode)); } static const struct proc_ops rpc_proc_ops = { diff --git a/sound/core/info.c b/sound/core/info.c index a451b24199c3e..782fba87cc043 100644 --- a/sound/core/info.c +++ b/sound/core/info.c @@ -234,7 +234,7 @@ static int snd_info_entry_mmap(struct file *file, struct vm_area_struct *vma) static int snd_info_entry_open(struct inode *inode, struct file *file) { - struct snd_info_entry *entry = PDE_DATA(inode); + struct snd_info_entry *entry = pde_data(inode); struct snd_info_private_data *data; int mode, err; @@ -365,7 +365,7 @@ static int snd_info_seq_show(struct seq_file *seq, void *p) static int snd_info_text_entry_open(struct inode *inode, struct file *file) { - struct snd_info_entry *entry = PDE_DATA(inode); + struct snd_info_entry *entry = pde_data(inode); struct snd_info_private_data *data; int err; From 02643e1da86612e416c91c354193b3188da888e8 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Fri, 3 Dec 2021 12:58:55 +1100 Subject: [PATCH 52/58] proc-remove-pde_data-completely-fix fix naming clash in drivers/nubus/proc.c Cc: Alexey Dobriyan Cc: Alexey Gladkov Cc: Christian Brauner Cc: Muchun Song Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/nubus/proc.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/drivers/nubus/proc.c b/drivers/nubus/proc.c index 22fb11da519bf..a334037b4119c 100644 --- a/drivers/nubus/proc.c +++ b/drivers/nubus/proc.c @@ -93,30 +93,30 @@ struct nubus_proc_pde_data { static struct nubus_proc_pde_data * nubus_proc_alloc_pde_data(unsigned char *ptr, unsigned int size) { - struct nubus_proc_pde_data *pde_data; + struct nubus_proc_pded *pded; - pde_data = kmalloc(sizeof(*pde_data), GFP_KERNEL); - if (!pde_data) + pded = kmalloc(sizeof(*pded), GFP_KERNEL); + if (!pded) return NULL; - pde_data->res_ptr = ptr; - pde_data->res_size = size; - return pde_data; + pded->res_ptr = ptr; + pded->res_size = size; + return pded; } static int nubus_proc_rsrc_show(struct seq_file *m, void *v) { struct inode *inode = m->private; - struct nubus_proc_pde_data *pde_data; + struct nubus_proc_pde_data *pded; - pde_data = pde_data(inode); - if (!pde_data) + pded = pde_data(inode); + if (!pded) return 0; - if (pde_data->res_size > m->size) + if (pded->res_size > m->size) return -EFBIG; - if (pde_data->res_size) { + if (pded->res_size) { int lanes = (int)proc_get_parent_data(inode); struct nubus_dirent ent; @@ -124,11 +124,11 @@ static int nubus_proc_rsrc_show(struct seq_file *m, void *v) return 0; ent.mask = lanes; - ent.base = pde_data->res_ptr; + ent.base = pded->res_ptr; ent.data = 0; - nubus_seq_write_rsrc_mem(m, &ent, pde_data->res_size); + nubus_seq_write_rsrc_mem(m, &ent, pded->res_size); } else { - unsigned int data = (unsigned int)pde_data->res_ptr; + unsigned int data = (unsigned int)pded->res_ptr; seq_putc(m, data >> 16); seq_putc(m, data >> 8); @@ -142,18 +142,18 @@ void nubus_proc_add_rsrc_mem(struct proc_dir_entry *procdir, unsigned int size) { char name[9]; - struct nubus_proc_pde_data *pde_data; + struct nubus_proc_pde_data *pded; if (!procdir) return; snprintf(name, sizeof(name), "%x", ent->type); if (size) - pde_data = nubus_proc_alloc_pde_data(nubus_dirptr(ent), size); + pded = nubus_proc_alloc_pde_data(nubus_dirptr(ent), size); else - pde_data = NULL; + pded = NULL; proc_create_single_data(name, S_IFREG | 0444, procdir, - nubus_proc_rsrc_show, pde_data); + nubus_proc_rsrc_show, pded); } void nubus_proc_add_rsrc(struct proc_dir_entry *procdir, From f4e3a11fc6faa2f214184de9e541dc3886324222 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Fri, 3 Dec 2021 12:28:10 -0800 Subject: [PATCH 53/58] proc-remove-pde_data-completely-fix-fix Cc: Muchun Song Cc: Stephen Rothwell Reported-by: kernel test robot Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/nubus/proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/nubus/proc.c b/drivers/nubus/proc.c index a334037b4119c..1fd667852271f 100644 --- a/drivers/nubus/proc.c +++ b/drivers/nubus/proc.c @@ -93,7 +93,7 @@ struct nubus_proc_pde_data { static struct nubus_proc_pde_data * nubus_proc_alloc_pde_data(unsigned char *ptr, unsigned int size) { - struct nubus_proc_pded *pded; + struct nubus_proc_pde_data *pded; pded = kmalloc(sizeof(*pded), GFP_KERNEL); if (!pded) From f1f73c050ed06cf3d188f8d540d940394e216cee Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Fri, 3 Dec 2021 12:58:55 +1100 Subject: [PATCH 54/58] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated from memblock, even if stack depot ends up not actually used. The default size of stack_table is 4MB on 32-bit, 8MB on 64-bit. This is fine for use-cases such as KASAN which is also a config option and has overhead on its own. But it's an issue for functionality that has to be actually enabled on boot (page_owner) or depends on hardware (GPU drivers) and thus the memory might be wasted. This was raised as an issue [1] when attempting to add stackdepot support for SLUB's debug object tracking functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable slub_debug on boot only when needed, or create only specific kmem caches with debugging for testing purposes. It would thus be more efficient if stackdepot's table was allocated only when actually going to be used. This patch thus makes the allocation (and whole stack_depot_init() call) optional: - Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN select this flag. - Other users have to call stack_depot_init() as part of their own init when it's determined that stack depot will actually be used. This may depend on both config and runtime conditions. Convert current users which are page_owner and several in the DRM subsystem. Same will be done for SLUB later. - Because the init might now be called after the boot-time memblock allocation has given all memory to the buddy allocator, change stack_depot_init() to allocate stack_table with kvmalloc() when memblock is no longer available. Also handle allocation failure by disabling stackdepot (could have theoretically happened even with memblock allocation previously), and don't unnecessarily align the memblock allocation to its own size anymore. [1] https://lore.kernel.org/all/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/ Link: https://lkml.kernel.org/r/20211013073005.11351-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka Acked-by: Dmitry Vyukov Reviewed-by: Marco Elver # stackdepot Cc: Marco Elver Cc: Vijayanand Jitta Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: David Airlie Cc: Daniel Vetter Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Andrey Konovalov Cc: Dmitry Vyukov Cc: Geert Uytterhoeven Cc: Oliver Glitta Cc: Imran Khan Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/gpu/drm/drm_dp_mst_topology.c | 1 + drivers/gpu/drm/drm_mm.c | 4 +++ drivers/gpu/drm/i915/intel_runtime_pm.c | 3 +++ include/linux/stackdepot.h | 25 ++++++++++++------- init/main.c | 2 +- lib/Kconfig | 4 +++ lib/Kconfig.kasan | 2 +- lib/stackdepot.c | 33 +++++++++++++++++++++---- mm/page_owner.c | 2 ++ 9 files changed, 60 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index f3d79eda94bb0..8b3822142fed8 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -5511,6 +5511,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr, mutex_init(&mgr->probe_lock); #if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS) mutex_init(&mgr->topology_ref_history_lock); + stack_depot_init(); #endif INIT_LIST_HEAD(&mgr->tx_msg_downq); INIT_LIST_HEAD(&mgr->destroy_port_list); diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index 7d1c578388d33..8257f9d4f6190 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -980,6 +980,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size) add_hole(&mm->head_node); mm->scan_active = 0; + +#ifdef CONFIG_DRM_DEBUG_MM + stack_depot_init(); +#endif } EXPORT_SYMBOL(drm_mm_init); diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c index 22dab36afcb62..53f1ccb78849a 100644 --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -68,6 +68,9 @@ static noinline depot_stack_handle_t __save_depot_stack(void) static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm) { spin_lock_init(&rpm->debug.lock); + + if (rpm->available) + stack_depot_init(); } static noinline depot_stack_handle_t diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index c34b55a6e5540..17f992fe6355b 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -19,6 +19,22 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries, unsigned int nr_entries, gfp_t gfp_flags, bool can_alloc); +/* + * Every user of stack depot has to call this during its own init when it's + * decided that it will be calling stack_depot_save() later. + * + * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot + * enabled as part of mm_init(), for subsystems where it's known at compile time + * that stack depot will be used. + */ +int stack_depot_init(void); + +#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT +static inline int stack_depot_early_init(void) { return stack_depot_init(); } +#else +static inline int stack_depot_early_init(void) { return 0; } +#endif + depot_stack_handle_t stack_depot_save(unsigned long *entries, unsigned int nr_entries, gfp_t gfp_flags); @@ -30,13 +46,4 @@ int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size, void stack_depot_print(depot_stack_handle_t stack); -#ifdef CONFIG_STACKDEPOT -int stack_depot_init(void); -#else -static inline int stack_depot_init(void) -{ - return 0; -} -#endif /* CONFIG_STACKDEPOT */ - #endif diff --git a/init/main.c b/init/main.c index f9065a9b17082..ff8f83c2da5a0 100644 --- a/init/main.c +++ b/init/main.c @@ -837,7 +837,7 @@ static void __init mm_init(void) init_mem_debugging_and_hardening(); kfence_alloc_pool(); report_meminit(); - stack_depot_init(); + stack_depot_early_init(); mem_init(); mem_init_print_info(); /* page_owner must be initialized after buddy is ready */ diff --git a/lib/Kconfig b/lib/Kconfig index 6a6ae5312fa05..6ff8f546e4a59 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -668,6 +668,10 @@ config STACKDEPOT bool select STACKTRACE +config STACKDEPOT_ALWAYS_INIT + bool + select STACKDEPOT + config STACK_HASH_ORDER int "stack depot hash size (12 => 4KB, 20 => 1024KB)" range 12 20 diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan index cdc842d090db3..879757b6dd149 100644 --- a/lib/Kconfig.kasan +++ b/lib/Kconfig.kasan @@ -38,7 +38,7 @@ menuconfig KASAN CC_HAS_WORKING_NOSANITIZE_ADDRESS) || \ HAVE_ARCH_KASAN_HW_TAGS depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB) - select STACKDEPOT + select STACKDEPOT_ALWAYS_INIT help Enables KASAN (KernelAddressSANitizer) - runtime memory debugger, designed to find out-of-bounds accesses and use-after-free bugs. diff --git a/lib/stackdepot.c b/lib/stackdepot.c index b437ae79aca14..8894a5171b25b 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -161,18 +162,40 @@ static int __init is_stack_depot_disabled(char *str) } early_param("stack_depot_disable", is_stack_depot_disabled); -int __init stack_depot_init(void) +/* + * __ref because of memblock_alloc(), which will not be actually called after + * the __init code is gone, because at that point slab_is_available() is true + */ +__ref int stack_depot_init(void) { - if (!stack_depot_disable) { + static DEFINE_MUTEX(stack_depot_init_mutex); + + mutex_lock(&stack_depot_init_mutex); + if (!stack_depot_disable && stack_table == NULL) { size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); int i; - stack_table = memblock_alloc(size, size); - for (i = 0; i < STACK_HASH_SIZE; i++) - stack_table[i] = NULL; + if (slab_is_available()) { + pr_info("Stack Depot allocating hash table with kvmalloc\n"); + stack_table = kvmalloc(size, GFP_KERNEL); + } else { + pr_info("Stack Depot allocating hash table with memblock_alloc\n"); + stack_table = memblock_alloc(size, SMP_CACHE_BYTES); + } + if (stack_table) { + for (i = 0; i < STACK_HASH_SIZE; i++) + stack_table[i] = NULL; + } else { + pr_err("Stack Depot failed hash table allocationg, disabling\n"); + stack_depot_disable = true; + mutex_unlock(&stack_depot_init_mutex); + return -ENOMEM; + } } + mutex_unlock(&stack_depot_init_mutex); return 0; } +EXPORT_SYMBOL_GPL(stack_depot_init); /* Calculate hash for a stack */ static inline u32 hash_stack(unsigned long *entries, unsigned int size) diff --git a/mm/page_owner.c b/mm/page_owner.c index 4f924957ce7a0..b1474e1431b44 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -80,6 +80,8 @@ static void init_page_owner(void) if (!page_owner_enabled) return; + stack_depot_init(); + register_dummy_stack(); register_failure_stack(); register_early_stack(); From 5884f4f48b5bbcc0c300f294ae5fe7795a8a2a8c Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Fri, 3 Dec 2021 12:58:56 +1100 Subject: [PATCH 55/58] lib/stackdepot: fix spelling mistake and grammar in pr_err message There is a spelling mistake of the work allocation so fix this and re-phrase the message to make it easier to read. Link: https://lkml.kernel.org/r/20211015104159.11282-1-colin.king@canonical.com Signed-off-by: Colin Ian King Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- lib/stackdepot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 8894a5171b25b..8b6c415852035 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -186,7 +186,7 @@ __ref int stack_depot_init(void) for (i = 0; i < STACK_HASH_SIZE; i++) stack_table[i] = NULL; } else { - pr_err("Stack Depot failed hash table allocationg, disabling\n"); + pr_err("Stack Depot hash table allocation failed, disabling\n"); stack_depot_disable = true; mutex_unlock(&stack_depot_init_mutex); return -ENOMEM; From 57cdbe50a6b55f02f2ce8162ae369e0a30860149 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Fri, 3 Dec 2021 12:58:57 +1100 Subject: [PATCH 56/58] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup On FLATMEM, we call page_ext_init_flatmem_late() just before kmem_cache_init() which means stack_depot_init() (called by page owner init) will not recognize properly it should use kvmalloc() and not memblock_alloc(). memblock_alloc() will also not issue a warning and return a block memory that can be invalid and cause kernel page fault when saving stacks, as reported by the kernel test robot [1]. Fix this by moving page_ext_init_flatmem_late() below kmem_cache_init() so that slab_is_available() is true during stack_depot_init(). SPARSEMEM doesn't have this issue, as it doesn't do page_ext_init_flatmem_late(), but a different page_ext_init() even later in the boot process. Thanks to Mike Rapoport for pointing out the FLATMEM init ordering issue. While at it, also actually resolve a checkpatch warning in stack_depot_init() from DRM CI, which was supposed to be in the original patch already. [1] https://lore.kernel.org/all/20211014085450.GC18719@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/6abd9213-19a9-6d58-cedc-2414386d2d81@suse.cz Signed-off-by: Vlastimil Babka Reported-by: kernel test robot Cc: Mike Rapoport Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- init/main.c | 7 +++++-- lib/stackdepot.c | 2 +- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/init/main.c b/init/main.c index ff8f83c2da5a0..cf6452628ca1f 100644 --- a/init/main.c +++ b/init/main.c @@ -840,9 +840,12 @@ static void __init mm_init(void) stack_depot_early_init(); mem_init(); mem_init_print_info(); - /* page_owner must be initialized after buddy is ready */ - page_ext_init_flatmem_late(); kmem_cache_init(); + /* + * page_owner must be initialized after buddy is ready, and also after + * slab is ready so that stack_depot_init() works properly + */ + page_ext_init_flatmem_late(); kmemleak_init(); pgtable_init(); debug_objects_mem_init(); diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 8b6c415852035..00ccb106f1a8a 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -171,7 +171,7 @@ __ref int stack_depot_init(void) static DEFINE_MUTEX(stack_depot_init_mutex); mutex_lock(&stack_depot_init_mutex); - if (!stack_depot_disable && stack_table == NULL) { + if (!stack_depot_disable && !stack_table) { size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); int i; From edb059b9c66d41a725d89294777ad6de106ff61b Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Fri, 3 Dec 2021 12:58:57 +1100 Subject: [PATCH 57/58] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup3 Due to cd06ab2fd48f ("drm/locking: add backtrace for locking contended locks without backoff") landing recently to -next adding a new stack depot user in drivers/gpu/drm/drm_modeset_lock.c we need to add an appropriate call to stack_depot_init() there as well. Link: https://lkml.kernel.org/r/2a692365-cfa1-64f2-34e0-8aa5674dce5e@suse.cz Signed-off-by: Vlastimil Babka Cc: Jani Nikula Cc: Naresh Kamboju Cc: Marco Elver Cc: Vijayanand Jitta Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: David Airlie Cc: Daniel Vetter Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Andrey Konovalov Cc: Dmitry Vyukov Cc: Geert Uytterhoeven Cc: Oliver Glitta Cc: Imran Khan Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- drivers/gpu/drm/drm_modeset_lock.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c index c97323365675c..918065982db46 100644 --- a/drivers/gpu/drm/drm_modeset_lock.c +++ b/drivers/gpu/drm/drm_modeset_lock.c @@ -107,6 +107,11 @@ static void __drm_stack_depot_print(depot_stack_handle_t stack_depot) kfree(buf); } + +static void __drm_stack_depot_init(void) +{ + stack_depot_init(); +} #else /* CONFIG_DRM_DEBUG_MODESET_LOCK */ static depot_stack_handle_t __drm_stack_depot_save(void) { @@ -115,6 +120,9 @@ static depot_stack_handle_t __drm_stack_depot_save(void) static void __drm_stack_depot_print(depot_stack_handle_t stack_depot) { } +static void __drm_stack_depot_init(void) +{ +} #endif /* CONFIG_DRM_DEBUG_MODESET_LOCK */ /** @@ -359,6 +367,7 @@ void drm_modeset_lock_init(struct drm_modeset_lock *lock) { ww_mutex_init(&lock->mutex, &crtc_ww_class); INIT_LIST_HEAD(&lock->head); + __drm_stack_depot_init(); } EXPORT_SYMBOL(drm_modeset_lock_init); From 4a6df865acade7f2cbdaddeee4b710dcc6462e5b Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Fri, 3 Dec 2021 12:58:58 +1100 Subject: [PATCH 58/58] lib/stackdepot: always do filter_irq_stacks() in stack_depot_save() The non-interrupt portion of interrupt stack traces before interrupt entry is usually arbitrary. Therefore, saving stack traces of interrupts (that include entries before interrupt entry) to stack depot leads to unbounded stackdepot growth. As such, use of filter_irq_stacks() is a requirement to ensure stackdepot can efficiently deduplicate interrupt stacks. Looking through all current users of stack_depot_save(), none (except KASAN) pass the stack trace through filter_irq_stacks() before passing it on to stack_depot_save(). Rather than adding filter_irq_stacks() to all current users of stack_depot_save(), it became clear that stack_depot_save() should simply do filter_irq_stacks(). Link: https://lkml.kernel.org/r/20211130095727.2378739-1-elver@google.com Signed-off-by: Marco Elver Reviewed-by: Alexander Potapenko Acked-by: Vlastimil Babka Reviewed-by: Andrey Konovalov Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Vijayanand Jitta Cc: "Gustavo A. R. Silva" Cc: Imran Khan Cc: Chris Wilson Cc: Jani Nikula Cc: Mika Kuoppala Signed-off-by: Andrew Morton Signed-off-by: Stephen Rothwell --- lib/stackdepot.c | 13 +++++++++++++ mm/kasan/common.c | 1 - 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 00ccb106f1a8a..bf5ba9af05009 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -328,6 +328,9 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch); * (allocates using GFP flags of @alloc_flags). If @can_alloc is %false, avoids * any allocations and will fail if no space is left to store the stack trace. * + * If the stack trace in @entries is from an interrupt, only the portion up to + * interrupt entry is saved. + * * Context: Any context, but setting @can_alloc to %false is required if * alloc_pages() cannot be used from the current context. Currently * this is the case from contexts where neither %GFP_ATOMIC nor @@ -346,6 +349,16 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries, unsigned long flags; u32 hash; + /* + * If this stack trace is from an interrupt, including anything before + * interrupt entry usually leads to unbounded stackdepot growth. + * + * Because use of filter_irq_stacks() is a requirement to ensure + * stackdepot can efficiently deduplicate interrupt stacks, always + * filter_irq_stacks() to simplify all callers' use of stackdepot. + */ + nr_entries = filter_irq_stacks(entries, nr_entries); + if (unlikely(nr_entries == 0) || stack_depot_disable) goto fast_exit; diff --git a/mm/kasan/common.c b/mm/kasan/common.c index 7c06db78a76cb..92196562687b6 100644 --- a/mm/kasan/common.c +++ b/mm/kasan/common.c @@ -36,7 +36,6 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc) unsigned int nr_entries; nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0); - nr_entries = filter_irq_stacks(entries, nr_entries); return __stack_depot_save(entries, nr_entries, flags, can_alloc); }