From 421033deb91521aa6a9255e495cb106741a52275 Mon Sep 17 00:00:00 2001 From: Paul Fertser Date: Mon, 5 Jun 2023 10:34:07 +0300 Subject: [PATCH 01/98] wifi: mt76: mt7615: do not advertise 5 GHz on first phy of MT7615D (DBDC) On DBDC devices the first (internal) phy is only capable of using 2.4 GHz band, and the 5 GHz band is exposed via a separate phy object, so avoid the false advertising. Reported-by: Rani Hod Closes: https://github.com/openwrt/openwrt/pull/12361 Fixes: 7660a1bd0c22 ("mt76: mt7615: register ext_phy if DBDC is detected") Cc: stable@vger.kernel.org Signed-off-by: Paul Fertser Reviewed-by: Simon Horman Acked-by: Felix Fietkau Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230605073408.8699-1-fercerpav@gmail.com --- drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c b/drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c index 68e88224b8b1f..ccedea7e8a50d 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c +++ b/drivers/net/wireless/mediatek/mt76/mt7615/eeprom.c @@ -128,12 +128,12 @@ mt7615_eeprom_parse_hw_band_cap(struct mt7615_dev *dev) case MT_EE_5GHZ: dev->mphy.cap.has_5ghz = true; break; - case MT_EE_2GHZ: - dev->mphy.cap.has_2ghz = true; - break; case MT_EE_DBDC: dev->dbdc_support = true; fallthrough; + case MT_EE_2GHZ: + dev->mphy.cap.has_2ghz = true; + break; default: dev->mphy.cap.has_2ghz = true; dev->mphy.cap.has_5ghz = true; From a2777be03236c00466326acba8d39ac4f9c3e971 Mon Sep 17 00:00:00 2001 From: Brian Norris Date: Fri, 21 Jul 2023 16:06:04 -0700 Subject: [PATCH 02/98] MAINTAINERS: Update mwifiex maintainer list We haven't heard anything from these folks in years. I've been reviewing many submissions and plan to keep doing so. Cc: Amitkumar Karwar Cc: Ganapathi Bhat Cc: Sharvari Harisangam Cc: Xinming Hu Signed-off-by: Brian Norris Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230721160603.1.Idf0e8025f59c62d73c08960638249b58cf215acc@changeid --- MAINTAINERS | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index dfbb271f1667e..42e78a696be66 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12579,12 +12579,9 @@ F: Documentation/devicetree/bindings/net/marvell,pp2.yaml F: drivers/net/ethernet/marvell/mvpp2/ MARVELL MWIFIEX WIRELESS DRIVER -M: Amitkumar Karwar -M: Ganapathi Bhat -M: Sharvari Harisangam -M: Xinming Hu +M: Brian Norris L: linux-wireless@vger.kernel.org -S: Maintained +S: Odd Fixes F: drivers/net/wireless/marvell/mwifiex/ MARVELL MWL8K WIRELESS DRIVER From f2c67a3e60d1071b65848efaa8c3b66c363dd025 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 25 Jul 2023 10:42:05 +0200 Subject: [PATCH 03/98] bpf: Disable preemption in bpf_perf_event_output The nesting protection in bpf_perf_event_output relies on disabled preemption, which is guaranteed for kprobes and tracepoints. However bpf_perf_event_output can be also called from uprobes context through bpf_prog_run_array_sleepable function which disables migration, but keeps preemption enabled. This can cause task to be preempted by another one inside the nesting protection and lead eventually to two tasks using same perf_sample_data buffer and cause crashes like: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle page fault for address: ffffffff82be3eea ... Call Trace: ? __die+0x1f/0x70 ? page_fault_oops+0x176/0x4d0 ? exc_page_fault+0x132/0x230 ? 
asm_exc_page_fault+0x22/0x30 ? perf_output_sample+0x12b/0x910 ? perf_event_output+0xd0/0x1d0 ? bpf_perf_event_output+0x162/0x1d0 ? bpf_prog_c6271286d9a4c938_krava1+0x76/0x87 ? __uprobe_perf_func+0x12b/0x540 ? uprobe_dispatcher+0x2c4/0x430 ? uprobe_notify_resume+0x2da/0xce0 ? atomic_notifier_call_chain+0x7b/0x110 ? exit_to_user_mode_prepare+0x13e/0x290 ? irqentry_exit_to_user_mode+0x5/0x30 ? asm_exc_int3+0x35/0x40 Fixing this by disabling preemption in bpf_perf_event_output. Cc: stable@vger.kernel.org Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps") Acked-by: Hou Tao Signed-off-by: Jiri Olsa Link: https://lore.kernel.org/r/20230725084206.580930-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- kernel/trace/bpf_trace.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 5f2dcabad202c..bf18a53731b04 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -661,8 +661,7 @@ static DEFINE_PER_CPU(int, bpf_trace_nest_level); BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, u64, flags, void *, data, u64, size) { - struct bpf_trace_sample_data *sds = this_cpu_ptr(&bpf_trace_sds); - int nest_level = this_cpu_inc_return(bpf_trace_nest_level); + struct bpf_trace_sample_data *sds; struct perf_raw_record raw = { .frag = { .size = size, @@ -670,7 +669,11 @@ BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, }, }; struct perf_sample_data *sd; - int err; + int nest_level, err; + + preempt_disable(); + sds = this_cpu_ptr(&bpf_trace_sds); + nest_level = this_cpu_inc_return(bpf_trace_nest_level); if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(sds->sds))) { err = -EBUSY; @@ -688,9 +691,9 @@ BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, perf_sample_save_raw_data(sd, &raw); err = __bpf_perf_event_output(regs, map, flags, sd); - out: this_cpu_dec(bpf_trace_nest_level); + preempt_enable(); return err; } From d62cc390c2e99ae267ffe4b8d7e2e08b6c758c32 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 25 Jul 2023 10:42:06 +0200 Subject: [PATCH 04/98] bpf: Disable preemption in bpf_event_output We received report [1] of kernel crash, which is caused by using nesting protection without disabled preemption. The bpf_event_output can be called by programs executed by bpf_prog_run_array_cg function that disabled migration but keeps preemption enabled. This can cause task to be preempted by another one inside the nesting protection and lead eventually to two tasks using same perf_sample_data buffer and cause crashes like: BUG: kernel NULL pointer dereference, address: 0000000000000001 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page ... ? perf_output_sample+0x12a/0x9a0 ? finish_task_switch.isra.0+0x81/0x280 ? perf_event_output+0x66/0xa0 ? bpf_event_output+0x13a/0x190 ? bpf_event_output_data+0x22/0x40 ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb ? xa_load+0x87/0xe0 ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0 ? release_sock+0x3e/0x90 ? sk_setsockopt+0x1a1/0x12f0 ? udp_pre_connect+0x36/0x50 ? inet_dgram_connect+0x93/0xa0 ? __sys_connect+0xb4/0xe0 ? udp_setsockopt+0x27/0x40 ? __pfx_udp_push_pending_frames+0x10/0x10 ? __sys_setsockopt+0xdf/0x1a0 ? __x64_sys_connect+0xf/0x20 ? do_syscall_64+0x3a/0x90 ? entry_SYSCALL_64_after_hwframe+0x72/0xdc Fixing this by disabling preemption in bpf_event_output. 
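Both this fix and the bpf_perf_event_output change in the previous patch rely on the same per-CPU nesting scheme, so a condensed sketch of the fixed pattern may help. The names sample_bufs, MAX_NEST_LEVEL and emit_sample() are illustrative stand-ins, not the names used in kernel/trace/bpf_trace.c; the point is that the nest counter only protects the per-CPU buffers if the task cannot be preempted between claiming a slot and releasing it, which is what the added preempt_disable()/preempt_enable() pair guarantees:

  #include <linux/percpu.h>
  #include <linux/preempt.h>
  #include <linux/bug.h>
  #include <linux/perf_event.h>

  #define MAX_NEST_LEVEL 3                 /* illustrative nesting depth */

  struct sample_bufs {                     /* illustrative per-CPU buffer set */
          struct perf_sample_data sds[MAX_NEST_LEVEL];
  };

  static DEFINE_PER_CPU(struct sample_bufs, sample_bufs);
  static DEFINE_PER_CPU(int, nest_level);

  static int emit_sample(void)
  {
          struct perf_sample_data *sd;
          int level, err = 0;

          preempt_disable();               /* keep the task on this CPU */
          level = this_cpu_inc_return(nest_level);
          if (WARN_ON_ONCE(level > MAX_NEST_LEVEL)) {
                  err = -EBUSY;            /* every nesting slot already in use */
                  goto out;
          }
          sd = &this_cpu_ptr(&sample_bufs)->sds[level - 1];
          /* ... fill *sd and call perf_event_output() ... */
  out:
          this_cpu_dec(nest_level);
          preempt_enable();
          return err;
  }

With only migrate_disable() in effect, as in the cgroup and sleepable-uprobe program paths, the task can be preempted between this_cpu_inc_return() and the use of the claimed slot, so a second task on the same CPU can pick the same perf_sample_data and corrupt it, which is exactly the crash pattern quoted above.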
[1] https://github.com/cilium/cilium/issues/26756 Cc: stable@vger.kernel.org Reported-by: Oleg "livelace" Popov Closes: https://github.com/cilium/cilium/issues/26756 Fixes: 2a916f2f546c ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.") Acked-by: Hou Tao Signed-off-by: Jiri Olsa Link: https://lore.kernel.org/r/20230725084206.580930-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- kernel/trace/bpf_trace.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index bf18a53731b04..bd1a42b23f3ff 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -718,7 +718,6 @@ static DEFINE_PER_CPU(struct bpf_trace_sample_data, bpf_misc_sds); u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy) { - int nest_level = this_cpu_inc_return(bpf_event_output_nest_level); struct perf_raw_frag frag = { .copy = ctx_copy, .size = ctx_size, @@ -735,8 +734,12 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, }; struct perf_sample_data *sd; struct pt_regs *regs; + int nest_level; u64 ret; + preempt_disable(); + nest_level = this_cpu_inc_return(bpf_event_output_nest_level); + if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(bpf_misc_sds.sds))) { ret = -EBUSY; goto out; @@ -751,6 +754,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, ret = __bpf_perf_event_output(regs, map, flags, sd); out: this_cpu_dec(bpf_event_output_nest_level); + preempt_enable(); return ret; } From d265ebe41c911314bd273c218a37088835959fa1 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Thu, 20 Jul 2023 18:14:44 +0300 Subject: [PATCH 05/98] Revert "wifi: ath11k: Enable threaded NAPI" This reverts commit 13aa2fb692d3717767303817f35b3e650109add3. This commit broke QCN9074 initialisation: [ 358.960477] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 36866 [ 358.960481] ath11k_pci 0000:04:00.0: failed to send WMI_STA_POWERSAVE_PARAM_CMDID [ 358.960484] ath11k_pci 0000:04:00.0: could not set uapsd params -105 As there's no fix available let's just revert it to get QCN9074 working again. 
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217536 Signed-off-by: Kalle Valo Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230720151444.2016637-1-kvalo@kernel.org --- drivers/net/wireless/ath/ath11k/ahb.c | 1 - drivers/net/wireless/ath/ath11k/pcic.c | 1 - 2 files changed, 2 deletions(-) diff --git a/drivers/net/wireless/ath/ath11k/ahb.c b/drivers/net/wireless/ath/ath11k/ahb.c index 1cebba7889d78..139da578831a4 100644 --- a/drivers/net/wireless/ath/ath11k/ahb.c +++ b/drivers/net/wireless/ath/ath11k/ahb.c @@ -376,7 +376,6 @@ static void ath11k_ahb_ext_irq_enable(struct ath11k_base *ab) struct ath11k_ext_irq_grp *irq_grp = &ab->ext_irq_grp[i]; if (!irq_grp->napi_enabled) { - dev_set_threaded(&irq_grp->napi_ndev, true); napi_enable(&irq_grp->napi); irq_grp->napi_enabled = true; } diff --git a/drivers/net/wireless/ath/ath11k/pcic.c b/drivers/net/wireless/ath/ath11k/pcic.c index c899616fbee4b..c63083633b371 100644 --- a/drivers/net/wireless/ath/ath11k/pcic.c +++ b/drivers/net/wireless/ath/ath11k/pcic.c @@ -466,7 +466,6 @@ void ath11k_pcic_ext_irq_enable(struct ath11k_base *ab) struct ath11k_ext_irq_grp *irq_grp = &ab->ext_irq_grp[i]; if (!irq_grp->napi_enabled) { - dev_set_threaded(&irq_grp->napi_ndev, true); napi_enable(&irq_grp->napi); irq_grp->napi_enabled = true; } From fd7f08d92fcd7cc3eca0dd6c853f722a4c6176df Mon Sep 17 00:00:00 2001 From: Ilan Peer Date: Sun, 23 Jul 2023 23:10:43 +0300 Subject: [PATCH 06/98] wifi: cfg80211: Fix return value in scan logic The reporter noticed a warning when running iwlwifi: WARNING: CPU: 8 PID: 659 at mm/page_alloc.c:4453 __alloc_pages+0x329/0x340 As cfg80211_parse_colocated_ap() is not expected to return a negative value return 0 and not a negative value if cfg80211_calc_short_ssid() fails. Fixes: c8cb5b854b40f ("nl80211/cfg80211: support 6 GHz scanning") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217675 Signed-off-by: Ilan Peer Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230723201043.3007430-1-ilan.peer@intel.com --- net/wireless/scan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/wireless/scan.c b/net/wireless/scan.c index 8bf00caf5d297..0cf1ce7b69342 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -657,7 +657,7 @@ static int cfg80211_parse_colocated_ap(const struct cfg80211_bss_ies *ies, ret = cfg80211_calc_short_ssid(ies, &ssid_elem, &s_ssid_tmp); if (ret) - return ret; + return 0; for_each_element_id(elem, WLAN_EID_REDUCED_NEIGHBOR_REPORT, ies->data, ies->len) { From a1ce186db7f0e449f35d12fb55ae0da2a1b400e2 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Mon, 24 Jul 2023 13:08:23 +0300 Subject: [PATCH 07/98] Revert "wifi: ath6k: silence false positive -Wno-dangling-pointer warning on GCC 12" This reverts commit bd1d129daa3ede265a880e2c6a7f91eab0f4dc62. The dangling-pointer warnings were disabled kernel-wide by commit 49beadbd47c2 ("gcc-12: disable '-Wdangling-pointer' warning for now") for v5.19. So this hack in ath6kl is not needed anymore. 
Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230724100823.2948804-1-kvalo@kernel.org --- drivers/net/wireless/ath/ath6kl/Makefile | 5 ----- 1 file changed, 5 deletions(-) diff --git a/drivers/net/wireless/ath/ath6kl/Makefile b/drivers/net/wireless/ath/ath6kl/Makefile index a75bfa9fd1cfd..dc2b3b46781e1 100644 --- a/drivers/net/wireless/ath/ath6kl/Makefile +++ b/drivers/net/wireless/ath/ath6kl/Makefile @@ -36,11 +36,6 @@ ath6kl_core-y += wmi.o ath6kl_core-y += core.o ath6kl_core-y += recovery.o -# FIXME: temporarily silence -Wdangling-pointer on non W=1+ builds -ifndef KBUILD_EXTRA_WARN -CFLAGS_htc_mbox.o += $(call cc-disable-warning, dangling-pointer) -endif - ath6kl_core-$(CONFIG_NL80211_TESTMODE) += testmode.o ath6kl_core-$(CONFIG_ATH6KL_TRACING) += trace.o From 96839282edc28996809b2101afef8ae971b9dad7 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Mon, 24 Jul 2023 13:45:39 +0300 Subject: [PATCH 08/98] MAINTAINERS: wifi: rtw88: change Ping as the maintainer Yan-Hsuan has been away since 2021 and Ping has been the de facto maintainer the past year, actively reviewing patches and doing all other maintainer duties. So fix the MAINTAINERS file to show the current situation. Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230724104547.3061709-2-kvalo@kernel.org --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 42e78a696be66..58b399c7fb5e2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17969,7 +17969,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-testing.g F: drivers/net/wireless/realtek/rtlwifi/ REALTEK WIRELESS DRIVER (rtw88) -M: Yan-Hsuan Chuang +M: Ping-Ke Shih L: linux-wireless@vger.kernel.org S: Maintained F: drivers/net/wireless/realtek/rtw88/ From 25700d4916fefd27f83b933149c4da3d32db8585 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Mon, 24 Jul 2023 13:45:40 +0300 Subject: [PATCH 09/98] MAINTAINERS: wifi: atmel: mark as orphan Last activity from Simon is from 2005 so mark the driver is orphan. Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230724104547.3061709-3-kvalo@kernel.org --- MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 58b399c7fb5e2..35de04265644c 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3260,9 +3260,8 @@ F: Documentation/devicetree/bindings/input/atmel,maxtouch.yaml F: drivers/input/touchscreen/atmel_mxt_ts.c ATMEL WIRELESS DRIVER -M: Simon Kelley L: linux-wireless@vger.kernel.org -S: Maintained +S: Orphan W: http://www.thekelleys.org.uk/atmel W: http://atmelwlandriver.sourceforge.net/ F: drivers/net/wireless/atmel/atmel* From 74b81eac2dda55584aa6102feb920142ab08721a Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Mon, 24 Jul 2023 13:45:41 +0300 Subject: [PATCH 10/98] MAINTAINERS: wifi: mark cw1200 as orphan Last activity from Solomon is from 2013 so mark the driver orphan. 
Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230724104547.3061709-4-kvalo@kernel.org --- MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 35de04265644c..fc3c83c42efb9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5451,8 +5451,7 @@ F: Documentation/devicetree/bindings/net/can/ctu,ctucanfd.yaml F: drivers/net/can/ctucanfd/ CW1200 WLAN driver -M: Solomon Peachy -S: Maintained +S: Orphan F: drivers/net/wireless/st/cw1200/ CX18 VIDEO4LINUX DRIVER From e76983151dc66717e43c2a77abcfdb36318d2255 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Mon, 24 Jul 2023 13:45:42 +0300 Subject: [PATCH 11/98] MAINTAINERS: wifi: mark ar5523 as orphan Last activity from Pontus for this driver is from 2013 so mark the driver orphan. Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230724104547.3061709-5-kvalo@kernel.org --- MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index fc3c83c42efb9..3792e3de4822d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -21884,9 +21884,8 @@ S: Maintained F: drivers/usb/misc/apple-mfi-fastcharge.c USB AR5523 WIRELESS DRIVER -M: Pontus Fuchs L: linux-wireless@vger.kernel.org -S: Maintained +S: Orphan F: drivers/net/wireless/ath/ar5523/ USB ATTACHED SCSI From bc5dee3ce7c06d2c3d7a3b135a24f85de45a8362 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Mon, 24 Jul 2023 13:45:43 +0300 Subject: [PATCH 12/98] MAINTAINERS: wifi: mark rndis_wlan as orphan Last activity from Jussi for this driver is from 2013 so mark the driver orphan. Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230724104547.3061709-6-kvalo@kernel.org --- MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 3792e3de4822d..3d890a82149a0 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -22162,9 +22162,8 @@ F: drivers/usb/gadget/legacy/webcam.c F: include/uapi/linux/usb/g_uvc.h USB WIRELESS RNDIS DRIVER (rndis_wlan) -M: Jussi Kivilinna L: linux-wireless@vger.kernel.org -S: Maintained +S: Orphan F: drivers/net/wireless/legacy/rndis_wlan.c USB XHCI DRIVER From 0566ec90515c50352442db467d2a8c8950d35dc9 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Mon, 24 Jul 2023 13:45:44 +0300 Subject: [PATCH 13/98] MAINTAINERS: wifi: mark wl3501 as orphan There's no maintainer for this driver so mark it as orphan. Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230724104547.3061709-7-kvalo@kernel.org --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 3d890a82149a0..9ad555da5e365 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -22938,7 +22938,7 @@ F: drivers/input/misc/wistron_btns.c WL3501 WIRELESS PCMCIA CARD DRIVER L: linux-wireless@vger.kernel.org -S: Odd fixes +S: Orphan F: drivers/net/wireless/legacy/wl3501* WMI BINARY MOF DRIVER From c1e0a70de12dea642c696a81fd1104f5377fb203 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Mon, 24 Jul 2023 13:45:45 +0300 Subject: [PATCH 14/98] MAINTAINERS: wifi: mark zd1211rw as orphan Last activity from Ulrich is from 2007 so mark the driver orphan. Remove the zd1211-devs list, I doubt anyone uses it anymore. The webpage seems to be down so remove that as well. 
Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230724104547.3061709-8-kvalo@kernel.org --- MAINTAINERS | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 9ad555da5e365..2f988aa03f2e9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -23509,11 +23509,8 @@ S: Maintained F: mm/zbud.c ZD1211RW WIRELESS DRIVER -M: Ulrich Kunitz L: linux-wireless@vger.kernel.org -L: zd1211-devs@lists.sourceforge.net (subscribers-only) -S: Maintained -W: http://zd1211.ath.cx/wiki/DriverRewrite +S: Orphan F: drivers/net/wireless/zydas/zd1211rw/ ZD1301 MEDIA DRIVER From 3ccbc99c152fded21ff3661d7685095e14065e96 Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Mon, 24 Jul 2023 13:45:46 +0300 Subject: [PATCH 15/98] MAINTAINERS: wifi: mark b43 as orphan There's no maintainer for b43 so mark it as orphan. Signed-off-by: Kalle Valo Acked-by: Larry Finger Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230724104547.3061709-9-kvalo@kernel.org --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 2f988aa03f2e9..eccc3794e6e29 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3391,7 +3391,7 @@ F: drivers/media/radio/radio-aztech* B43 WIRELESS DRIVER L: linux-wireless@vger.kernel.org L: b43-dev@lists.infradead.org -S: Odd Fixes +S: Orphan W: https://wireless.wiki.kernel.org/en/users/Drivers/b43 F: drivers/net/wireless/broadcom/b43/ From cc326aae03c37d393e8dfba950caa33493dcdfad Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Mon, 24 Jul 2023 13:45:47 +0300 Subject: [PATCH 16/98] MAINTAINERS: wifi: mark mlw8k as orphan Last activity from Lennert is from 2012 so mark the driver as orphan. Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230724104547.3061709-10-kvalo@kernel.org --- MAINTAINERS | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index eccc3794e6e29..ab9a31f4954c7 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12583,9 +12583,8 @@ S: Odd Fixes F: drivers/net/wireless/marvell/mwifiex/ MARVELL MWL8K WIRELESS DRIVER -M: Lennert Buytenhek L: linux-wireless@vger.kernel.org -S: Odd Fixes +S: Orphan F: drivers/net/wireless/marvell/mwl8k.c MARVELL NAND CONTROLLER DRIVER From 456b5e85d8a56e5563573b10e0840c7ae59373da Mon Sep 17 00:00:00 2001 From: Kalle Valo Date: Tue, 25 Jul 2023 12:42:48 +0300 Subject: [PATCH 17/98] MAINTAINERS: add Jeff as ath10k, ath11k and ath12k maintainer Jeff will now start maintaining these drivers together with me. 
Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230725094248.3205486-1-kvalo@kernel.org --- MAINTAINERS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index ab9a31f4954c7..71edae152e3a4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17430,6 +17430,7 @@ F: drivers/media/tuners/qt1010* QUALCOMM ATH12K WIRELESS DRIVER M: Kalle Valo +M: Jeff Johnson L: ath12k@lists.infradead.org S: Supported T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git @@ -17437,6 +17438,7 @@ F: drivers/net/wireless/ath/ath12k/ QUALCOMM ATHEROS ATH10K WIRELESS DRIVER M: Kalle Valo +M: Jeff Johnson L: ath10k@lists.infradead.org S: Supported W: https://wireless.wiki.kernel.org/en/users/Drivers/ath10k @@ -17446,6 +17448,7 @@ F: drivers/net/wireless/ath/ath10k/ QUALCOMM ATHEROS ATH11K WIRELESS DRIVER M: Kalle Valo +M: Jeff Johnson L: ath11k@lists.infradead.org S: Supported W: https://wireless.wiki.kernel.org/en/users/Drivers/ath11k From aeb660171b0663847fa04806a96302ac6112ad26 Mon Sep 17 00:00:00 2001 From: Zhengchao Shao Date: Tue, 4 Jul 2023 15:06:40 +0800 Subject: [PATCH 18/98] net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups In function macsec_fs_tx_create_crypto_table_groups(), when the ft->g memory is successfully allocated but the 'in' memory fails to be allocated, the memory pointed to by ft->g is released once. And in function macsec_fs_tx_create(), macsec_fs_tx_destroy() is called to release the memory pointed to by ft->g again. This will cause double free problem. Fixes: e467b283ffd5 ("net/mlx5e: Add MACsec TX steering rules") Signed-off-by: Zhengchao Shao Reviewed-by: Simon Horman Reviewed-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c index 7fc901a6ec5fc..414e285848813 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c @@ -161,6 +161,7 @@ static int macsec_fs_tx_create_crypto_table_groups(struct mlx5e_flow_table *ft) if (!in) { kfree(ft->g); + ft->g = NULL; return -ENOMEM; } From 5dd77585dd9d0e03dd1bceb95f0269a7eaf6b936 Mon Sep 17 00:00:00 2001 From: Zhengchao Shao Date: Wed, 5 Jul 2023 20:15:27 +0800 Subject: [PATCH 19/98] net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx when mlx5_cmd_exec failed in mlx5dr_cmd_create_reformat_ctx, the memory pointed by 'in' is not released, which will cause memory leak. Move memory release after mlx5_cmd_exec. 
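The fix follows the usual kernel single-exit cleanup idiom: once the command buffer is allocated, both the success and the failure path fall through one label that frees it. A generic, self-contained sketch of that idiom (not the driver's exact code; run_cmd() is only a stand-in for mlx5_cmd_exec()):

  #include <linux/slab.h>

  static int run_cmd(void *buf, size_t len)       /* stand-in for mlx5_cmd_exec() */
  {
          return 0;
  }

  static int alloc_and_run(size_t len)
  {
          void *in;
          int err;

          in = kvzalloc(len, GFP_KERNEL);
          if (!in)
                  return -ENOMEM;

          /* ... build the command in 'in' ... */

          err = run_cmd(in, len);
          if (err)
                  goto err_free_in;        /* the failure path now frees 'in' too */

          /* ... read the results back ... */

  err_free_in:
          kvfree(in);                      /* reached on success and on failure */
          return err;
  }

Before the fix, the error branch returned directly and skipped the kvfree(), leaking 'in' whenever the firmware command failed.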
Fixes: 1d9186476e12 ("net/mlx5: DR, Add direct rule command utilities") Signed-off-by: Zhengchao Shao Reviewed-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c index 7491911ebcb50..8c2a34a0d6be8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_cmd.c @@ -564,11 +564,12 @@ int mlx5dr_cmd_create_reformat_ctx(struct mlx5_core_dev *mdev, err = mlx5_cmd_exec(mdev, in, inlen, out, sizeof(out)); if (err) - return err; + goto err_free_in; *reformat_id = MLX5_GET(alloc_packet_reformat_context_out, out, packet_reformat_id); - kvfree(in); +err_free_in: + kvfree(in); return err; } From c6cf0b6097bf1bf1b2a89b521e9ecd26b581a93a Mon Sep 17 00:00:00 2001 From: Zhengchao Shao Date: Sat, 8 Jul 2023 15:13:07 +0800 Subject: [PATCH 20/98] net/mlx5: fix potential memory leak in mlx5e_init_rep_rx The memory pointed to by the priv->rx_res pointer is not freed in the error path of mlx5e_init_rep_rx, which can lead to a memory leak. Fix by freeing the memory in the error path, thereby making the error path identical to mlx5e_cleanup_rep_rx(). Fixes: af8bbf730068 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer") Signed-off-by: Zhengchao Shao Reviewed-by: Simon Horman Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 152b62138450f..0b265a3f9b766 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -1012,7 +1012,7 @@ static int mlx5e_init_rep_rx(struct mlx5e_priv *priv) err = mlx5e_open_drop_rq(priv, &priv->drop_rq); if (err) { mlx5_core_err(mdev, "open drop rq failed, %d\n", err); - return err; + goto err_rx_res_free; } err = mlx5e_rx_res_init(priv->rx_res, priv->mdev, 0, @@ -1046,6 +1046,7 @@ static int mlx5e_init_rep_rx(struct mlx5e_priv *priv) mlx5e_rx_res_destroy(priv->rx_res); err_close_drop_rq: mlx5e_close_drop_rq(&priv->drop_rq); +err_rx_res_free: mlx5e_rx_res_free(priv->rx_res); priv->rx_res = NULL; err_free_fs: From e5bcb7564d3bd0c88613c76963c5349be9c511c5 Mon Sep 17 00:00:00 2001 From: Yuanjun Gong Date: Tue, 25 Jul 2023 14:56:55 +0800 Subject: [PATCH 21/98] net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer() mlx5e_ipsec_remove_trailer() should return an error code if function pskb_trim() returns an unexpected value. 
Fixes: 2ac9cfe78223 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path") Signed-off-by: Yuanjun Gong Reviewed-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c index eab5bc718771f..8d995e3048692 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c @@ -58,7 +58,9 @@ static int mlx5e_ipsec_remove_trailer(struct sk_buff *skb, struct xfrm_state *x) trailer_len = alen + plen + 2; - pskb_trim(skb, skb->len - trailer_len); + ret = pskb_trim(skb, skb->len - trailer_len); + if (unlikely(ret)) + return ret; if (skb->protocol == htons(ETH_P_IP)) { ipv4hdr->tot_len = htons(ntohs(ipv4hdr->tot_len) - trailer_len); ip_send_check(ipv4hdr); From 0507f2c8be0d345fe7014147c027cea6dc1c00a4 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Mon, 3 Jul 2023 17:34:44 +0300 Subject: [PATCH 22/98] net/mlx5: Honor user input for migratable port fn attr Currently, whenever a user is setting migratable port fn attr, the driver is always turn migratable capability on. Fix it by honor the user input Fixes: e5b9642a33be ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable") Signed-off-by: Shay Drory Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index bdfe609cc9ec4..93b2b94d41cdd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -4196,7 +4196,7 @@ int mlx5_devlink_port_fn_migratable_set(struct devlink_port *port, bool enable, } hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability); - MLX5_SET(cmd_hca_cap_2, hca_caps, migratable, 1); + MLX5_SET(cmd_hca_cap_2, hca_caps, migratable, enable); err = mlx5_vport_set_other_func_cap(esw->dev, hca_caps, vport->vport, MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE2); From 93a331939d1d1c6c3422bc09ec43cac658594b34 Mon Sep 17 00:00:00 2001 From: Chris Mi Date: Thu, 29 Jun 2023 11:32:03 +0300 Subject: [PATCH 23/98] net/mlx5e: Don't hold encap tbl lock if there is no encap action The cited commit holds encap tbl lock unconditionally when setting up dests. 
But it may cause the following deadlock: PID: 1063722 TASK: ffffa062ca5d0000 CPU: 13 COMMAND: "handler8" #0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91 #1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb #2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528 #3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb #4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb #5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core] #6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core] #7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core] #8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core] #9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core] #10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core] #11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core] #12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core] #13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8 #14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower] #15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower] #16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0 [1031218.028143] wait_for_completion+0x24/0x30 [1031218.028589] mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core] [1031218.029256] mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core] [1031218.029885] process_one_work+0x24e/0x510 Actually no need to hold encap tbl lock if there is no encap action. Fix it by checking if encap action exists or not before holding encap tbl lock. Fixes: 37c3b9fa7ccf ("net/mlx5e: Prevent encap offload when neigh update is running") Signed-off-by: Chris Mi Reviewed-by: Vlad Buslov Signed-off-by: Saeed Mahameed --- .../mellanox/mlx5/core/en/tc_tun_encap.c | 3 --- .../net/ethernet/mellanox/mlx5/core/en_tc.c | 21 ++++++++++++++++--- 2 files changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c index f0c3464f037f4..0c88cf47af01b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c @@ -1030,9 +1030,6 @@ int mlx5e_tc_tun_encap_dests_set(struct mlx5e_priv *priv, int out_index; int err = 0; - if (!mlx5e_is_eswitch_flow(flow)) - return 0; - parse_attr = attr->parse_attr; esw_attr = attr->esw_attr; *vf_tun = false; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 8d0a3f69693e1..92377632f9e07 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1725,6 +1725,19 @@ verify_attr_actions(u32 actions, struct netlink_ext_ack *extack) return 0; } +static bool +has_encap_dests(struct mlx5_flow_attr *attr) +{ + struct mlx5_esw_flow_attr *esw_attr = attr->esw_attr; + int out_index; + + for (out_index = 0; out_index < MLX5_MAX_FLOW_FWD_VPORTS; out_index++) + if (esw_attr->dests[out_index].flags & MLX5_ESW_DEST_ENCAP) + return true; + + return false; +} + static int post_process_attr(struct mlx5e_tc_flow *flow, struct mlx5_flow_attr *attr, @@ -1737,9 +1750,11 @@ post_process_attr(struct mlx5e_tc_flow *flow, if (err) goto err_out; - err = mlx5e_tc_tun_encap_dests_set(flow->priv, flow, attr, extack, &vf_tun); - if (err) - goto err_out; + if (mlx5e_is_eswitch_flow(flow) && has_encap_dests(attr)) { + err = 
mlx5e_tc_tun_encap_dests_set(flow->priv, flow, attr, extack, &vf_tun); + if (err) + goto err_out; + } if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) { err = mlx5e_tc_attach_mod_hdr(flow->priv, flow, attr); From 3ec43c1b082a8804472430e1253544d75f4b540e Mon Sep 17 00:00:00 2001 From: Amir Tzin Date: Tue, 30 May 2023 20:11:14 +0300 Subject: [PATCH 24/98] net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set Moving to switchdev mode with ntuple offload on causes the kernel to crash since fs->arfs is freed during nic profile cleanup flow. Ntuple offload is not supported in switchdev mode and it is already unset by mlx5 fix feature ndo in switchdev mode. Verify fs->arfs is valid before disabling it. trace: [] RIP: 0010:_raw_spin_lock_bh+0x17/0x30 [] arfs_del_rules+0x44/0x1a0 [mlx5_core] [] mlx5e_arfs_disable+0xe/0x20 [mlx5_core] [] mlx5e_handle_feature+0x3d/0xb0 [mlx5_core] [] ? __rtnl_unlock+0x25/0x50 [] mlx5e_set_features+0xfe/0x160 [mlx5_core] [] __netdev_update_features+0x278/0xa50 [] ? netdev_run_todo+0x5e/0x2a0 [] netdev_update_features+0x22/0x70 [] ? _cond_resched+0x15/0x30 [] mlx5e_attach_netdev+0x12a/0x1e0 [mlx5_core] [] mlx5e_netdev_attach_profile+0xa1/0xc0 [mlx5_core] [] mlx5e_netdev_change_profile+0x77/0xe0 [mlx5_core] [] mlx5e_vport_rep_load+0x1ed/0x290 [mlx5_core] [] mlx5_esw_offloads_rep_load+0x88/0xd0 [mlx5_core] [] esw_offloads_load_rep.part.38+0x31/0x50 [mlx5_core] [] esw_offloads_enable+0x6c5/0x710 [mlx5_core] [] mlx5_eswitch_enable_locked+0x1bb/0x290 [mlx5_core] [] mlx5_devlink_eswitch_mode_set+0x14f/0x320 [mlx5_core] [] devlink_nl_cmd_eswitch_set_doit+0x94/0x120 [] genl_family_rcv_msg_doit.isra.17+0x113/0x150 [] genl_family_rcv_msg+0xb7/0x170 [] ? devlink_nl_cmd_port_split_doit+0x100/0x100 [] genl_rcv_msg+0x47/0xa0 [] ? genl_family_rcv_msg+0x170/0x170 [] netlink_rcv_skb+0x4c/0x130 [] genl_rcv+0x24/0x40 [] netlink_unicast+0x19a/0x230 [] netlink_sendmsg+0x204/0x3d0 [] sock_sendmsg+0x50/0x60 Fixes: 90b22b9bcd24 ("net/mlx5e: Disable Rx ntuple offload for uplink representor") Signed-off-by: Amir Tzin Reviewed-by: Aya Levin Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c index 933a7772a7a39..5aa51d74f8b43 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c @@ -135,6 +135,16 @@ static void arfs_del_rules(struct mlx5e_flow_steering *fs); int mlx5e_arfs_disable(struct mlx5e_flow_steering *fs) { + /* Moving to switchdev mode, fs->arfs is freed by mlx5e_nic_profile + * cleanup_rx callback and it is not recreated when + * mlx5e_uplink_rep_profile is loaded as mlx5e_create_flow_steering() + * is not called by the uplink_rep profile init_rx callback. Thus, if + * ntuple is set, moving to switchdev flow will enter this function + * with fs->arfs nullified. + */ + if (!mlx5e_fs_get_arfs(fs)) + return 0; + arfs_del_rules(fs); return arfs_disable(fs); From d03b6e6f31820b84f7449cca022047f36c42bc3f Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Mon, 3 Jul 2023 08:28:16 +0000 Subject: [PATCH 25/98] net/mlx5e: Move representor neigh cleanup to profile cleanup_tx For IP tunnel encapsulation in ECMP (Equal-Cost Multipath) mode, as the flow is duplicated to the peer eswitch, the related neighbour information on the peer uplink representor is created as well. 
In the cited commit, eswitch devcom unpair is moved to uplink unload API, specifically the profile->cleanup_tx. If there is a encap rule offloaded in ECMP mode, when one eswitch does unpair (because of unloading the driver, for instance), and the peer rule from the peer eswitch is going to be deleted, the use-after-free error is triggered while accessing neigh info, as it is already cleaned up in uplink's profile->disable, which is before its profile->cleanup_tx. To fix this issue, move the neigh cleanup to profile's cleanup_tx callback, and after mlx5e_cleanup_uplink_rep_tx is called. The neigh init is moved to init_tx for symmeter. [ 2453.376299] BUG: KASAN: slab-use-after-free in mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core] [ 2453.379125] Read of size 4 at addr ffff888127af9008 by task modprobe/2496 [ 2453.381542] CPU: 7 PID: 2496 Comm: modprobe Tainted: G B 6.4.0-rc7+ #15 [ 2453.383386] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 2453.384335] Call Trace: [ 2453.384625] [ 2453.384891] dump_stack_lvl+0x33/0x50 [ 2453.385285] print_report+0xc2/0x610 [ 2453.385667] ? __virt_addr_valid+0xb1/0x130 [ 2453.386091] ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core] [ 2453.386757] kasan_report+0xae/0xe0 [ 2453.387123] ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core] [ 2453.387798] mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core] [ 2453.388465] mlx5e_rep_encap_entry_detach+0xa6/0xe0 [mlx5_core] [ 2453.389111] mlx5e_encap_dealloc+0xa7/0x100 [mlx5_core] [ 2453.389706] mlx5e_tc_tun_encap_dests_unset+0x61/0xb0 [mlx5_core] [ 2453.390361] mlx5_free_flow_attr_actions+0x11e/0x340 [mlx5_core] [ 2453.391015] ? complete_all+0x43/0xd0 [ 2453.391398] ? free_flow_post_acts+0x38/0x120 [mlx5_core] [ 2453.392004] mlx5e_tc_del_fdb_flow+0x4ae/0x690 [mlx5_core] [ 2453.392618] mlx5e_tc_del_fdb_peers_flow+0x308/0x370 [mlx5_core] [ 2453.393276] mlx5e_tc_clean_fdb_peer_flows+0xf5/0x140 [mlx5_core] [ 2453.393925] mlx5_esw_offloads_unpair+0x86/0x540 [mlx5_core] [ 2453.394546] ? mlx5_esw_offloads_set_ns_peer.isra.0+0x180/0x180 [mlx5_core] [ 2453.395268] ? down_write+0xaa/0x100 [ 2453.395652] mlx5_esw_offloads_devcom_event+0x203/0x530 [mlx5_core] [ 2453.396317] mlx5_devcom_send_event+0xbb/0x190 [mlx5_core] [ 2453.396917] mlx5_esw_offloads_devcom_cleanup+0xb0/0xd0 [mlx5_core] [ 2453.397582] mlx5e_tc_esw_cleanup+0x42/0x120 [mlx5_core] [ 2453.398182] mlx5e_rep_tc_cleanup+0x15/0x30 [mlx5_core] [ 2453.398768] mlx5e_cleanup_rep_tx+0x6c/0x80 [mlx5_core] [ 2453.399367] mlx5e_detach_netdev+0xee/0x120 [mlx5_core] [ 2453.399957] mlx5e_netdev_change_profile+0x84/0x170 [mlx5_core] [ 2453.400598] mlx5e_vport_rep_unload+0xe0/0xf0 [mlx5_core] [ 2453.403781] mlx5_eswitch_unregister_vport_reps+0x15e/0x190 [mlx5_core] [ 2453.404479] ? mlx5_eswitch_register_vport_reps+0x200/0x200 [mlx5_core] [ 2453.405170] ? up_write+0x39/0x60 [ 2453.405529] ? kernfs_remove_by_name_ns+0xb7/0xe0 [ 2453.405985] auxiliary_bus_remove+0x2e/0x40 [ 2453.406405] device_release_driver_internal+0x243/0x2d0 [ 2453.406900] ? kobject_put+0x42/0x2d0 [ 2453.407284] bus_remove_device+0x128/0x1d0 [ 2453.407687] device_del+0x240/0x550 [ 2453.408053] ? waiting_for_supplier_show+0xe0/0xe0 [ 2453.408511] ? kobject_put+0xfa/0x2d0 [ 2453.408889] ? 
__kmem_cache_free+0x14d/0x280 [ 2453.409310] mlx5_rescan_drivers_locked.part.0+0xcd/0x2b0 [mlx5_core] [ 2453.409973] mlx5_unregister_device+0x40/0x50 [mlx5_core] [ 2453.410561] mlx5_uninit_one+0x3d/0x110 [mlx5_core] [ 2453.411111] remove_one+0x89/0x130 [mlx5_core] [ 2453.411628] pci_device_remove+0x59/0xf0 [ 2453.412026] device_release_driver_internal+0x243/0x2d0 [ 2453.412511] ? parse_option_str+0x14/0x90 [ 2453.412915] driver_detach+0x7b/0xf0 [ 2453.413289] bus_remove_driver+0xb5/0x160 [ 2453.413685] pci_unregister_driver+0x3f/0xf0 [ 2453.414104] mlx5_cleanup+0xc/0x20 [mlx5_core] Fixes: 2be5bd42a5bb ("net/mlx5: Handle pairing of E-switch via uplink un/load APIs") Signed-off-by: Jianbo Liu Reviewed-by: Vlad Buslov Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/en_rep.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c index 0b265a3f9b766..99b3843396f33 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c @@ -1160,6 +1160,10 @@ static int mlx5e_init_rep_tx(struct mlx5e_priv *priv) return err; } + err = mlx5e_rep_neigh_init(rpriv); + if (err) + goto err_neigh_init; + if (rpriv->rep->vport == MLX5_VPORT_UPLINK) { err = mlx5e_init_uplink_rep_tx(rpriv); if (err) @@ -1176,6 +1180,8 @@ static int mlx5e_init_rep_tx(struct mlx5e_priv *priv) if (rpriv->rep->vport == MLX5_VPORT_UPLINK) mlx5e_cleanup_uplink_rep_tx(rpriv); err_init_tx: + mlx5e_rep_neigh_cleanup(rpriv); +err_neigh_init: mlx5e_destroy_tises(priv); return err; } @@ -1189,22 +1195,17 @@ static void mlx5e_cleanup_rep_tx(struct mlx5e_priv *priv) if (rpriv->rep->vport == MLX5_VPORT_UPLINK) mlx5e_cleanup_uplink_rep_tx(rpriv); + mlx5e_rep_neigh_cleanup(rpriv); mlx5e_destroy_tises(priv); } static void mlx5e_rep_enable(struct mlx5e_priv *priv) { - struct mlx5e_rep_priv *rpriv = priv->ppriv; - mlx5e_set_netdev_mtu_boundaries(priv); - mlx5e_rep_neigh_init(rpriv); } static void mlx5e_rep_disable(struct mlx5e_priv *priv) { - struct mlx5e_rep_priv *rpriv = priv->ppriv; - - mlx5e_rep_neigh_cleanup(rpriv); } static int mlx5e_update_rep_rx(struct mlx5e_priv *priv) @@ -1254,7 +1255,6 @@ static int uplink_rep_async_event(struct notifier_block *nb, unsigned long event static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv) { - struct mlx5e_rep_priv *rpriv = priv->ppriv; struct net_device *netdev = priv->netdev; struct mlx5_core_dev *mdev = priv->mdev; u16 max_mtu; @@ -1276,7 +1276,6 @@ static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv) mlx5_notifier_register(mdev, &priv->events_nb); mlx5e_dcbnl_initialize(priv); mlx5e_dcbnl_init_app(priv); - mlx5e_rep_neigh_init(rpriv); mlx5e_rep_bridge_init(priv); netdev->wanted_features |= NETIF_F_HW_TC; @@ -1291,7 +1290,6 @@ static void mlx5e_uplink_rep_enable(struct mlx5e_priv *priv) static void mlx5e_uplink_rep_disable(struct mlx5e_priv *priv) { - struct mlx5e_rep_priv *rpriv = priv->ppriv; struct mlx5_core_dev *mdev = priv->mdev; rtnl_lock(); @@ -1301,7 +1299,6 @@ static void mlx5e_uplink_rep_disable(struct mlx5e_priv *priv) rtnl_unlock(); mlx5e_rep_bridge_cleanup(priv); - mlx5e_rep_neigh_cleanup(rpriv); mlx5e_dcbnl_delete_app(priv); mlx5_notifier_unregister(mdev, &priv->events_nb); mlx5e_rep_tc_disable(priv); From e0f52298fee449fec37e3e3c32df60008b509b16 Mon Sep 17 00:00:00 2001 From: Dragos Tatulea Date: Tue, 18 Jul 2023 11:13:33 +0300 Subject: [PATCH 26/98] net/mlx5e: xsk: Fix 
invalid buffer access for legacy rq The below crash can be encountered when using xdpsock in rx mode for legacy rq: the buffer gets released in the XDP_REDIRECT path, and then once again in the driver. This fix sets the flag to avoid releasing on the driver side. XSK handling of buffers for legacy rq was relying on the caller to set the skip release flag. But the referenced fix started using fragment counts for pages instead of the skip flag. Crash log: general protection fault, probably for non-canonical address 0xffff8881217e3a: 0000 [#1] SMP CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.5.0-rc1+ #31 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:bpf_prog_03b13f331978c78c+0xf/0x28 Code: ... RSP: 0018:ffff88810082fc98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888138404901 RCX: c0ffffc900027cbc RDX: ffffffffa000b514 RSI: 00ffff8881217e32 RDI: ffff888138404901 RBP: ffff88810082fc98 R08: 0000000000091100 R09: 0000000000000006 R10: 0000000000000800 R11: 0000000000000800 R12: ffffc9000027a000 R13: ffff8881217e2dc0 R14: ffff8881217e2910 R15: ffff8881217e2f00 FS: 0000000000000000(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000564cb2e2cde0 CR3: 000000010e603004 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? die_addr+0x32/0x80 ? exc_general_protection+0x192/0x390 ? asm_exc_general_protection+0x22/0x30 ? 0xffffffffa000b514 ? bpf_prog_03b13f331978c78c+0xf/0x28 mlx5e_xdp_handle+0x48/0x670 [mlx5_core] ? dev_gro_receive+0x3b5/0x6e0 mlx5e_xsk_skb_from_cqe_linear+0x6e/0x90 [mlx5_core] mlx5e_handle_rx_cqe+0x55/0x100 [mlx5_core] mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core] mlx5e_napi_poll+0x45e/0x6b0 [mlx5_core] __napi_poll+0x25/0x1a0 net_rx_action+0x28a/0x300 __do_softirq+0xcd/0x279 ? sort_range+0x20/0x20 run_ksoftirqd+0x1a/0x20 smpboot_thread_fn+0xa2/0x130 kthread+0xc9/0xf0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 Modules linked in: mlx5_ib mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay zram zsmalloc fuse [last unloaded: mlx5_core] ---[ end trace 0000000000000000 ]--- Fixes: 7abd955a58fb ("net/mlx5e: RX, Fix page_pool page fragment tracking for XDP") Signed-off-by: Dragos Tatulea Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c index d97e6df66f454..b8dd744536553 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c @@ -323,8 +323,11 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq, net_prefetch(mxbuf->xdp.data); prog = rcu_dereference(rq->xdp_prog); - if (likely(prog && mlx5e_xdp_handle(rq, prog, mxbuf))) + if (likely(prog && mlx5e_xdp_handle(rq, prog, mxbuf))) { + if (likely(__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags))) + wi->flags |= BIT(MLX5E_WQE_FRAG_SKIP_RELEASE); return NULL; /* page/packet was consumed by XDP */ + } /* XDP_PASS: copy the data from the UMEM to a new SKB. 
The frame reuse * will be handled by mlx5e_free_rx_wqe. From 39646d9bcd1a65d2396328026626859a1dab59d7 Mon Sep 17 00:00:00 2001 From: Dragos Tatulea Date: Mon, 24 Apr 2023 18:19:00 +0300 Subject: [PATCH 27/98] net/mlx5e: xsk: Fix crash on regular rq reactivation When the regular rq is reactivated after the XSK socket is closed it could be reading stale cqes which eventually corrupts the rq. This leads to no more traffic being received on the regular rq and a crash on the next close or deactivation of the rq. Kal Cuttler Conely reported this issue as a crash on the release path when the xdpsock sample program is stopped (killed) and restarted in sequence while traffic is running. This patch flushes all cqes when during the rq flush. The cqe flushing is done in the reset state of the rq. mlx5e_rq_to_ready code is moved into the flush function to allow for this. Fixes: 082a9edf12fe ("net/mlx5e: xsk: Flush RQ on XSK activation to save memory") Reported-by: Kal Cutter Conley Closes: https://lore.kernel.org/xdp-newbies/CAHApi-nUAs4TeFWUDV915CZJo07XVg2Vp63-no7UDfj6wur9nQ@mail.gmail.com Signed-off-by: Dragos Tatulea Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- .../net/ethernet/mellanox/mlx5/core/en_main.c | 29 ++++++++++++++----- 1 file changed, 21 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index defb1efccb78f..1c820119e438f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1036,7 +1036,23 @@ static int mlx5e_modify_rq_state(struct mlx5e_rq *rq, int curr_state, int next_s return err; } -static int mlx5e_rq_to_ready(struct mlx5e_rq *rq, int curr_state) +static void mlx5e_flush_rq_cq(struct mlx5e_rq *rq) +{ + struct mlx5_cqwq *cqwq = &rq->cq.wq; + struct mlx5_cqe64 *cqe; + + if (test_bit(MLX5E_RQ_STATE_MINI_CQE_ENHANCED, &rq->state)) { + while ((cqe = mlx5_cqwq_get_cqe_enahnced_comp(cqwq))) + mlx5_cqwq_pop(cqwq); + } else { + while ((cqe = mlx5_cqwq_get_cqe(cqwq))) + mlx5_cqwq_pop(cqwq); + } + + mlx5_cqwq_update_db_record(cqwq); +} + +int mlx5e_flush_rq(struct mlx5e_rq *rq, int curr_state) { struct net_device *dev = rq->netdev; int err; @@ -1046,6 +1062,10 @@ static int mlx5e_rq_to_ready(struct mlx5e_rq *rq, int curr_state) netdev_err(dev, "Failed to move rq 0x%x to reset\n", rq->rqn); return err; } + + mlx5e_free_rx_descs(rq); + mlx5e_flush_rq_cq(rq); + err = mlx5e_modify_rq_state(rq, MLX5_RQC_STATE_RST, MLX5_RQC_STATE_RDY); if (err) { netdev_err(dev, "Failed to move rq 0x%x to ready\n", rq->rqn); @@ -1055,13 +1075,6 @@ static int mlx5e_rq_to_ready(struct mlx5e_rq *rq, int curr_state) return 0; } -int mlx5e_flush_rq(struct mlx5e_rq *rq, int curr_state) -{ - mlx5e_free_rx_descs(rq); - - return mlx5e_rq_to_ready(rq, curr_state); -} - static int mlx5e_modify_rq_vsd(struct mlx5e_rq *rq, bool vsd) { struct mlx5_core_dev *mdev = rq->mdev; From eb02b93aad952008f1692cee5c5b13001e908407 Mon Sep 17 00:00:00 2001 From: Vlad Buslov Date: Tue, 27 Jun 2023 09:26:56 +0200 Subject: [PATCH 28/98] net/mlx5: Bridge, set debugfs access right to root-only As suggested during code review set the access rights for bridge 'fdb' debugfs file to root-only. 
Fixes: 791eb78285e8 ("net/mlx5: Bridge, expose FDB state via debugfs") Reported-by: Jakub Kicinski Link: https://lore.kernel.org/netdev/20230619120515.5045132a@kernel.org/ Signed-off-by: Vlad Buslov Reviewed-by: Gal Pressman Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_debugfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_debugfs.c index b6a45eff28f55..dbd7cbe6cbf36 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_debugfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_debugfs.c @@ -64,7 +64,7 @@ void mlx5_esw_bridge_debugfs_init(struct net_device *br_netdev, struct mlx5_esw_ bridge->debugfs_dir = debugfs_create_dir(br_netdev->name, bridge->br_offloads->debugfs_root); - debugfs_create_file("fdb", 0444, bridge->debugfs_dir, bridge, + debugfs_create_file("fdb", 0400, bridge->debugfs_dir, bridge, &mlx5_esw_bridge_debugfs_fops); } From 3e4cf1dd2ce413f4be3e2c9062fb470e2ad2be88 Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Mon, 8 May 2023 03:36:10 +0000 Subject: [PATCH 29/98] net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload There are DEK objects cached in DEK pool after kTLS is used, and they are freed only in mlx5e_ktls_cleanup(). mlx5e_destroy_mdev_resources() is called in mlx5e_suspend() to free mdev resources, including protection domain (PD). However, PD is still referenced by the cached DEK objects in this case, because profile->cleanup() (and therefore mlx5e_ktls_cleanup()) is called after mlx5e_suspend() during devlink reload. So the following FW syndrome is generated: mlx5_cmd_out_err:803:(pid 12948): DEALLOC_PD(0x801) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0xef0c8a), err(-22) To avoid this syndrome, move DEK pool destruction to mlx5e_ktls_cleanup_tx(), which is called by profile->cleanup_tx(). And move pool creation to mlx5e_ktls_init_tx() for symmetry. 
Fixes: f741db1a5171 ("net/mlx5e: kTLS, Improve connection rate by using fast update encryption key") Signed-off-by: Jianbo Liu Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- .../mellanox/mlx5/core/en_accel/ktls.c | 8 ----- .../mellanox/mlx5/core/en_accel/ktls_tx.c | 29 +++++++++++++++++-- 2 files changed, 26 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c index cf704f106b7c2..984fa04bd331b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls.c @@ -188,7 +188,6 @@ static void mlx5e_tls_debugfs_init(struct mlx5e_tls *tls, int mlx5e_ktls_init(struct mlx5e_priv *priv) { - struct mlx5_crypto_dek_pool *dek_pool; struct mlx5e_tls *tls; if (!mlx5e_is_ktls_device(priv->mdev)) @@ -199,12 +198,6 @@ int mlx5e_ktls_init(struct mlx5e_priv *priv) return -ENOMEM; tls->mdev = priv->mdev; - dek_pool = mlx5_crypto_dek_pool_create(priv->mdev, MLX5_ACCEL_OBJ_TLS_KEY); - if (IS_ERR(dek_pool)) { - kfree(tls); - return PTR_ERR(dek_pool); - } - tls->dek_pool = dek_pool; priv->tls = tls; mlx5e_tls_debugfs_init(tls, priv->dfs_root); @@ -222,7 +215,6 @@ void mlx5e_ktls_cleanup(struct mlx5e_priv *priv) debugfs_remove_recursive(tls->debugfs.dfs); tls->debugfs.dfs = NULL; - mlx5_crypto_dek_pool_destroy(tls->dek_pool); kfree(priv->tls); priv->tls = NULL; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c index efb2cf74ad6a3..d61be26a4df1a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c @@ -908,28 +908,51 @@ static void mlx5e_tls_tx_debugfs_init(struct mlx5e_tls *tls, int mlx5e_ktls_init_tx(struct mlx5e_priv *priv) { + struct mlx5_crypto_dek_pool *dek_pool; struct mlx5e_tls *tls = priv->tls; + int err; + + if (!mlx5e_is_ktls_device(priv->mdev)) + return 0; + + /* DEK pool could be used by either or both of TX and RX. But we have to + * put the creation here to avoid syndrome when doing devlink reload. + */ + dek_pool = mlx5_crypto_dek_pool_create(priv->mdev, MLX5_ACCEL_OBJ_TLS_KEY); + if (IS_ERR(dek_pool)) + return PTR_ERR(dek_pool); + tls->dek_pool = dek_pool; if (!mlx5e_is_ktls_tx(priv->mdev)) return 0; priv->tls->tx_pool = mlx5e_tls_tx_pool_init(priv->mdev, &priv->tls->sw_stats); - if (!priv->tls->tx_pool) - return -ENOMEM; + if (!priv->tls->tx_pool) { + err = -ENOMEM; + goto err_tx_pool_init; + } mlx5e_tls_tx_debugfs_init(tls, tls->debugfs.dfs); return 0; + +err_tx_pool_init: + mlx5_crypto_dek_pool_destroy(dek_pool); + return err; } void mlx5e_ktls_cleanup_tx(struct mlx5e_priv *priv) { if (!mlx5e_is_ktls_tx(priv->mdev)) - return; + goto dek_pool_destroy; debugfs_remove_recursive(priv->tls->debugfs.dfs_tx); priv->tls->debugfs.dfs_tx = NULL; mlx5e_tls_tx_pool_cleanup(priv->tls->tx_pool); priv->tls->tx_pool = NULL; + +dek_pool_destroy: + if (mlx5e_is_ktls_device(priv->mdev)) + mlx5_crypto_dek_pool_destroy(priv->tls->dek_pool); } From 61eab651f6e96791cfad6db45f1107c398699b2d Mon Sep 17 00:00:00 2001 From: Chris Mi Date: Mon, 17 Jul 2023 08:32:51 +0300 Subject: [PATCH 30/98] net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported The cited commit sets ft prio to fs_base_prio. But if ignore_flow_level it not supported, ft prio must be set based on tc filter prio. Otherwise, all the ft prio are the same on the same chain. 
It is invalid if ignore_flow_level is not supported. Fix it by setting ft prio based on tc filter prio and setting fs_base_prio to 0 for fdb. Fixes: 8e80e5648092 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage") Signed-off-by: Chris Mi Reviewed-by: Paul Blakey Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 1 - drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index 93b2b94d41cdd..bc04abb31f9c8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -1436,7 +1436,6 @@ esw_chains_create(struct mlx5_eswitch *esw, struct mlx5_flow_table *miss_fdb) esw_init_chains_offload_flags(esw, &attr.flags); attr.ns = MLX5_FLOW_NAMESPACE_FDB; - attr.fs_base_prio = FDB_TC_OFFLOAD; attr.max_grp_num = esw->params.large_group_num; attr.default_ft = miss_fdb; attr.mapping = esw->offloads.reg_c0_obj_pool; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c index db9df9798ffac..a80ecb672f33d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c @@ -178,7 +178,7 @@ mlx5_chains_create_table(struct mlx5_fs_chains *chains, if (!mlx5_chains_ignore_flow_level_supported(chains) || (chain == 0 && prio == 1 && level == 0)) { ft_attr.level = chains->fs_base_level; - ft_attr.prio = chains->fs_base_prio; + ft_attr.prio = chains->fs_base_prio + prio - 1; ns = (chains->ns == MLX5_FLOW_NAMESPACE_FDB) ? mlx5_get_fdb_sub_ns(chains->dev, chain) : mlx5_get_flow_namespace(chains->dev, chains->ns); From 62752c0bc67f79f064cbe2605054f99d52809e7b Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Wed, 14 Jun 2023 09:03:32 +0300 Subject: [PATCH 31/98] net/mlx5: DR, Fix peer domain namespace setting The offending patch is based on the assumption that for PFs, mlx5_get_dev_index() is the same as vhca_id. However, this assumption is wrong in case of DPU (ECPF). Fix it by using vhca_id directly, and switch the array of peers to xarray. 
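An xarray suits this because vhca_id is a sparse 16-bit key: keying by it directly avoids the wrong assumption that the peer's device index can stand in for it. A minimal sketch of the lookup pattern, where peer_dmn_xa, set_peer() and get_peer() are illustrative names rather than the driver's own (in the driver the peers are tracked per steering domain rather than globally):

  #include <linux/xarray.h>

  struct dr_domain;                        /* opaque here; the real type is driver-internal */

  static DEFINE_XARRAY(peer_dmn_xa);       /* keyed by the peer's 16-bit vhca_id */

  static int set_peer(u16 peer_vhca_id, struct dr_domain *peer)
  {
          void *old;

          if (!peer) {                     /* unpair: drop the entry */
                  xa_erase(&peer_dmn_xa, peer_vhca_id);
                  return 0;
          }
          old = xa_store(&peer_dmn_xa, peer_vhca_id, peer, GFP_KERNEL);
          return xa_err(old);              /* 0 on success, negative errno if the store failed */
  }

  static struct dr_domain *get_peer(u16 peer_vhca_id)
  {
          return xa_load(&peer_dmn_xa, peer_vhca_id);
  }

xa_store() and xa_erase() take the xarray's internal lock and xa_load() is RCU-safe, so the sketch needs no extra locking for this simple writer/reader split; the driver's own synchronization around eswitch pairing still applies on top of it.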
Fixes: 6d5b7321d8af ("net/mlx5: DR, handle more than one peer domain") Signed-off-by: Shay Drory Reviewed-by: Yevgeny Kliteynik Signed-off-by: Saeed Mahameed --- .../mellanox/mlx5/core/eswitch_offloads.c | 14 +++++++------- .../net/ethernet/mellanox/mlx5/core/fs_cmd.c | 2 +- .../net/ethernet/mellanox/mlx5/core/fs_cmd.h | 2 +- .../net/ethernet/mellanox/mlx5/core/fs_core.c | 4 ++-- .../net/ethernet/mellanox/mlx5/core/fs_core.h | 2 +- .../mellanox/mlx5/core/steering/dr_action.c | 2 +- .../mellanox/mlx5/core/steering/dr_domain.c | 19 +++++++++++++------ .../mellanox/mlx5/core/steering/dr_ste_v0.c | 7 ++++--- .../mellanox/mlx5/core/steering/dr_ste_v1.c | 7 ++++--- .../mellanox/mlx5/core/steering/dr_types.h | 2 +- .../mellanox/mlx5/core/steering/fs_dr.c | 4 ++-- .../mellanox/mlx5/core/steering/mlx5dr.h | 2 +- 12 files changed, 38 insertions(+), 29 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index bc04abb31f9c8..e59380ee1ead3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -2778,9 +2778,9 @@ static int mlx5_esw_offloads_set_ns_peer(struct mlx5_eswitch *esw, struct mlx5_eswitch *peer_esw, bool pair) { - u8 peer_idx = mlx5_get_dev_index(peer_esw->dev); + u16 peer_vhca_id = MLX5_CAP_GEN(peer_esw->dev, vhca_id); + u16 vhca_id = MLX5_CAP_GEN(esw->dev, vhca_id); struct mlx5_flow_root_namespace *peer_ns; - u8 idx = mlx5_get_dev_index(esw->dev); struct mlx5_flow_root_namespace *ns; int err; @@ -2788,18 +2788,18 @@ static int mlx5_esw_offloads_set_ns_peer(struct mlx5_eswitch *esw, ns = esw->dev->priv.steering->fdb_root_ns; if (pair) { - err = mlx5_flow_namespace_set_peer(ns, peer_ns, peer_idx); + err = mlx5_flow_namespace_set_peer(ns, peer_ns, peer_vhca_id); if (err) return err; - err = mlx5_flow_namespace_set_peer(peer_ns, ns, idx); + err = mlx5_flow_namespace_set_peer(peer_ns, ns, vhca_id); if (err) { - mlx5_flow_namespace_set_peer(ns, NULL, peer_idx); + mlx5_flow_namespace_set_peer(ns, NULL, peer_vhca_id); return err; } } else { - mlx5_flow_namespace_set_peer(ns, NULL, peer_idx); - mlx5_flow_namespace_set_peer(peer_ns, NULL, idx); + mlx5_flow_namespace_set_peer(ns, NULL, peer_vhca_id); + mlx5_flow_namespace_set_peer(peer_ns, NULL, vhca_id); } return 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c index 91dcb0dcad101..aab7059bf6e97 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c @@ -140,7 +140,7 @@ static void mlx5_cmd_stub_modify_header_dealloc(struct mlx5_flow_root_namespace static int mlx5_cmd_stub_set_peer(struct mlx5_flow_root_namespace *ns, struct mlx5_flow_root_namespace *peer_ns, - u8 peer_idx) + u16 peer_vhca_id) { return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h index b6b9a5a20591d..7790ae5531e1d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h @@ -94,7 +94,7 @@ struct mlx5_flow_cmds { int (*set_peer)(struct mlx5_flow_root_namespace *ns, struct mlx5_flow_root_namespace *peer_ns, - u8 peer_idx); + u16 peer_vhca_id); int (*create_ns)(struct mlx5_flow_root_namespace *ns); int (*destroy_ns)(struct mlx5_flow_root_namespace *ns); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 4ef04aa287719..b4eb27e7f28b1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -3621,7 +3621,7 @@ void mlx5_destroy_match_definer(struct mlx5_core_dev *dev, int mlx5_flow_namespace_set_peer(struct mlx5_flow_root_namespace *ns, struct mlx5_flow_root_namespace *peer_ns, - u8 peer_idx) + u16 peer_vhca_id) { if (peer_ns && ns->mode != peer_ns->mode) { mlx5_core_err(ns->dev, @@ -3629,7 +3629,7 @@ int mlx5_flow_namespace_set_peer(struct mlx5_flow_root_namespace *ns, return -EINVAL; } - return ns->cmds->set_peer(ns, peer_ns, peer_idx); + return ns->cmds->set_peer(ns, peer_ns, peer_vhca_id); } /* This function should be called only at init stage of the namespace. diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h index 03e64c4c245da..4aed1768b85fb 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h @@ -303,7 +303,7 @@ const struct mlx5_flow_cmds *mlx5_fs_cmd_get_fw_cmds(void); int mlx5_flow_namespace_set_peer(struct mlx5_flow_root_namespace *ns, struct mlx5_flow_root_namespace *peer_ns, - u8 peer_idx); + u16 peer_vhca_id); int mlx5_flow_namespace_set_mode(struct mlx5_flow_namespace *ns, enum mlx5_flow_steering_mode mode); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c index e739ec6cdf901..54bb0866ed727 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c @@ -2079,7 +2079,7 @@ mlx5dr_action_create_dest_vport(struct mlx5dr_domain *dmn, peer_vport = vhca_id_valid && mlx5_core_is_pf(dmn->mdev) && (vhca_id != dmn->info.caps.gvmi); - vport_dmn = peer_vport ? dmn->peer_dmn[vhca_id] : dmn; + vport_dmn = peer_vport ? 
xa_load(&dmn->peer_dmn_xa, vhca_id) : dmn; if (!vport_dmn) { mlx5dr_dbg(dmn, "No peer vport domain for given vhca_id\n"); return NULL; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c index 75dc85dc24ef2..3d74109f82300 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_domain.c @@ -475,6 +475,7 @@ mlx5dr_domain_create(struct mlx5_core_dev *mdev, enum mlx5dr_domain_type type) mutex_init(&dmn->info.rx.mutex); mutex_init(&dmn->info.tx.mutex); xa_init(&dmn->definers_xa); + xa_init(&dmn->peer_dmn_xa); if (dr_domain_caps_init(mdev, dmn)) { mlx5dr_err(dmn, "Failed init domain, no caps\n"); @@ -507,6 +508,7 @@ mlx5dr_domain_create(struct mlx5_core_dev *mdev, enum mlx5dr_domain_type type) uninit_caps: dr_domain_caps_uninit(dmn); def_xa_destroy: + xa_destroy(&dmn->peer_dmn_xa); xa_destroy(&dmn->definers_xa); kfree(dmn); return NULL; @@ -547,6 +549,7 @@ int mlx5dr_domain_destroy(struct mlx5dr_domain *dmn) dr_domain_uninit_csum_recalc_fts(dmn); dr_domain_uninit_resources(dmn); dr_domain_caps_uninit(dmn); + xa_destroy(&dmn->peer_dmn_xa); xa_destroy(&dmn->definers_xa); mutex_destroy(&dmn->info.tx.mutex); mutex_destroy(&dmn->info.rx.mutex); @@ -556,17 +559,21 @@ int mlx5dr_domain_destroy(struct mlx5dr_domain *dmn) void mlx5dr_domain_set_peer(struct mlx5dr_domain *dmn, struct mlx5dr_domain *peer_dmn, - u8 peer_idx) + u16 peer_vhca_id) { + struct mlx5dr_domain *peer; + mlx5dr_domain_lock(dmn); - if (dmn->peer_dmn[peer_idx]) - refcount_dec(&dmn->peer_dmn[peer_idx]->refcount); + peer = xa_load(&dmn->peer_dmn_xa, peer_vhca_id); + if (peer) + refcount_dec(&peer->refcount); - dmn->peer_dmn[peer_idx] = peer_dmn; + WARN_ON(xa_err(xa_store(&dmn->peer_dmn_xa, peer_vhca_id, peer_dmn, GFP_KERNEL))); - if (dmn->peer_dmn[peer_idx]) - refcount_inc(&dmn->peer_dmn[peer_idx]->refcount); + peer = xa_load(&dmn->peer_dmn_xa, peer_vhca_id); + if (peer) + refcount_inc(&peer->refcount); mlx5dr_domain_unlock(dmn); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v0.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v0.c index 69d7a8f3c402e..f708b029425ac 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v0.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v0.c @@ -1652,17 +1652,18 @@ dr_ste_v0_build_src_gvmi_qpn_tag(struct mlx5dr_match_param *value, struct mlx5dr_domain *dmn = sb->dmn; struct mlx5dr_domain *vport_dmn; u8 *bit_mask = sb->bit_mask; + struct mlx5dr_domain *peer; bool source_gvmi_set; DR_STE_SET_TAG(src_gvmi_qp, tag, source_qp, misc, source_sqn); if (sb->vhca_id_valid) { + peer = xa_load(&dmn->peer_dmn_xa, id); /* Find port GVMI based on the eswitch_owner_vhca_id */ if (id == dmn->info.caps.gvmi) vport_dmn = dmn; - else if (id < MLX5_MAX_PORTS && dmn->peer_dmn[id] && - (id == dmn->peer_dmn[id]->info.caps.gvmi)) - vport_dmn = dmn->peer_dmn[id]; + else if (peer && (id == peer->info.caps.gvmi)) + vport_dmn = peer; else return -EINVAL; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v1.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v1.c index f4ef0b22b9912..dd856cde188de 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v1.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste_v1.c @@ -1984,16 +1984,17 @@ static int dr_ste_v1_build_src_gvmi_qpn_tag(struct mlx5dr_match_param *value, struct mlx5dr_domain *dmn = sb->dmn; struct 
mlx5dr_domain *vport_dmn; u8 *bit_mask = sb->bit_mask; + struct mlx5dr_domain *peer; DR_STE_SET_TAG(src_gvmi_qp_v1, tag, source_qp, misc, source_sqn); if (sb->vhca_id_valid) { + peer = xa_load(&dmn->peer_dmn_xa, id); /* Find port GVMI based on the eswitch_owner_vhca_id */ if (id == dmn->info.caps.gvmi) vport_dmn = dmn; - else if (id < MLX5_MAX_PORTS && dmn->peer_dmn[id] && - (id == dmn->peer_dmn[id]->info.caps.gvmi)) - vport_dmn = dmn->peer_dmn[id]; + else if (peer && (id == peer->info.caps.gvmi)) + vport_dmn = peer; else return -EINVAL; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h index 1622dbbe6b970..6c59de3e28f65 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_types.h @@ -935,7 +935,6 @@ struct mlx5dr_domain_info { }; struct mlx5dr_domain { - struct mlx5dr_domain *peer_dmn[MLX5_MAX_PORTS]; struct mlx5_core_dev *mdev; u32 pdn; struct mlx5_uars_page *uar; @@ -956,6 +955,7 @@ struct mlx5dr_domain { struct list_head dbg_tbl_list; struct mlx5dr_dbg_dump_info dump_info; struct xarray definers_xa; + struct xarray peer_dmn_xa; /* memory management statistics */ u32 num_buddies[DR_ICM_TYPE_MAX]; }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c index 6aac5f006bf8b..feb307fb3440b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/fs_dr.c @@ -781,14 +781,14 @@ static int mlx5_cmd_dr_update_fte(struct mlx5_flow_root_namespace *ns, static int mlx5_cmd_dr_set_peer(struct mlx5_flow_root_namespace *ns, struct mlx5_flow_root_namespace *peer_ns, - u8 peer_idx) + u16 peer_vhca_id) { struct mlx5dr_domain *peer_domain = NULL; if (peer_ns) peer_domain = peer_ns->fs_dr_domain.dr_domain; mlx5dr_domain_set_peer(ns->fs_dr_domain.dr_domain, - peer_domain, peer_idx); + peer_domain, peer_vhca_id); return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h index 24cbb33ecd6c9..89fced86936f6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/mlx5dr.h @@ -49,7 +49,7 @@ int mlx5dr_domain_sync(struct mlx5dr_domain *domain, u32 flags); void mlx5dr_domain_set_peer(struct mlx5dr_domain *dmn, struct mlx5dr_domain *peer_dmn, - u8 peer_idx); + u16 peer_vhca_id); struct mlx5dr_table * mlx5dr_table_create(struct mlx5dr_domain *domain, u32 level, u32 flags, From 53d737dfd3d7b023fa9fa445ea3f3db0ac9da402 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Sun, 25 Jun 2023 11:07:38 +0300 Subject: [PATCH 32/98] net/mlx5: Unregister devlink params in case interface is down Currently, in case an interface is down, mlx5 driver doesn't unregister its devlink params, which leads to this WARN[1]. Fix it by unregistering devlink params in that case as well. 
[1] [ 295.244769 ] WARNING: CPU: 15 PID: 1 at net/core/devlink.c:9042 devlink_free+0x174/0x1fc [ 295.488379 ] CPU: 15 PID: 1 Comm: shutdown Tainted: G S OE 5.15.0-1017.19.3.g0677e61-bluefield #g0677e61 [ 295.509330 ] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.2.0.12761 Jun 6 2023 [ 295.543096 ] pc : devlink_free+0x174/0x1fc [ 295.551104 ] lr : mlx5_devlink_free+0x18/0x2c [mlx5_core] [ 295.561816 ] sp : ffff80000809b850 [ 295.711155 ] Call trace: [ 295.716030 ] devlink_free+0x174/0x1fc [ 295.723346 ] mlx5_devlink_free+0x18/0x2c [mlx5_core] [ 295.733351 ] mlx5_sf_dev_remove+0x98/0xb0 [mlx5_core] [ 295.743534 ] auxiliary_bus_remove+0x2c/0x50 [ 295.751893 ] __device_release_driver+0x19c/0x280 [ 295.761120 ] device_release_driver+0x34/0x50 [ 295.769649 ] bus_remove_device+0xdc/0x170 [ 295.777656 ] device_del+0x17c/0x3a4 [ 295.784620 ] mlx5_sf_dev_remove+0x28/0xf0 [mlx5_core] [ 295.794800 ] mlx5_sf_dev_table_destroy+0x98/0x110 [mlx5_core] [ 295.806375 ] mlx5_unload+0x34/0xd0 [mlx5_core] [ 295.815339 ] mlx5_unload_one+0x70/0xe4 [mlx5_core] [ 295.824998 ] shutdown+0xb0/0xd8 [mlx5_core] [ 295.833439 ] pci_device_shutdown+0x3c/0xa0 [ 295.841651 ] device_shutdown+0x170/0x340 [ 295.849486 ] __do_sys_reboot+0x1f4/0x2a0 [ 295.857322 ] __arm64_sys_reboot+0x2c/0x40 [ 295.865329 ] invoke_syscall+0x78/0x100 [ 295.872817 ] el0_svc_common.constprop.0+0x54/0x184 [ 295.882392 ] do_el0_svc+0x30/0xac [ 295.889008 ] el0_svc+0x48/0x160 [ 295.895278 ] el0t_64_sync_handler+0xa4/0x130 [ 295.903807 ] el0t_64_sync+0x1a4/0x1a8 [ 295.911120 ] ---[ end trace 4f1d2381d00d9dce ]--- Fixes: fe578cbb2f05 ("net/mlx5: Move devlink registration before mlx5_load") Signed-off-by: Shay Drory Reviewed-by: Maher Sanalla Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 88dbea6631d50..f42abc2ea73c3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1506,6 +1506,7 @@ void mlx5_uninit_one(struct mlx5_core_dev *dev) if (!test_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state)) { mlx5_core_warn(dev, "%s: interface is down, NOP\n", __func__); + mlx5_devlink_params_unregister(priv_to_devlink(dev)); mlx5_cleanup_once(dev); goto out; } From bcc29b7f5af6797702c2306a7aacb831fc5ce9cb Mon Sep 17 00:00:00 2001 From: Lin Ma Date: Tue, 25 Jul 2023 10:33:30 +0800 Subject: [PATCH 33/98] bpf: Add length check for SK_DIAG_BPF_STORAGE_REQ_MAP_FD parsing The nla_for_each_nested parsing in function bpf_sk_storage_diag_alloc does not check the length of the nested attribute. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as a 4 byte integer. This patch adds an additional check when the nlattr is getting counted. This makes sure the latter nla_get_u32 can access the attributes with the correct length. 
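The shape of the added validation, pulled out here as a self-contained counting helper (the helper name is made up; the real change is done inline in bpf_sk_storage_diag_alloc), is:

	static int count_map_fd_attrs(const struct nlattr *nla_stgs)
	{
		struct nlattr *nla;
		int rem, nr_maps = 0;

		nla_for_each_nested(nla, nla_stgs, rem) {
			if (nla_type(nla) != SK_DIAG_BPF_STORAGE_REQ_MAP_FD)
				continue;
			/* Reject runt attributes before nla_get_u32() is ever used. */
			if (nla_len(nla) != sizeof(u32))
				return -EINVAL;
			nr_maps++;
		}
		return nr_maps;
	}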
Fixes: 1ed4d92458a9 ("bpf: INET_DIAG support in bpf_sk_storage") Suggested-by: Jakub Kicinski Signed-off-by: Lin Ma Reviewed-by: Jakub Kicinski Link: https://lore.kernel.org/r/20230725023330.422856-1-linma@zju.edu.cn Signed-off-by: Martin KaFai Lau --- net/core/bpf_sk_storage.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index d4172534dfa8d..cca7594be92ec 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -496,8 +496,11 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs) return ERR_PTR(-EPERM); nla_for_each_nested(nla, nla_stgs, rem) { - if (nla_type(nla) == SK_DIAG_BPF_STORAGE_REQ_MAP_FD) + if (nla_type(nla) == SK_DIAG_BPF_STORAGE_REQ_MAP_FD) { + if (nla_len(nla) != sizeof(u32)) + return ERR_PTR(-EINVAL); nr_maps++; + } } diag = kzalloc(struct_size(diag, maps, nr_maps), GFP_KERNEL); From d73ef2d69c0dba5f5a1cb9600045c873bab1fb7f Mon Sep 17 00:00:00 2001 From: Lin Ma Date: Wed, 26 Jul 2023 15:53:14 +0800 Subject: [PATCH 34/98] rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length There are a total of 9 ndo_bridge_setlink handlers in the current kernel, which are 1) bnxt_bridge_setlink, 2) be_ndo_bridge_setlink 3) i40e_ndo_bridge_setlink 4) ice_bridge_setlink 5) ixgbe_ndo_bridge_setlink 6) mlx5e_bridge_setlink 7) nfp_net_bridge_setlink 8) qeth_l2_bridge_setlink 9) br_setlink. By investigating the code, we find that 1-7 parse and use nlattr IFLA_BRIDGE_MODE but 3 and 4 forget to do the nla_len check. This can lead to an out-of-attribute read and allow a malformed nlattr (e.g., length 0) to be viewed as a 2 byte integer. To avoid such issues, also for other ndo_bridge_setlink handlers in the future, this patch adds the nla_len check in rtnl_bridge_setlink and does an early error return if the length mismatches. To make this work, the break is removed from the parsing for IFLA_BRIDGE_FLAGS to make sure this nla_for_each_nested loop iterates over every attribute. Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink") Fixes: 51616018dd1b ("i40e: Add support for getlink, setlink ndo ops") Suggested-by: Jakub Kicinski Signed-off-by: Lin Ma Acked-by: Nikolay Aleksandrov Reviewed-by: Hangbin Liu Link: https://lore.kernel.org/r/20230726075314.1059224-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski --- net/core/rtnetlink.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 3ad4e030846d8..aef25aa5cf1dd 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -5140,13 +5140,17 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC); if (br_spec) { nla_for_each_nested(attr, br_spec, rem) { - if (nla_type(attr) == IFLA_BRIDGE_FLAGS) { + if (nla_type(attr) == IFLA_BRIDGE_FLAGS && !have_flags) { if (nla_len(attr) < sizeof(flags)) return -EINVAL; have_flags = true; flags = nla_get_u16(attr); - break; + } + + if (nla_type(attr) == IFLA_BRIDGE_MODE) { + if (nla_len(attr) < sizeof(u16)) + return -EINVAL; } } } From 9945c1fb03a3c9f7e0dcf9aa17041a70e551387a Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Wed, 26 Jul 2023 15:45:16 +0100 Subject: [PATCH 35/98] net: dsa: fix older DSA drivers using phylink Older DSA drivers that do not provide a dsa_ops adjust_link method end up using phylink.
Unfortunately, a recent phylink change that requires its supported_interfaces bitmap to be filled breaks these drivers because the bitmap remains empty. Rather than fixing each driver individually, fix it in the core code so we have a sensible set of defaults. Reported-by: Sergei Antonov Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled") Signed-off-by: Russell King (Oracle) Reviewed-by: Vladimir Oltean Tested-by: Vladimir Oltean # dsa_loop Reviewed-by: Florian Fainelli Link: https://lore.kernel.org/r/E1qOflM-001AEz-D3@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski --- net/dsa/port.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/net/dsa/port.c b/net/dsa/port.c index 0ce8fd311c787..2f6195d7b7412 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -1727,8 +1727,15 @@ int dsa_port_phylink_create(struct dsa_port *dp) ds->ops->phylink_mac_an_restart) dp->pl_config.legacy_pre_march2020 = true; - if (ds->ops->phylink_get_caps) + if (ds->ops->phylink_get_caps) { ds->ops->phylink_get_caps(ds, dp->index, &dp->pl_config); + } else { + /* For legacy drivers */ + __set_bit(PHY_INTERFACE_MODE_INTERNAL, + dp->pl_config.supported_interfaces); + __set_bit(PHY_INTERFACE_MODE_GMII, + dp->pl_config.supported_interfaces); + } pl = phylink_create(&dp->pl_config, of_fwnode_handle(dp->dn), mode, &dsa_port_phylink_mac_ops); From fa467226669c09520bfb3e67fca5aeff947cdf17 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jul 2023 08:11:20 -0700 Subject: [PATCH 36/98] MAINTAINERS: stmmac: retire Giuseppe Cavallaro I tried to get stmmac maintainers to be more active by agreeing with them off-list on a review rotation. I pinged Peppe 3 times over 2 weeks during his "shift month", no reviews are flowing. All the contributions are much appreciated! But stmmac is quite active, we need participating maintainers :( Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230726151120.1649474-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- MAINTAINERS | 1 - 1 file changed, 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index d516295978a4c..20f6174d9747a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -20403,7 +20403,6 @@ F: drivers/pwm/pwm-stm32* F: include/linux/*/stm32-*tim* STMMAC ETHERNET DRIVER -M: Giuseppe Cavallaro M: Alexandre Torgue M: Jose Abreu L: netdev@vger.kernel.org From 4d50e50045aa46d9f3e578ed2edea9bd0a123d24 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 26 Jul 2023 14:58:15 +0000 Subject: [PATCH 37/98] net: flower: fix stack-out-of-bounds in fl_set_key_cfm() Typical misuse of nla_parse_nested(array, XXX_MAX, ...); array must be declared as struct nlattr *array[XXX_MAX + 1]; v2: Based on feedbacks from Ido Schimmel and Zahari Doychev, I also changed TCA_FLOWER_KEY_CFM_OPT_MAX and cfm_opt_policy definitions. 
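The sizing convention involved is the usual sentinel pattern, sketched below with made-up FOO_* names: the enum ends with a __..._MAX sentinel, the public MAX is sentinel minus one, and both the policy and the destination array get MAX + 1 entries so that index MAX is still in bounds while MAX itself is what gets passed as maxtype:

	enum {
		FOO_ATTR_UNSPEC,
		FOO_ATTR_LEVEL,
		FOO_ATTR_OPCODE,
		__FOO_ATTR_MAX,
	};
	#define FOO_ATTR_MAX (__FOO_ATTR_MAX - 1)

	static const struct nla_policy foo_policy[FOO_ATTR_MAX + 1] = {
		[FOO_ATTR_LEVEL]  = { .type = NLA_U8 },
		[FOO_ATTR_OPCODE] = { .type = NLA_U8 },
	};

	static int foo_parse(const struct nlattr *attr, struct netlink_ext_ack *extack)
	{
		struct nlattr *tb[FOO_ATTR_MAX + 1];

		return nla_parse_nested(tb, FOO_ATTR_MAX, attr, foo_policy, extack);
	}

Declaring the arrays one slot larger than the maxtype handed to nla_parse_nested() is what keeps __nla_validate_parse() from writing past the destination array, which is exactly the overflow in the report below.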
syzbot reported: BUG: KASAN: stack-out-of-bounds in __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588 Write of size 32 at addr ffffc90003a0ee20 by task syz-executor296/5014 CPU: 0 PID: 5014 Comm: syz-executor296 Not tainted 6.5.0-rc2-syzkaller-00307-gd192f5382581 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2023 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0x163/0x540 mm/kasan/report.c:475 kasan_report+0x175/0x1b0 mm/kasan/report.c:588 kasan_check_range+0x27e/0x290 mm/kasan/generic.c:187 __asan_memset+0x23/0x40 mm/kasan/shadow.c:84 __nla_validate_parse+0x136/0x2bd0 lib/nlattr.c:588 __nla_parse+0x40/0x50 lib/nlattr.c:700 nla_parse_nested include/net/netlink.h:1262 [inline] fl_set_key_cfm+0x1e3/0x440 net/sched/cls_flower.c:1718 fl_set_key+0x2168/0x6620 net/sched/cls_flower.c:1884 fl_tmplt_create+0x1fe/0x510 net/sched/cls_flower.c:2666 tc_chain_tmplt_add net/sched/cls_api.c:2959 [inline] tc_ctl_chain+0x131d/0x1ac0 net/sched/cls_api.c:3068 rtnetlink_rcv_msg+0x82b/0xf50 net/core/rtnetlink.c:6424 netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2549 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x7c3/0x990 net/netlink/af_netlink.c:1365 netlink_sendmsg+0xa2a/0xd60 net/netlink/af_netlink.c:1914 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x592/0x890 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2577 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f54c6150759 Code: 48 83 c4 28 c3 e8 d7 19 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffe06c30578 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f54c619902d RCX: 00007f54c6150759 RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003 RBP: 00007ffe06c30590 R08: 0000000000000000 R09: 00007ffe06c305f0 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54c61c35f0 R13: 00007ffe06c30778 R14: 0000000000000001 R15: 0000000000000001 The buggy address belongs to stack of task syz-executor296/5014 and is located at offset 32 in frame: fl_set_key_cfm+0x0/0x440 net/sched/cls_flower.c:374 This frame has 1 object: [32, 56) 'nla_cfm_opt' The buggy address belongs to the virtual mapping at [ffffc90003a08000, ffffc90003a11000) created by: copy_process+0x5c8/0x4290 kernel/fork.c:2330 Fixes: 7cfffd5fed3e ("net: flower: add support for matching cfm fields") Reported-by: syzbot Signed-off-by: Eric Dumazet Cc: Simon Horman Reviewed-by: Ido Schimmel Reviewed-by: Zahari Doychev Link: https://lore.kernel.org/r/20230726145815.943910-1-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/uapi/linux/pkt_cls.h | 4 +++- net/sched/cls_flower.c | 5 +++-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index 7865f5a9885b9..4f3932bb712dc 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -710,9 +710,11 @@ enum { TCA_FLOWER_KEY_CFM_OPT_UNSPEC, TCA_FLOWER_KEY_CFM_MD_LEVEL, TCA_FLOWER_KEY_CFM_OPCODE, - TCA_FLOWER_KEY_CFM_OPT_MAX, + __TCA_FLOWER_KEY_CFM_OPT_MAX, }; +#define 
TCA_FLOWER_KEY_CFM_OPT_MAX (__TCA_FLOWER_KEY_CFM_OPT_MAX - 1) + #define TCA_FLOWER_MASK_FLAGS_RANGE (1 << 0) /* Range-based match */ /* Match-all classifier */ diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 8da9d039d964e..9f0711da9c959 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -776,7 +776,8 @@ mpls_stack_entry_policy[TCA_FLOWER_KEY_MPLS_OPT_LSE_MAX + 1] = { [TCA_FLOWER_KEY_MPLS_OPT_LSE_LABEL] = { .type = NLA_U32 }, }; -static const struct nla_policy cfm_opt_policy[TCA_FLOWER_KEY_CFM_OPT_MAX] = { +static const struct nla_policy +cfm_opt_policy[TCA_FLOWER_KEY_CFM_OPT_MAX + 1] = { [TCA_FLOWER_KEY_CFM_MD_LEVEL] = NLA_POLICY_MAX(NLA_U8, FLOW_DIS_CFM_MDL_MAX), [TCA_FLOWER_KEY_CFM_OPCODE] = { .type = NLA_U8 }, @@ -1709,7 +1710,7 @@ static int fl_set_key_cfm(struct nlattr **tb, struct fl_flow_key *mask, struct netlink_ext_ack *extack) { - struct nlattr *nla_cfm_opt[TCA_FLOWER_KEY_CFM_OPT_MAX]; + struct nlattr *nla_cfm_opt[TCA_FLOWER_KEY_CFM_OPT_MAX + 1]; int err; if (!tb[TCA_FLOWER_KEY_CFM]) From dadc5b86cc9459581f37fe755b431adc399ea393 Mon Sep 17 00:00:00 2001 From: Yuanjun Gong Date: Thu, 27 Jul 2023 01:05:06 +0800 Subject: [PATCH 38/98] net: dsa: fix value check in bcm_sf2_sw_probe() in bcm_sf2_sw_probe(), check the return value of clk_prepare_enable() and return the error code if clk_prepare_enable() returns an unexpected value. Fixes: e9ec5c3bd238 ("net: dsa: bcm_sf2: request and handle clocks") Signed-off-by: Yuanjun Gong Reviewed-by: Florian Fainelli Link: https://lore.kernel.org/r/20230726170506.16547-1-ruc_gongyuanjun@163.com Signed-off-by: Jakub Kicinski --- drivers/net/dsa/bcm_sf2.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index cde253d27bd08..72374b066f64a 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -1436,7 +1436,9 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev) if (IS_ERR(priv->clk)) return PTR_ERR(priv->clk); - clk_prepare_enable(priv->clk); + ret = clk_prepare_enable(priv->clk); + if (ret) + return ret; priv->clk_mdiv = devm_clk_get_optional(&pdev->dev, "sw_switch_mdiv"); if (IS_ERR(priv->clk_mdiv)) { @@ -1444,7 +1446,9 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev) goto out_clk; } - clk_prepare_enable(priv->clk_mdiv); + ret = clk_prepare_enable(priv->clk_mdiv); + if (ret) + goto out_clk; ret = bcm_sf2_sw_rst(priv); if (ret) { From 5416d7925e6ee72bf1d35cad1957c9a194554da4 Mon Sep 17 00:00:00 2001 From: Eugen Hristev Date: Wed, 26 Jul 2023 10:06:15 +0300 Subject: [PATCH 39/98] dt-bindings: net: rockchip-dwmac: fix {tx|rx}-delay defaults/range in schema The range and the defaults are specified in the description instead of being specified in the schema. Fix it by adding the default value in the `default` field and specifying the range as `minimum` and `maximum`. Fixes: b331b8ef86f0 ("dt-bindings: net: convert rockchip-dwmac to json-schema") Signed-off-by: Eugen Hristev Signed-off-by: David S. 
Miller --- .../devicetree/bindings/net/rockchip-dwmac.yaml | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml index 176ea5f90251a..7f324c6da915d 100644 --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.yaml @@ -91,12 +91,18 @@ properties: $ref: /schemas/types.yaml#/definitions/phandle tx_delay: - description: Delay value for TXD timing. Range value is 0~0x7F, 0x30 as default. + description: Delay value for TXD timing. $ref: /schemas/types.yaml#/definitions/uint32 + minimum: 0 + maximum: 0x7F + default: 0x30 rx_delay: - description: Delay value for RXD timing. Range value is 0~0x7F, 0x10 as default. + description: Delay value for RXD timing. $ref: /schemas/types.yaml#/definitions/uint32 + minimum: 0 + maximum: 0x7F + default: 0x10 phy-supply: description: PHY regulator From e68409db995380d1badacba41ff24996bd396171 Mon Sep 17 00:00:00 2001 From: Jamal Hadi Salim Date: Wed, 26 Jul 2023 09:51:51 -0400 Subject: [PATCH 40/98] net: sched: cls_u32: Fix match key mis-addressing A match entry is uniquely identified with an "address" or "path" in the form of: hashtable ID(12b):bucketid(8b):nodeid(12b). When creating table match entries all of hash table id, bucket id and node (match entry id) are needed to be either specified by the user or reasonable in-kernel defaults are used. The in-kernel default for a table id is 0x800(omnipresent root table); for bucketid it is 0x0. Prior to this fix there was none for a nodeid i.e. the code assumed that the user passed the correct nodeid and if the user passes a nodeid of 0 (as Mingi Cho did) then that is what was used. But nodeid of 0 is reserved for identifying the table. This is not a problem until we dump. The dump code notices that the nodeid is zero and assumes it is referencing a table and therefore references table struct tc_u_hnode instead of what was created i.e match entry struct tc_u_knode. Ming does an equivalent of: tc filter add dev dummy0 parent 10: prio 1 handle 0x1000 \ protocol ip u32 match ip src 10.0.0.1/32 classid 10:1 action ok Essentially specifying a table id 0, bucketid 1 and nodeid of zero Tableid 0 is remapped to the default of 0x800. Bucketid 1 is ignored and defaults to 0x00. Nodeid was assumed to be what Ming passed - 0x000 dumping before fix shows: ~$ tc filter ls dev dummy0 parent 10: filter protocol ip pref 1 u32 chain 0 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor -30591 Note that the last line reports a table instead of a match entry (you can tell this because it says "ht divisor..."). As a result of reporting the wrong data type (misinterpretting of struct tc_u_knode as being struct tc_u_hnode) the divisor is reported with value of -30591. Ming identified this as part of the heap address (physmap_base is 0xffff8880 (-30591 - 1)). The fix is to ensure that when table entry matches are added and no nodeid is specified (i.e nodeid == 0) then we get the next available nodeid from the table's pool. 
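As a self-contained illustration of that 32-bit addressing, the small user-space program below decodes a handle into its three fields; the local U32_* macros are simplified equivalents of the kernel's TC_U32_HTID()/TC_U32_HASH()/TC_U32_NODE() helpers:

	#include <stdint.h>
	#include <stdio.h>

	/* handle layout: htid(12 bits) : bucketid(8 bits) : nodeid(12 bits) */
	#define U32_HTID(h)	(((h) >> 20) & 0xFFF)
	#define U32_HASH(h)	(((h) >> 12) & 0xFF)
	#define U32_NODE(h)	((h) & 0xFFF)

	int main(void)
	{
		uint32_t handle = (0x800u << 20) | (0x1u << 12) | 0x123; /* 800:1:123 */

		printf("htid 0x%x bucket 0x%x node 0x%x\n",
		       (unsigned int)U32_HTID(handle), (unsigned int)U32_HASH(handle),
		       (unsigned int)U32_NODE(handle));
		return 0;
	}

A handle whose low 12 bits are zero therefore names a table rather than a match entry, which is why the dump code misreads such an entry as a struct tc_u_hnode.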
After the fix, this is what the dump shows: $ tc filter ls dev dummy0 parent 10: filter protocol ip pref 1 u32 chain 0 filter protocol ip pref 1 u32 chain 0 fh 800: ht divisor 1 filter protocol ip pref 1 u32 chain 0 fh 800::800 order 2048 key ht 800 bkt 0 flowid 10:1 not_in_hw match 0a000001/ffffffff at 12 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 Reported-by: Mingi Cho Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jamal Hadi Salim Link: https://lore.kernel.org/r/20230726135151.416917-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski --- net/sched/cls_u32.c | 56 ++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 50 insertions(+), 6 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 5abf31e432caf..907e58841fe80 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -1024,18 +1024,62 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, return -EINVAL; } + /* At this point, we need to derive the new handle that will be used to + * uniquely map the identity of this table match entry. The + * identity of the entry that we need to construct is 32 bits made of: + * htid(12b):bucketid(8b):node/entryid(12b) + * + * At this point _we have the table(ht)_ in which we will insert this + * entry. We carry the table's id in variable "htid". + * Note that earlier code picked the ht selection either by a) the user + * providing the htid specified via TCA_U32_HASH attribute or b) when + * no such attribute is passed then the root ht, is default to at ID + * 0x[800][00][000]. Rule: the root table has a single bucket with ID 0. + * If OTOH the user passed us the htid, they may also pass a bucketid of + * choice. 0 is fine. For example a user htid is 0x[600][01][000] it is + * indicating hash bucketid of 1. Rule: the entry/node ID _cannot_ be + * passed via the htid, so even if it was non-zero it will be ignored. + * + * We may also have a handle, if the user passed one. The handle also + * carries the same addressing of htid(12b):bucketid(8b):node/entryid(12b). + * Rule: the bucketid on the handle is ignored even if one was passed; + * rather the value on "htid" is always assumed to be the bucketid. + */ if (handle) { + /* Rule: The htid from handle and tableid from htid must match */ if (TC_U32_HTID(handle) && TC_U32_HTID(handle ^ htid)) { NL_SET_ERR_MSG_MOD(extack, "Handle specified hash table address mismatch"); return -EINVAL; } - handle = htid | TC_U32_NODE(handle); - err = idr_alloc_u32(&ht->handle_idr, NULL, &handle, handle, - GFP_KERNEL); - if (err) - return err; - } else + /* Ok, so far we have a valid htid(12b):bucketid(8b) but we + * need to finalize the table entry identification with the last + * part - the node/entryid(12b)). Rule: Nodeid _cannot be 0_ for + * entries. Rule: nodeid of 0 is reserved only for tables(see + * earlier code which processes TC_U32_DIVISOR attribute). + * Rule: The nodeid can only be derived from the handle (and not + * htid). + * Rule: if the handle specified zero for the node id example + * 0x60000000, then pick a new nodeid from the pool of IDs + * this hash table has been allocating from. + * If OTOH it is specified (i.e for example the user passed a + * handle such as 0x60000123), then we use it generate our final + * handle which is used to uniquely identify the match entry. 
+ */ if (!TC_U32_NODE(handle)) { + handle = gen_new_kid(ht, htid); + } else { + handle = htid | TC_U32_NODE(handle); + err = idr_alloc_u32(&ht->handle_idr, NULL, &handle, + handle, GFP_KERNEL); + if (err) + return err; + } + } else { + /* The user did not give us a handle; lets just generate one + * from the table's pool of nodeids. + */ handle = gen_new_kid(ht, htid); + } if (tb[TCA_U32_SEL] == NULL) { NL_SET_ERR_MSG_MOD(extack, "Selector not specified"); From 56c6be35fcbed54279df0a2c9e60480a61841d6f Mon Sep 17 00:00:00 2001 From: Chengfeng Ye Date: Thu, 27 Jul 2023 08:56:19 +0000 Subject: [PATCH 41/98] mISDN: hfcpci: Fix potential deadlock on &hc->lock As &hc->lock is acquired by both timer _hfcpci_softirq() and hardirq hfcpci_int(), the timer should disable irq before lock acquisition, otherwise a deadlock could happen if the timer is preempted by the hard irq. Possible deadlock scenario: hfcpci_softirq() (timer) -> _hfcpci_softirq() -> spin_lock(&hc->lock); -> hfcpci_int() -> spin_lock(&hc->lock); (deadlock here) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. The tentative patch fixes the potential deadlock by using spin_lock_irq() in the timer. Fixes: b36b654a7e82 ("mISDN: Create /sys/class/mISDN") Signed-off-by: Chengfeng Ye Link: https://lore.kernel.org/r/20230727085619.7419-1-dg573847474@gmail.com Signed-off-by: Jakub Kicinski --- drivers/isdn/hardware/mISDN/hfcpci.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/isdn/hardware/mISDN/hfcpci.c b/drivers/isdn/hardware/mISDN/hfcpci.c index c0331b2680108..fe391de1aba32 100644 --- a/drivers/isdn/hardware/mISDN/hfcpci.c +++ b/drivers/isdn/hardware/mISDN/hfcpci.c @@ -839,7 +839,7 @@ hfcpci_fill_fifo(struct bchannel *bch) *z1t = cpu_to_le16(new_z1); /* now send data */ if (bch->tx_idx < bch->tx_skb->len) return; - dev_kfree_skb(bch->tx_skb); + dev_kfree_skb_any(bch->tx_skb); if (get_next_bframe(bch)) goto next_t_frame; return; @@ -895,7 +895,7 @@ hfcpci_fill_fifo(struct bchannel *bch) } bz->za[new_f1].z1 = cpu_to_le16(new_z1); /* for next buffer */ bz->f1 = new_f1; /* next frame */ - dev_kfree_skb(bch->tx_skb); + dev_kfree_skb_any(bch->tx_skb); get_next_bframe(bch); } @@ -1119,7 +1119,7 @@ tx_birq(struct bchannel *bch) if (bch->tx_skb && bch->tx_idx < bch->tx_skb->len) hfcpci_fill_fifo(bch); else { - dev_kfree_skb(bch->tx_skb); + dev_kfree_skb_any(bch->tx_skb); if (get_next_bframe(bch)) hfcpci_fill_fifo(bch); } @@ -2277,7 +2277,7 @@ _hfcpci_softirq(struct device *dev, void *unused) return 0; if (hc->hw.int_m2 & HFCPCI_IRQ_ENABLE) { - spin_lock(&hc->lock); + spin_lock_irq(&hc->lock); bch = Sel_BCS(hc, hc->hw.bswapped ? 2 : 1); if (bch && bch->state == ISDN_P_B_RAW) { /* B1 rx&tx */ main_rec_hfcpci(bch); @@ -2288,7 +2288,7 @@ _hfcpci_softirq(struct device *dev, void *unused) main_rec_hfcpci(bch); tx_birq(bch); } - spin_unlock(&hc->lock); + spin_unlock_irq(&hc->lock); } return 0; } From a0b1b2055be34c0ec1371764d040164cde1ead79 Mon Sep 17 00:00:00 2001 From: Thierry Reding Date: Wed, 26 Jul 2023 18:32:00 +0200 Subject: [PATCH 42/98] net: stmmac: tegra: Properly allocate clock bulk data The clock data is an array of struct clk_bulk_data, so make sure to allocate enough memory. Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support") Signed-off-by: Thierry Reding Reviewed-by: Simon Horman Signed-off-by: David S.
Miller --- drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c index f8367c5b490ba..fbb0ccf84afc4 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c @@ -234,7 +234,8 @@ static int tegra_mgbe_probe(struct platform_device *pdev) res.addr = mgbe->regs; res.irq = irq; - mgbe->clks = devm_kzalloc(&pdev->dev, sizeof(*mgbe->clks), GFP_KERNEL); + mgbe->clks = devm_kcalloc(&pdev->dev, ARRAY_SIZE(mgbe_clks), + sizeof(*mgbe->clks), GFP_KERNEL); if (!mgbe->clks) return -ENOMEM; From 8d7ae22ae9f8c8a4407f8e993df64440bdbd0cee Mon Sep 17 00:00:00 2001 From: Lukasz Majewski Date: Thu, 27 Jul 2023 10:13:42 +0200 Subject: [PATCH 43/98] net: dsa: microchip: KSZ9477 register regmap alignment to 32 bit boundaries The commit (SHA1: 5c844d57aa7894154e49cf2fc648bfe2f1aefc1c) provided code to apply "Module 6: Certain PHY registers must be written as pairs instead of singly" errata for KSZ9477, as for certain PHY registers (0xN120 to 0xN13F, N=1,2,3,4,5) this chip must be accessed as 32 bit words instead of with 16 or 8 bit accesses. Otherwise, adjacent registers (no matter if reserved or not) are overwritten with 0x0. Without this patch some registers (e.g. 0x113c or 0x1134) required for 32 bit access are out of valid regmap ranges. As a result, the following error is observed and KSZ9477 is not properly configured: ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0 The solution is to modify regmap_reg_range to allow accesses on 4 byte boundaries. Signed-off-by: Lukasz Majewski Reviewed-by: Simon Horman Signed-off-by: David S.
Miller --- drivers/net/dsa/microchip/ksz_common.c | 35 +++++++++++--------------- 1 file changed, 15 insertions(+), 20 deletions(-) diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index b18cd170ec06c..6c0623f88654e 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -635,10 +635,9 @@ static const struct regmap_range ksz9477_valid_regs[] = { regmap_reg_range(0x1030, 0x1030), regmap_reg_range(0x1100, 0x1115), regmap_reg_range(0x111a, 0x111f), - regmap_reg_range(0x1122, 0x1127), - regmap_reg_range(0x112a, 0x112b), - regmap_reg_range(0x1136, 0x1139), - regmap_reg_range(0x113e, 0x113f), + regmap_reg_range(0x1120, 0x112b), + regmap_reg_range(0x1134, 0x113b), + regmap_reg_range(0x113c, 0x113f), regmap_reg_range(0x1400, 0x1401), regmap_reg_range(0x1403, 0x1403), regmap_reg_range(0x1410, 0x1417), @@ -669,10 +668,9 @@ static const struct regmap_range ksz9477_valid_regs[] = { regmap_reg_range(0x2030, 0x2030), regmap_reg_range(0x2100, 0x2115), regmap_reg_range(0x211a, 0x211f), - regmap_reg_range(0x2122, 0x2127), - regmap_reg_range(0x212a, 0x212b), - regmap_reg_range(0x2136, 0x2139), - regmap_reg_range(0x213e, 0x213f), + regmap_reg_range(0x2120, 0x212b), + regmap_reg_range(0x2134, 0x213b), + regmap_reg_range(0x213c, 0x213f), regmap_reg_range(0x2400, 0x2401), regmap_reg_range(0x2403, 0x2403), regmap_reg_range(0x2410, 0x2417), @@ -703,10 +701,9 @@ static const struct regmap_range ksz9477_valid_regs[] = { regmap_reg_range(0x3030, 0x3030), regmap_reg_range(0x3100, 0x3115), regmap_reg_range(0x311a, 0x311f), - regmap_reg_range(0x3122, 0x3127), - regmap_reg_range(0x312a, 0x312b), - regmap_reg_range(0x3136, 0x3139), - regmap_reg_range(0x313e, 0x313f), + regmap_reg_range(0x3120, 0x312b), + regmap_reg_range(0x3134, 0x313b), + regmap_reg_range(0x313c, 0x313f), regmap_reg_range(0x3400, 0x3401), regmap_reg_range(0x3403, 0x3403), regmap_reg_range(0x3410, 0x3417), @@ -737,10 +734,9 @@ static const struct regmap_range ksz9477_valid_regs[] = { regmap_reg_range(0x4030, 0x4030), regmap_reg_range(0x4100, 0x4115), regmap_reg_range(0x411a, 0x411f), - regmap_reg_range(0x4122, 0x4127), - regmap_reg_range(0x412a, 0x412b), - regmap_reg_range(0x4136, 0x4139), - regmap_reg_range(0x413e, 0x413f), + regmap_reg_range(0x4120, 0x412b), + regmap_reg_range(0x4134, 0x413b), + regmap_reg_range(0x413c, 0x413f), regmap_reg_range(0x4400, 0x4401), regmap_reg_range(0x4403, 0x4403), regmap_reg_range(0x4410, 0x4417), @@ -771,10 +767,9 @@ static const struct regmap_range ksz9477_valid_regs[] = { regmap_reg_range(0x5030, 0x5030), regmap_reg_range(0x5100, 0x5115), regmap_reg_range(0x511a, 0x511f), - regmap_reg_range(0x5122, 0x5127), - regmap_reg_range(0x512a, 0x512b), - regmap_reg_range(0x5136, 0x5139), - regmap_reg_range(0x513e, 0x513f), + regmap_reg_range(0x5120, 0x512b), + regmap_reg_range(0x5134, 0x513b), + regmap_reg_range(0x513c, 0x513f), regmap_reg_range(0x5400, 0x5401), regmap_reg_range(0x5403, 0x5403), regmap_reg_range(0x5410, 0x5417), From e346e231b42bcae6822a6326acfb7b741e9e6026 Mon Sep 17 00:00:00 2001 From: Konstantin Khorenko Date: Thu, 27 Jul 2023 18:26:09 +0300 Subject: [PATCH 44/98] qed: Fix scheduling in a tasklet while getting stats Here we've got to a situation when tasklet called usleep_range() in PTT acquire logic, thus welcome to the "scheduling while atomic" BUG(). 
BUG: scheduling while atomic: swapper/24/0/0x00000100 [] schedule+0x29/0x70 [] schedule_hrtimeout_range_clock+0xb2/0x150 [] schedule_hrtimeout_range+0x13/0x20 [] usleep_range+0x4f/0x70 [] qed_ptt_acquire+0x38/0x100 [qed] [] _qed_get_vport_stats+0x458/0x580 [qed] [] qed_get_vport_stats+0x1c/0xd0 [qed] [] qed_get_protocol_stats+0x93/0x100 [qed] qed_mcp_send_protocol_stats case MFW_DRV_MSG_GET_LAN_STATS: case MFW_DRV_MSG_GET_FCOE_STATS: case MFW_DRV_MSG_GET_ISCSI_STATS: case MFW_DRV_MSG_GET_RDMA_STATS: [] qed_mcp_handle_events+0x2d8/0x890 [qed] qed_int_assertion qed_int_attentions [] qed_int_sp_dpc+0xa50/0xdc0 [qed] [] tasklet_action+0x83/0x140 [] __do_softirq+0x125/0x2bb [] call_softirq+0x1c/0x30 [] do_softirq+0x65/0xa0 [] irq_exit+0x105/0x110 [] do_IRQ+0x56/0xf0 Fix this by making caller to provide the context whether it could be in atomic context flow or not when getting stats from QED driver. QED driver based on the context provided decide to schedule out or not when acquiring the PTT BAR window. We faced the BUG_ON() while getting vport stats, but according to the code same issue could happen for fcoe and iscsi statistics as well, so fixing them too. Fixes: 6c75424612a7 ("qed: Add support for NCSI statistics.") Fixes: 1e128c81290a ("qed: Add support for hardware offloaded FCoE.") Fixes: 2f2b2614e893 ("qed: Provide iSCSI statistics to management") Cc: Sudarsana Kalluru Cc: David Miller Cc: Manish Chopra Signed-off-by: Konstantin Khorenko Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- drivers/net/ethernet/qlogic/qed/qed_dev_api.h | 16 ++++++++++++ drivers/net/ethernet/qlogic/qed/qed_fcoe.c | 19 ++++++++++---- drivers/net/ethernet/qlogic/qed/qed_fcoe.h | 17 ++++++++++-- drivers/net/ethernet/qlogic/qed/qed_hw.c | 26 ++++++++++++++++--- drivers/net/ethernet/qlogic/qed/qed_iscsi.c | 19 ++++++++++---- drivers/net/ethernet/qlogic/qed/qed_iscsi.h | 8 ++++-- drivers/net/ethernet/qlogic/qed/qed_l2.c | 19 ++++++++++---- drivers/net/ethernet/qlogic/qed/qed_l2.h | 24 +++++++++++++++++ drivers/net/ethernet/qlogic/qed/qed_main.c | 6 ++--- 9 files changed, 128 insertions(+), 26 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev_api.h b/drivers/net/ethernet/qlogic/qed/qed_dev_api.h index f8682356d0cf4..94d4f9413ab7a 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev_api.h +++ b/drivers/net/ethernet/qlogic/qed/qed_dev_api.h @@ -193,6 +193,22 @@ void qed_hw_remove(struct qed_dev *cdev); */ struct qed_ptt *qed_ptt_acquire(struct qed_hwfn *p_hwfn); +/** + * qed_ptt_acquire_context(): Allocate a PTT window honoring the context + * atomicy. + * + * @p_hwfn: HW device data. + * @is_atomic: Hint from the caller - if the func can sleep or not. + * + * Context: The function should not sleep in case is_atomic == true. + * Return: struct qed_ptt. + * + * Should be called at the entry point to the driver + * (at the beginning of an exported function). + */ +struct qed_ptt *qed_ptt_acquire_context(struct qed_hwfn *p_hwfn, + bool is_atomic); + /** * qed_ptt_release(): Release PTT Window. 
* diff --git a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c index 3764190b948eb..04602ac947087 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c +++ b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c @@ -693,13 +693,14 @@ static void _qed_fcoe_get_pstats(struct qed_hwfn *p_hwfn, } static int qed_fcoe_get_stats(struct qed_hwfn *p_hwfn, - struct qed_fcoe_stats *p_stats) + struct qed_fcoe_stats *p_stats, + bool is_atomic) { struct qed_ptt *p_ptt; memset(p_stats, 0, sizeof(*p_stats)); - p_ptt = qed_ptt_acquire(p_hwfn); + p_ptt = qed_ptt_acquire_context(p_hwfn, is_atomic); if (!p_ptt) { DP_ERR(p_hwfn, "Failed to acquire ptt\n"); @@ -973,19 +974,27 @@ static int qed_fcoe_destroy_conn(struct qed_dev *cdev, QED_SPQ_MODE_EBLOCK, NULL); } +static int qed_fcoe_stats_context(struct qed_dev *cdev, + struct qed_fcoe_stats *stats, + bool is_atomic) +{ + return qed_fcoe_get_stats(QED_AFFIN_HWFN(cdev), stats, is_atomic); +} + static int qed_fcoe_stats(struct qed_dev *cdev, struct qed_fcoe_stats *stats) { - return qed_fcoe_get_stats(QED_AFFIN_HWFN(cdev), stats); + return qed_fcoe_stats_context(cdev, stats, false); } void qed_get_protocol_stats_fcoe(struct qed_dev *cdev, - struct qed_mcp_fcoe_stats *stats) + struct qed_mcp_fcoe_stats *stats, + bool is_atomic) { struct qed_fcoe_stats proto_stats; /* Retrieve FW statistics */ memset(&proto_stats, 0, sizeof(proto_stats)); - if (qed_fcoe_stats(cdev, &proto_stats)) { + if (qed_fcoe_stats_context(cdev, &proto_stats, is_atomic)) { DP_VERBOSE(cdev, QED_MSG_STORAGE, "Failed to collect FCoE statistics\n"); return; diff --git a/drivers/net/ethernet/qlogic/qed/qed_fcoe.h b/drivers/net/ethernet/qlogic/qed/qed_fcoe.h index 19c85adf4ceb1..214e8299ecb4e 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_fcoe.h +++ b/drivers/net/ethernet/qlogic/qed/qed_fcoe.h @@ -28,8 +28,20 @@ int qed_fcoe_alloc(struct qed_hwfn *p_hwfn); void qed_fcoe_setup(struct qed_hwfn *p_hwfn); void qed_fcoe_free(struct qed_hwfn *p_hwfn); +/** + * qed_get_protocol_stats_fcoe(): Fills provided statistics + * struct with statistics. + * + * @cdev: Qed dev pointer. + * @stats: Points to struct that will be filled with statistics. + * @is_atomic: Hint from the caller - if the func can sleep or not. + * + * Context: The function should not sleep in case is_atomic == true. + * Return: Void. 
+ */ void qed_get_protocol_stats_fcoe(struct qed_dev *cdev, - struct qed_mcp_fcoe_stats *stats); + struct qed_mcp_fcoe_stats *stats, + bool is_atomic); #else /* CONFIG_QED_FCOE */ static inline int qed_fcoe_alloc(struct qed_hwfn *p_hwfn) { @@ -40,7 +52,8 @@ static inline void qed_fcoe_setup(struct qed_hwfn *p_hwfn) {} static inline void qed_fcoe_free(struct qed_hwfn *p_hwfn) {} static inline void qed_get_protocol_stats_fcoe(struct qed_dev *cdev, - struct qed_mcp_fcoe_stats *stats) + struct qed_mcp_fcoe_stats *stats, + bool is_atomic) { } #endif /* CONFIG_QED_FCOE */ diff --git a/drivers/net/ethernet/qlogic/qed/qed_hw.c b/drivers/net/ethernet/qlogic/qed/qed_hw.c index 554f30b0cfd5e..6263f847b6b92 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_hw.c +++ b/drivers/net/ethernet/qlogic/qed/qed_hw.c @@ -23,7 +23,10 @@ #include "qed_reg_addr.h" #include "qed_sriov.h" -#define QED_BAR_ACQUIRE_TIMEOUT 1000 +#define QED_BAR_ACQUIRE_TIMEOUT_USLEEP_CNT 1000 +#define QED_BAR_ACQUIRE_TIMEOUT_USLEEP 1000 +#define QED_BAR_ACQUIRE_TIMEOUT_UDELAY_CNT 100000 +#define QED_BAR_ACQUIRE_TIMEOUT_UDELAY 10 /* Invalid values */ #define QED_BAR_INVALID_OFFSET (cpu_to_le32(-1)) @@ -84,12 +87,22 @@ void qed_ptt_pool_free(struct qed_hwfn *p_hwfn) } struct qed_ptt *qed_ptt_acquire(struct qed_hwfn *p_hwfn) +{ + return qed_ptt_acquire_context(p_hwfn, false); +} + +struct qed_ptt *qed_ptt_acquire_context(struct qed_hwfn *p_hwfn, bool is_atomic) { struct qed_ptt *p_ptt; - unsigned int i; + unsigned int i, count; + + if (is_atomic) + count = QED_BAR_ACQUIRE_TIMEOUT_UDELAY_CNT; + else + count = QED_BAR_ACQUIRE_TIMEOUT_USLEEP_CNT; /* Take the free PTT from the list */ - for (i = 0; i < QED_BAR_ACQUIRE_TIMEOUT; i++) { + for (i = 0; i < count; i++) { spin_lock_bh(&p_hwfn->p_ptt_pool->lock); if (!list_empty(&p_hwfn->p_ptt_pool->free_list)) { @@ -105,7 +118,12 @@ struct qed_ptt *qed_ptt_acquire(struct qed_hwfn *p_hwfn) } spin_unlock_bh(&p_hwfn->p_ptt_pool->lock); - usleep_range(1000, 2000); + + if (is_atomic) + udelay(QED_BAR_ACQUIRE_TIMEOUT_UDELAY); + else + usleep_range(QED_BAR_ACQUIRE_TIMEOUT_USLEEP, + QED_BAR_ACQUIRE_TIMEOUT_USLEEP * 2); } DP_NOTICE(p_hwfn, "PTT acquire timeout - failed to allocate PTT\n"); diff --git a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c index 511ab214eb9c8..980e7289b4814 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c +++ b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c @@ -999,13 +999,14 @@ static void _qed_iscsi_get_pstats(struct qed_hwfn *p_hwfn, } static int qed_iscsi_get_stats(struct qed_hwfn *p_hwfn, - struct qed_iscsi_stats *stats) + struct qed_iscsi_stats *stats, + bool is_atomic) { struct qed_ptt *p_ptt; memset(stats, 0, sizeof(*stats)); - p_ptt = qed_ptt_acquire(p_hwfn); + p_ptt = qed_ptt_acquire_context(p_hwfn, is_atomic); if (!p_ptt) { DP_ERR(p_hwfn, "Failed to acquire ptt\n"); return -EAGAIN; @@ -1336,9 +1337,16 @@ static int qed_iscsi_destroy_conn(struct qed_dev *cdev, QED_SPQ_MODE_EBLOCK, NULL); } +static int qed_iscsi_stats_context(struct qed_dev *cdev, + struct qed_iscsi_stats *stats, + bool is_atomic) +{ + return qed_iscsi_get_stats(QED_AFFIN_HWFN(cdev), stats, is_atomic); +} + static int qed_iscsi_stats(struct qed_dev *cdev, struct qed_iscsi_stats *stats) { - return qed_iscsi_get_stats(QED_AFFIN_HWFN(cdev), stats); + return qed_iscsi_stats_context(cdev, stats, false); } static int qed_iscsi_change_mac(struct qed_dev *cdev, @@ -1358,13 +1366,14 @@ static int qed_iscsi_change_mac(struct qed_dev *cdev, } void 
qed_get_protocol_stats_iscsi(struct qed_dev *cdev, - struct qed_mcp_iscsi_stats *stats) + struct qed_mcp_iscsi_stats *stats, + bool is_atomic) { struct qed_iscsi_stats proto_stats; /* Retrieve FW statistics */ memset(&proto_stats, 0, sizeof(proto_stats)); - if (qed_iscsi_stats(cdev, &proto_stats)) { + if (qed_iscsi_stats_context(cdev, &proto_stats, is_atomic)) { DP_VERBOSE(cdev, QED_MSG_STORAGE, "Failed to collect ISCSI statistics\n"); return; diff --git a/drivers/net/ethernet/qlogic/qed/qed_iscsi.h b/drivers/net/ethernet/qlogic/qed/qed_iscsi.h index dec2b00259d42..974cb8d26608c 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_iscsi.h +++ b/drivers/net/ethernet/qlogic/qed/qed_iscsi.h @@ -39,11 +39,14 @@ void qed_iscsi_free(struct qed_hwfn *p_hwfn); * * @cdev: Qed dev pointer. * @stats: Points to struct that will be filled with statistics. + * @is_atomic: Hint from the caller - if the func can sleep or not. * + * Context: The function should not sleep in case is_atomic == true. * Return: Void. */ void qed_get_protocol_stats_iscsi(struct qed_dev *cdev, - struct qed_mcp_iscsi_stats *stats); + struct qed_mcp_iscsi_stats *stats, + bool is_atomic); #else /* IS_ENABLED(CONFIG_QED_ISCSI) */ static inline int qed_iscsi_alloc(struct qed_hwfn *p_hwfn) { @@ -56,7 +59,8 @@ static inline void qed_iscsi_free(struct qed_hwfn *p_hwfn) {} static inline void qed_get_protocol_stats_iscsi(struct qed_dev *cdev, - struct qed_mcp_iscsi_stats *stats) {} + struct qed_mcp_iscsi_stats *stats, + bool is_atomic) {} #endif /* IS_ENABLED(CONFIG_QED_ISCSI) */ #endif diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c index 7776d3bdd459a..970b9aabbc3d7 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c @@ -1863,7 +1863,8 @@ static void __qed_get_vport_stats(struct qed_hwfn *p_hwfn, } static void _qed_get_vport_stats(struct qed_dev *cdev, - struct qed_eth_stats *stats) + struct qed_eth_stats *stats, + bool is_atomic) { u8 fw_vport = 0; int i; @@ -1872,10 +1873,11 @@ static void _qed_get_vport_stats(struct qed_dev *cdev, for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = &cdev->hwfns[i]; - struct qed_ptt *p_ptt = IS_PF(cdev) ? qed_ptt_acquire(p_hwfn) - : NULL; + struct qed_ptt *p_ptt; bool b_get_port_stats; + p_ptt = IS_PF(cdev) ? 
qed_ptt_acquire_context(p_hwfn, is_atomic) + : NULL; if (IS_PF(cdev)) { /* The main vport index is relative first */ if (qed_fw_vport(p_hwfn, 0, &fw_vport)) { @@ -1900,6 +1902,13 @@ static void _qed_get_vport_stats(struct qed_dev *cdev, } void qed_get_vport_stats(struct qed_dev *cdev, struct qed_eth_stats *stats) +{ + qed_get_vport_stats_context(cdev, stats, false); +} + +void qed_get_vport_stats_context(struct qed_dev *cdev, + struct qed_eth_stats *stats, + bool is_atomic) { u32 i; @@ -1908,7 +1917,7 @@ void qed_get_vport_stats(struct qed_dev *cdev, struct qed_eth_stats *stats) return; } - _qed_get_vport_stats(cdev, stats); + _qed_get_vport_stats(cdev, stats, is_atomic); if (!cdev->reset_stats) return; @@ -1960,7 +1969,7 @@ void qed_reset_vport_stats(struct qed_dev *cdev) if (!cdev->reset_stats) { DP_INFO(cdev, "Reset stats not allocated\n"); } else { - _qed_get_vport_stats(cdev, cdev->reset_stats); + _qed_get_vport_stats(cdev, cdev->reset_stats, false); cdev->reset_stats->common.link_change_count = 0; } } diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.h b/drivers/net/ethernet/qlogic/qed/qed_l2.h index a538cf478c14e..2d2f82c785ad2 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.h +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.h @@ -249,8 +249,32 @@ qed_sp_eth_rx_queues_update(struct qed_hwfn *p_hwfn, enum spq_mode comp_mode, struct qed_spq_comp_cb *p_comp_data); +/** + * qed_get_vport_stats(): Fills provided statistics + * struct with statistics. + * + * @cdev: Qed dev pointer. + * @stats: Points to struct that will be filled with statistics. + * + * Return: Void. + */ void qed_get_vport_stats(struct qed_dev *cdev, struct qed_eth_stats *stats); +/** + * qed_get_vport_stats_context(): Fills provided statistics + * struct with statistics. + * + * @cdev: Qed dev pointer. + * @stats: Points to struct that will be filled with statistics. + * @is_atomic: Hint from the caller - if the func can sleep or not. + * + * Context: The function should not sleep in case is_atomic == true. + * Return: Void. + */ +void qed_get_vport_stats_context(struct qed_dev *cdev, + struct qed_eth_stats *stats, + bool is_atomic); + void qed_reset_vport_stats(struct qed_dev *cdev); /** diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index f5af83342856f..c278f8893042b 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -3092,7 +3092,7 @@ void qed_get_protocol_stats(struct qed_dev *cdev, switch (type) { case QED_MCP_LAN_STATS: - qed_get_vport_stats(cdev, ð_stats); + qed_get_vport_stats_context(cdev, ð_stats, true); stats->lan_stats.ucast_rx_pkts = eth_stats.common.rx_ucast_pkts; stats->lan_stats.ucast_tx_pkts = @@ -3100,10 +3100,10 @@ void qed_get_protocol_stats(struct qed_dev *cdev, stats->lan_stats.fcs_err = -1; break; case QED_MCP_FCOE_STATS: - qed_get_protocol_stats_fcoe(cdev, &stats->fcoe_stats); + qed_get_protocol_stats_fcoe(cdev, &stats->fcoe_stats, true); break; case QED_MCP_ISCSI_STATS: - qed_get_protocol_stats_iscsi(cdev, &stats->iscsi_stats); + qed_get_protocol_stats_iscsi(cdev, &stats->iscsi_stats, true); break; default: DP_VERBOSE(cdev, QED_MSG_SP, From 7938cd15436873f649f31cb867bac2d88ca564d0 Mon Sep 17 00:00:00 2001 From: Richard Gobert Date: Thu, 27 Jul 2023 17:33:56 +0200 Subject: [PATCH 45/98] net: gro: fix misuse of CB in udp socket lookup This patch fixes a misuse of IP{6}CB(skb) in GRO, while calling to `udp6_lib_lookup2` when handling udp tunnels. 
`udp6_lib_lookup2` fetch the device from CB. The fix changes it to fetch the device from `skb->dev`. l3mdev case requires special attention since it has a master and a slave device. Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Reported-by: Gal Pressman Signed-off-by: Richard Gobert Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/net/gro.h | 43 ++++++++++++++++++++++++++++++++++++++++++ net/ipv4/udp.c | 8 ++++++-- net/ipv4/udp_offload.c | 7 +++++-- net/ipv6/udp.c | 8 ++++++-- net/ipv6/udp_offload.c | 7 +++++-- 5 files changed, 65 insertions(+), 8 deletions(-) diff --git a/include/net/gro.h b/include/net/gro.h index 75efa6fb84413..88644b3ca6600 100644 --- a/include/net/gro.h +++ b/include/net/gro.h @@ -452,6 +452,49 @@ static inline void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb, gro_normal_list(napi); } +/* This function is the alternative of 'inet_iif' and 'inet_sdif' + * functions in case we can not rely on fields of IPCB. + * + * The caller must verify skb_valid_dst(skb) is false and skb->dev is initialized. + * The caller must hold the RCU read lock. + */ +static inline void inet_get_iif_sdif(const struct sk_buff *skb, int *iif, int *sdif) +{ + *iif = inet_iif(skb) ?: skb->dev->ifindex; + *sdif = 0; + +#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV) + if (netif_is_l3_slave(skb->dev)) { + struct net_device *master = netdev_master_upper_dev_get_rcu(skb->dev); + + *sdif = *iif; + *iif = master ? master->ifindex : 0; + } +#endif +} + +/* This function is the alternative of 'inet6_iif' and 'inet6_sdif' + * functions in case we can not rely on fields of IP6CB. + * + * The caller must verify skb_valid_dst(skb) is false and skb->dev is initialized. + * The caller must hold the RCU read lock. + */ +static inline void inet6_get_iif_sdif(const struct sk_buff *skb, int *iif, int *sdif) +{ + /* using skb->dev->ifindex because skb_dst(skb) is not initialized */ + *iif = skb->dev->ifindex; + *sdif = 0; + +#if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV) + if (netif_is_l3_slave(skb->dev)) { + struct net_device *master = netdev_master_upper_dev_get_rcu(skb->dev); + + *sdif = *iif; + *iif = master ? master->ifindex : 0; + } +#endif +} + extern struct list_head offload_base; #endif /* _NET_IPV6_GRO_H */ diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 42a96b3547c9f..abfa860367aa9 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -114,6 +114,7 @@ #include #include #include +#include #if IS_ENABLED(CONFIG_IPV6) #include #endif @@ -555,10 +556,13 @@ struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb, { const struct iphdr *iph = ip_hdr(skb); struct net *net = dev_net(skb->dev); + int iif, sdif; + + inet_get_iif_sdif(skb, &iif, &sdif); return __udp4_lib_lookup(net, iph->saddr, sport, - iph->daddr, dport, inet_iif(skb), - inet_sdif(skb), net->ipv4.udp_table, NULL); + iph->daddr, dport, iif, + sdif, net->ipv4.udp_table, NULL); } /* Must be called under rcu_read_lock(). 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index f402946da344b..0f46b3c2e4ac5 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -609,10 +609,13 @@ static struct sock *udp4_gro_lookup_skb(struct sk_buff *skb, __be16 sport, { const struct iphdr *iph = skb_gro_network_header(skb); struct net *net = dev_net(skb->dev); + int iif, sdif; + + inet_get_iif_sdif(skb, &iif, &sdif); return __udp4_lib_lookup(net, iph->saddr, sport, - iph->daddr, dport, inet_iif(skb), - inet_sdif(skb), net->ipv4.udp_table, NULL); + iph->daddr, dport, iif, + sdif, net->ipv4.udp_table, NULL); } INDIRECT_CALLABLE_SCOPE diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index b7c972aa09a75..e5da5d1cb2158 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -300,10 +301,13 @@ struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, { const struct ipv6hdr *iph = ipv6_hdr(skb); struct net *net = dev_net(skb->dev); + int iif, sdif; + + inet6_get_iif_sdif(skb, &iif, &sdif); return __udp6_lib_lookup(net, &iph->saddr, sport, - &iph->daddr, dport, inet6_iif(skb), - inet6_sdif(skb), net->ipv4.udp_table, NULL); + &iph->daddr, dport, iif, + sdif, net->ipv4.udp_table, NULL); } /* Must be called under rcu_read_lock(). diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index 09fa7a42cb937..6b95ba241ebe2 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -118,10 +118,13 @@ static struct sock *udp6_gro_lookup_skb(struct sk_buff *skb, __be16 sport, { const struct ipv6hdr *iph = skb_gro_network_header(skb); struct net *net = dev_net(skb->dev); + int iif, sdif; + + inet6_get_iif_sdif(skb, &iif, &sdif); return __udp6_lib_lookup(net, &iph->saddr, sport, - &iph->daddr, dport, inet6_iif(skb), - inet6_sdif(skb), net->ipv4.udp_table, NULL); + &iph->daddr, dport, iif, + sdif, net->ipv4.udp_table, NULL); } INDIRECT_CALLABLE_SCOPE From fe11fdcb4207907d80cda2e73777465d68131e66 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:08 +0000 Subject: [PATCH 46/98] net: annotate data-races around sk->sk_reserved_mem sk_getsockopt() runs locklessly. This means sk->sk_reserved_mem can be read while other threads are changing its value. Add missing annotations where they are needed. Fixes: 2bb2f5fb21b0 ("net: add new socket option SO_RESERVE_MEM") Signed-off-by: Eric Dumazet Cc: Wei Wang Signed-off-by: David S. 
Miller --- net/core/sock.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 9370fd50aa2c9..bd201d15e72aa 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1007,7 +1007,7 @@ static void sock_release_reserved_memory(struct sock *sk, int bytes) bytes = round_down(bytes, PAGE_SIZE); WARN_ON(bytes > sk->sk_reserved_mem); - sk->sk_reserved_mem -= bytes; + WRITE_ONCE(sk->sk_reserved_mem, sk->sk_reserved_mem - bytes); sk_mem_reclaim(sk); } @@ -1044,7 +1044,8 @@ static int sock_reserve_memory(struct sock *sk, int bytes) } sk->sk_forward_alloc += pages << PAGE_SHIFT; - sk->sk_reserved_mem += pages << PAGE_SHIFT; + WRITE_ONCE(sk->sk_reserved_mem, + sk->sk_reserved_mem + (pages << PAGE_SHIFT)); return 0; } @@ -1973,7 +1974,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break; case SO_RESERVE_MEM: - v.val = sk->sk_reserved_mem; + v.val = READ_ONCE(sk->sk_reserved_mem); break; case SO_TXREHASH: From c76a0328899bbe226f8adeb88b8da9e4167bd316 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:09 +0000 Subject: [PATCH 47/98] net: annotate data-race around sk->sk_txrehash sk_getsockopt() runs locklessly. This means sk->sk_txrehash can be read while other threads are changing its value. Other locations were handled in commit cb6cd2cec799 ("tcp: Change SYN ACK retransmit behaviour to account for rehash") Fixes: 26859240e4ee ("txhash: Add socket option to control TX hash rethink behavior") Signed-off-by: Eric Dumazet Cc: Akhmat Karakotov Signed-off-by: David S. Miller --- net/core/sock.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index bd201d15e72aa..adec93dda56af 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1534,7 +1534,9 @@ int sk_setsockopt(struct sock *sk, int level, int optname, } if ((u8)val == SOCK_TXREHASH_DEFAULT) val = READ_ONCE(sock_net(sk)->core.sysctl_txrehash); - /* Paired with READ_ONCE() in tcp_rtx_synack() */ + /* Paired with READ_ONCE() in tcp_rtx_synack() + * and sk_getsockopt(). + */ WRITE_ONCE(sk->sk_txrehash, (u8)val); break; @@ -1978,7 +1980,8 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break; case SO_TXREHASH: - v.val = sk->sk_txrehash; + /* Paired with WRITE_ONCE() in sk_setsockopt() */ + v.val = READ_ONCE(sk->sk_txrehash); break; default: From ea7f45ef77b39e72244d282e47f6cb1ef4135cd2 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:10 +0000 Subject: [PATCH 48/98] net: annotate data-races around sk->sk_max_pacing_rate sk_getsockopt() runs locklessly. This means sk->sk_max_pacing_rate can be read while other threads are changing its value. Fixes: 62748f32d501 ("net: introduce SO_MAX_PACING_RATE") Signed-off-by: Eric Dumazet Signed-off-by: David S. 
Miller --- net/core/sock.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index adec93dda56af..fec18755f7727 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1439,7 +1439,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname, cmpxchg(&sk->sk_pacing_status, SK_PACING_NONE, SK_PACING_NEEDED); - sk->sk_max_pacing_rate = ulval; + /* Pairs with READ_ONCE() from sk_getsockopt() */ + WRITE_ONCE(sk->sk_max_pacing_rate, ulval); sk->sk_pacing_rate = min(sk->sk_pacing_rate, ulval); break; } @@ -1903,12 +1904,14 @@ int sk_getsockopt(struct sock *sk, int level, int optname, #endif case SO_MAX_PACING_RATE: + /* The READ_ONCE() pair with the WRITE_ONCE() in sk_setsockopt() */ if (sizeof(v.ulval) != sizeof(v.val) && len >= sizeof(v.ulval)) { lv = sizeof(v.ulval); - v.ulval = sk->sk_max_pacing_rate; + v.ulval = READ_ONCE(sk->sk_max_pacing_rate); } else { /* 32bit version */ - v.val = min_t(unsigned long, sk->sk_max_pacing_rate, ~0U); + v.val = min_t(unsigned long, ~0U, + READ_ONCE(sk->sk_max_pacing_rate)); } break; From e6d12bdb435d23ff6c1890c852d85408a2f496ee Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:11 +0000 Subject: [PATCH 49/98] net: add missing READ_ONCE(sk->sk_rcvlowat) annotation In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_rcvlowat locklessly. Fixes: eac66402d1c3 ("net: annotate sk->sk_rcvlowat lockless reads") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/sock.c b/net/core/sock.c index fec18755f7727..08e6050016057 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1730,7 +1730,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break; case SO_RCVLOWAT: - v.val = sk->sk_rcvlowat; + v.val = READ_ONCE(sk->sk_rcvlowat); break; case SO_SNDLOWAT: From 285975dd674258ccb33e77a1803e8f2015e67105 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:12 +0000 Subject: [PATCH 50/98] net: annotate data-races around sk->sk_{rcv|snd}timeo sk_getsockopt() runs without locks, we must add annotations to sk->sk_rcvtimeo and sk->sk_sndtimeo. In the future we might allow fetching these fields before we lock the socket in TCP fast path. Signed-off-by: Eric Dumazet Signed-off-by: David S. 
Miller --- net/core/sock.c | 24 ++++++++++++++---------- net/sched/em_meta.c | 4 ++-- 2 files changed, 16 insertions(+), 12 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 08e6050016057..264c99c190ac9 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -429,6 +429,7 @@ static int sock_set_timeout(long *timeo_p, sockptr_t optval, int optlen, { struct __kernel_sock_timeval tv; int err = sock_copy_user_timeval(&tv, optval, optlen, old_timeval); + long val; if (err) return err; @@ -439,7 +440,7 @@ static int sock_set_timeout(long *timeo_p, sockptr_t optval, int optlen, if (tv.tv_sec < 0) { static int warned __read_mostly; - *timeo_p = 0; + WRITE_ONCE(*timeo_p, 0); if (warned < 10 && net_ratelimit()) { warned++; pr_info("%s: `%s' (pid %d) tries to set negative timeout\n", @@ -447,11 +448,12 @@ static int sock_set_timeout(long *timeo_p, sockptr_t optval, int optlen, } return 0; } - *timeo_p = MAX_SCHEDULE_TIMEOUT; - if (tv.tv_sec == 0 && tv.tv_usec == 0) - return 0; - if (tv.tv_sec < (MAX_SCHEDULE_TIMEOUT / HZ - 1)) - *timeo_p = tv.tv_sec * HZ + DIV_ROUND_UP((unsigned long)tv.tv_usec, USEC_PER_SEC / HZ); + val = MAX_SCHEDULE_TIMEOUT; + if ((tv.tv_sec || tv.tv_usec) && + (tv.tv_sec < (MAX_SCHEDULE_TIMEOUT / HZ - 1))) + val = tv.tv_sec * HZ + DIV_ROUND_UP((unsigned long)tv.tv_usec, + USEC_PER_SEC / HZ); + WRITE_ONCE(*timeo_p, val); return 0; } @@ -813,9 +815,9 @@ void sock_set_sndtimeo(struct sock *sk, s64 secs) { lock_sock(sk); if (secs && secs < MAX_SCHEDULE_TIMEOUT / HZ - 1) - sk->sk_sndtimeo = secs * HZ; + WRITE_ONCE(sk->sk_sndtimeo, secs * HZ); else - sk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT; + WRITE_ONCE(sk->sk_sndtimeo, MAX_SCHEDULE_TIMEOUT); release_sock(sk); } EXPORT_SYMBOL(sock_set_sndtimeo); @@ -1721,12 +1723,14 @@ int sk_getsockopt(struct sock *sk, int level, int optname, case SO_RCVTIMEO_OLD: case SO_RCVTIMEO_NEW: - lv = sock_get_timeout(sk->sk_rcvtimeo, &v, SO_RCVTIMEO_OLD == optname); + lv = sock_get_timeout(READ_ONCE(sk->sk_rcvtimeo), &v, + SO_RCVTIMEO_OLD == optname); break; case SO_SNDTIMEO_OLD: case SO_SNDTIMEO_NEW: - lv = sock_get_timeout(sk->sk_sndtimeo, &v, SO_SNDTIMEO_OLD == optname); + lv = sock_get_timeout(READ_ONCE(sk->sk_sndtimeo), &v, + SO_SNDTIMEO_OLD == optname); break; case SO_RCVLOWAT: diff --git a/net/sched/em_meta.c b/net/sched/em_meta.c index af85a73c4c545..6fdba069f6bfd 100644 --- a/net/sched/em_meta.c +++ b/net/sched/em_meta.c @@ -568,7 +568,7 @@ META_COLLECTOR(int_sk_rcvtimeo) *err = -1; return; } - dst->value = sk->sk_rcvtimeo / HZ; + dst->value = READ_ONCE(sk->sk_rcvtimeo) / HZ; } META_COLLECTOR(int_sk_sndtimeo) @@ -579,7 +579,7 @@ META_COLLECTOR(int_sk_sndtimeo) *err = -1; return; } - dst->value = sk->sk_sndtimeo / HZ; + dst->value = READ_ONCE(sk->sk_sndtimeo) / HZ; } META_COLLECTOR(int_sk_sendmsg_off) From 74bc084327c643499474ba75df485607da37dd6e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:13 +0000 Subject: [PATCH 51/98] net: add missing READ_ONCE(sk->sk_sndbuf) annotation In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_sndbuf locklessly. Fixes: e292f05e0df7 ("tcp: annotate sk->sk_sndbuf lockless reads") Signed-off-by: Eric Dumazet Signed-off-by: David S. 
Miller --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/sock.c b/net/core/sock.c index 264c99c190ac9..ca43f7a302199 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1639,7 +1639,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break; case SO_SNDBUF: - v.val = sk->sk_sndbuf; + v.val = READ_ONCE(sk->sk_sndbuf); break; case SO_RCVBUF: From b4b553253091cafe9ec38994acf42795e073bef5 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:14 +0000 Subject: [PATCH 52/98] net: add missing READ_ONCE(sk->sk_rcvbuf) annotation In a prior commit, I forgot to change sk_getsockopt() when reading sk->sk_rcvbuf locklessly. Fixes: ebb3b78db7bf ("tcp: annotate sk->sk_rcvbuf lockless reads") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/sock.c b/net/core/sock.c index ca43f7a302199..96616eb3869db 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1643,7 +1643,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break; case SO_RCVBUF: - v.val = sk->sk_rcvbuf; + v.val = READ_ONCE(sk->sk_rcvbuf); break; case SO_REUSEADDR: From 3c5b4d69c358a9275a8de98f87caf6eda644b086 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:15 +0000 Subject: [PATCH 53/98] net: annotate data-races around sk->sk_mark sk->sk_mark is often read while another thread could change the value. Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/net/inet_sock.h | 7 ++++--- include/net/ip.h | 2 +- include/net/route.h | 4 ++-- net/can/raw.c | 2 +- net/core/sock.c | 4 ++-- net/dccp/ipv6.c | 4 ++-- net/ipv4/inet_diag.c | 4 ++-- net/ipv4/ip_output.c | 4 ++-- net/ipv4/route.c | 4 ++-- net/ipv4/tcp_ipv4.c | 2 +- net/ipv6/ping.c | 2 +- net/ipv6/raw.c | 4 ++-- net/ipv6/route.c | 7 ++++--- net/ipv6/tcp_ipv6.c | 6 +++--- net/ipv6/udp.c | 4 ++-- net/l2tp/l2tp_ip6.c | 2 +- net/mptcp/sockopt.c | 2 +- net/netfilter/nft_socket.c | 2 +- net/netfilter/xt_socket.c | 4 ++-- net/packet/af_packet.c | 6 +++--- net/smc/af_smc.c | 2 +- net/xdp/xsk.c | 2 +- net/xfrm/xfrm_policy.c | 2 +- 23 files changed, 42 insertions(+), 40 deletions(-) diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index caa20a9055310..0bb32bfc61832 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -107,11 +107,12 @@ static inline struct inet_request_sock *inet_rsk(const struct request_sock *sk) static inline u32 inet_request_mark(const struct sock *sk, struct sk_buff *skb) { - if (!sk->sk_mark && - READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fwmark_accept)) + u32 mark = READ_ONCE(sk->sk_mark); + + if (!mark && READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_fwmark_accept)) return skb->mark; - return sk->sk_mark; + return mark; } static inline int inet_request_bound_dev_if(const struct sock *sk, diff --git a/include/net/ip.h b/include/net/ip.h index 50d435855ae25..332521170d9b8 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -93,7 +93,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm, { ipcm_init(ipcm); - ipcm->sockc.mark = inet->sk.sk_mark; + ipcm->sockc.mark = READ_ONCE(inet->sk.sk_mark); ipcm->sockc.tsflags = inet->sk.sk_tsflags; ipcm->oif = READ_ONCE(inet->sk.sk_bound_dev_if); ipcm->addr = inet->inet_saddr; diff --git a/include/net/route.h b/include/net/route.h index 5a5c726472bd6..8c2a8e7d8f8e6 100644 --- a/include/net/route.h +++ 
b/include/net/route.h @@ -168,7 +168,7 @@ static inline struct rtable *ip_route_output_ports(struct net *net, struct flowi __be16 dport, __be16 sport, __u8 proto, __u8 tos, int oif) { - flowi4_init_output(fl4, oif, sk ? sk->sk_mark : 0, tos, + flowi4_init_output(fl4, oif, sk ? READ_ONCE(sk->sk_mark) : 0, tos, RT_SCOPE_UNIVERSE, proto, sk ? inet_sk_flowi_flags(sk) : 0, daddr, saddr, dport, sport, sock_net_uid(net, sk)); @@ -301,7 +301,7 @@ static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, if (inet_sk(sk)->transparent) flow_flags |= FLOWI_FLAG_ANYSRC; - flowi4_init_output(fl4, oif, sk->sk_mark, ip_sock_rt_tos(sk), + flowi4_init_output(fl4, oif, READ_ONCE(sk->sk_mark), ip_sock_rt_tos(sk), ip_sock_rt_scope(sk), protocol, flow_flags, dst, src, dport, sport, sk->sk_uid); } diff --git a/net/can/raw.c b/net/can/raw.c index ba6b52b1d7767..e10f59375659f 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -865,7 +865,7 @@ static int raw_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) skb->dev = dev; skb->priority = sk->sk_priority; - skb->mark = sk->sk_mark; + skb->mark = READ_ONCE(sk->sk_mark); skb->tstamp = sockc.transmit_time; skb_setup_tx_timestamp(skb, sockc.tsflags); diff --git a/net/core/sock.c b/net/core/sock.c index 96616eb3869db..d831a3df2cefc 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -990,7 +990,7 @@ EXPORT_SYMBOL(sock_set_rcvbuf); static void __sock_set_mark(struct sock *sk, u32 val) { if (val != sk->sk_mark) { - sk->sk_mark = val; + WRITE_ONCE(sk->sk_mark, val); sk_dst_reset(sk); } } @@ -1851,7 +1851,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, optval, optlen, len); case SO_MARK: - v.val = sk->sk_mark; + v.val = READ_ONCE(sk->sk_mark); break; case SO_RCVMARK: diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 7249ef2181787..d29d1163203d9 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -238,8 +238,8 @@ static int dccp_v6_send_response(const struct sock *sk, struct request_sock *req opt = ireq->ipv6_opt; if (!opt) opt = rcu_dereference(np->opt); - err = ip6_xmit(sk, skb, &fl6, sk->sk_mark, opt, np->tclass, - sk->sk_priority); + err = ip6_xmit(sk, skb, &fl6, READ_ONCE(sk->sk_mark), opt, + np->tclass, sk->sk_priority); rcu_read_unlock(); err = net_xmit_eval(err); } diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index b812eb36f0e36..f7426926a1041 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -150,7 +150,7 @@ int inet_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb, } #endif - if (net_admin && nla_put_u32(skb, INET_DIAG_MARK, sk->sk_mark)) + if (net_admin && nla_put_u32(skb, INET_DIAG_MARK, READ_ONCE(sk->sk_mark))) goto errout; if (ext & (1 << (INET_DIAG_CLASS_ID - 1)) || @@ -799,7 +799,7 @@ int inet_diag_bc_sk(const struct nlattr *bc, struct sock *sk) entry.ifindex = sk->sk_bound_dev_if; entry.userlocks = sk_fullsock(sk) ? sk->sk_userlocks : 0; if (sk_fullsock(sk)) - entry.mark = sk->sk_mark; + entry.mark = READ_ONCE(sk->sk_mark); else if (sk->sk_state == TCP_NEW_SYN_RECV) entry.mark = inet_rsk(inet_reqsk(sk))->ir_mark; else if (sk->sk_state == TCP_TIME_WAIT) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 6e70839257f74..bcdbf448324a9 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -186,7 +186,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk, skb->priority = sk->sk_priority; if (!skb->mark) - skb->mark = sk->sk_mark; + skb->mark = READ_ONCE(sk->sk_mark); /* Send it out. 
*/ return ip_local_out(net, skb->sk, skb); @@ -529,7 +529,7 @@ int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl, /* TODO : should we use skb->sk here instead of sk ? */ skb->priority = sk->sk_priority; - skb->mark = sk->sk_mark; + skb->mark = READ_ONCE(sk->sk_mark); res = ip_local_out(net, sk, skb); rcu_read_unlock(); diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 98d7e6ba7493b..92fede388d520 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -518,7 +518,7 @@ static void __build_flow_key(const struct net *net, struct flowi4 *fl4, const struct inet_sock *inet = inet_sk(sk); oif = sk->sk_bound_dev_if; - mark = sk->sk_mark; + mark = READ_ONCE(sk->sk_mark); tos = ip_sock_rt_tos(sk); scope = ip_sock_rt_scope(sk); prot = inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol; @@ -552,7 +552,7 @@ static void build_sk_flow_key(struct flowi4 *fl4, const struct sock *sk) inet_opt = rcu_dereference(inet->inet_opt); if (inet_opt && inet_opt->opt.srr) daddr = inet_opt->opt.faddr; - flowi4_init_output(fl4, sk->sk_bound_dev_if, sk->sk_mark, + flowi4_init_output(fl4, sk->sk_bound_dev_if, READ_ONCE(sk->sk_mark), ip_sock_rt_tos(sk) & IPTOS_RT_MASK, ip_sock_rt_scope(sk), inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol, diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 0696420146369..894653be033a5 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -931,7 +931,7 @@ static void tcp_v4_send_ack(const struct sock *sk, ctl_sk = this_cpu_read(ipv4_tcp_sk); sock_net_set(ctl_sk, net); ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ? - inet_twsk(sk)->tw_mark : sk->sk_mark; + inet_twsk(sk)->tw_mark : READ_ONCE(sk->sk_mark); ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ? inet_twsk(sk)->tw_priority : sk->sk_priority; transmit_time = tcp_transmit_time(sk); diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index f804c11e2146c..c2c291827a2ce 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -120,7 +120,7 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) ipcm6_init_sk(&ipc6, np); ipc6.sockc.tsflags = sk->sk_tsflags; - ipc6.sockc.mark = sk->sk_mark; + ipc6.sockc.mark = READ_ONCE(sk->sk_mark); fl6.flowi6_oif = oif; diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index ac1cef094c5f2..39b7d727ba407 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -774,12 +774,12 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) */ memset(&fl6, 0, sizeof(fl6)); - fl6.flowi6_mark = sk->sk_mark; + fl6.flowi6_mark = READ_ONCE(sk->sk_mark); fl6.flowi6_uid = sk->sk_uid; ipcm6_init(&ipc6); ipc6.sockc.tsflags = sk->sk_tsflags; - ipc6.sockc.mark = sk->sk_mark; + ipc6.sockc.mark = fl6.flowi6_mark; if (sin6) { if (addr_len < SIN6_LEN_RFC2133) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 64e873f5895f1..56a55585eb798 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -2951,7 +2951,8 @@ void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, __be32 mtu) if (!oif && skb->dev) oif = l3mdev_master_ifindex(skb->dev); - ip6_update_pmtu(skb, sock_net(sk), mtu, oif, sk->sk_mark, sk->sk_uid); + ip6_update_pmtu(skb, sock_net(sk), mtu, oif, READ_ONCE(sk->sk_mark), + sk->sk_uid); dst = __sk_dst_get(sk); if (!dst || !dst->obsolete || @@ -3172,8 +3173,8 @@ void ip6_redirect_no_header(struct sk_buff *skb, struct net *net, int oif) void ip6_sk_redirect(struct sk_buff *skb, struct sock *sk) { - ip6_redirect(skb, sock_net(sk), sk->sk_bound_dev_if, sk->sk_mark, - sk->sk_uid); + ip6_redirect(skb, sock_net(sk), sk->sk_bound_dev_if, + 
READ_ONCE(sk->sk_mark), sk->sk_uid); } EXPORT_SYMBOL_GPL(ip6_sk_redirect); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 4714eb695913d..3ec563742ac4f 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -564,8 +564,8 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst, opt = ireq->ipv6_opt; if (!opt) opt = rcu_dereference(np->opt); - err = ip6_xmit(sk, skb, fl6, skb->mark ? : sk->sk_mark, opt, - tclass, sk->sk_priority); + err = ip6_xmit(sk, skb, fl6, skb->mark ? : READ_ONCE(sk->sk_mark), + opt, tclass, sk->sk_priority); rcu_read_unlock(); err = net_xmit_eval(err); } @@ -939,7 +939,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 if (sk->sk_state == TCP_TIME_WAIT) mark = inet_twsk(sk)->tw_mark; else - mark = sk->sk_mark; + mark = READ_ONCE(sk->sk_mark); skb_set_delivery_time(buff, tcp_transmit_time(sk), true); } if (txhash) { diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index e5da5d1cb2158..f787e6b8424c7 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -628,7 +628,7 @@ int __udp6_lib_err(struct sk_buff *skb, struct inet6_skb_parm *opt, if (type == NDISC_REDIRECT) { if (tunnel) { ip6_redirect(skb, sock_net(sk), inet6_iif(skb), - sk->sk_mark, sk->sk_uid); + READ_ONCE(sk->sk_mark), sk->sk_uid); } else { ip6_sk_redirect(skb, sk); } @@ -1360,7 +1360,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) ipcm6_init(&ipc6); ipc6.gso_size = READ_ONCE(up->gso_size); ipc6.sockc.tsflags = sk->sk_tsflags; - ipc6.sockc.mark = sk->sk_mark; + ipc6.sockc.mark = READ_ONCE(sk->sk_mark); /* destination address check */ if (sin6) { diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c index b1623f9c4f921..ff78217f0cb12 100644 --- a/net/l2tp/l2tp_ip6.c +++ b/net/l2tp/l2tp_ip6.c @@ -519,7 +519,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) /* Get and verify the address */ memset(&fl6, 0, sizeof(fl6)); - fl6.flowi6_mark = sk->sk_mark; + fl6.flowi6_mark = READ_ONCE(sk->sk_mark); fl6.flowi6_uid = sk->sk_uid; ipcm6_init(&ipc6); diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index 63f7a09335c5c..a3f1fe810cc96 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -103,7 +103,7 @@ static void mptcp_sol_socket_sync_intval(struct mptcp_sock *msk, int optname, in break; case SO_MARK: if (READ_ONCE(ssk->sk_mark) != sk->sk_mark) { - ssk->sk_mark = sk->sk_mark; + WRITE_ONCE(ssk->sk_mark, sk->sk_mark); sk_dst_reset(ssk); } break; diff --git a/net/netfilter/nft_socket.c b/net/netfilter/nft_socket.c index 84def74698b78..9ed85be79452d 100644 --- a/net/netfilter/nft_socket.c +++ b/net/netfilter/nft_socket.c @@ -107,7 +107,7 @@ static void nft_socket_eval(const struct nft_expr *expr, break; case NFT_SOCKET_MARK: if (sk_fullsock(sk)) { - *dest = sk->sk_mark; + *dest = READ_ONCE(sk->sk_mark); } else { regs->verdict.code = NFT_BREAK; return; diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c index 7013f55f05d1e..76e01f292aaff 100644 --- a/net/netfilter/xt_socket.c +++ b/net/netfilter/xt_socket.c @@ -77,7 +77,7 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par, if (info->flags & XT_SOCKET_RESTORESKMARK && !wildcard && transparent && sk_fullsock(sk)) - pskb->mark = sk->sk_mark; + pskb->mark = READ_ONCE(sk->sk_mark); if (sk != skb->sk) sock_gen_put(sk); @@ -138,7 +138,7 @@ socket_mt6_v1_v2_v3(const struct sk_buff *skb, struct xt_action_param *par) if (info->flags & XT_SOCKET_RESTORESKMARK && !wildcard && transparent && sk_fullsock(sk)) - 
pskb->mark = sk->sk_mark; + pskb->mark = READ_ONCE(sk->sk_mark); if (sk != skb->sk) sock_gen_put(sk); diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 8e3ddec4c3d57..d9aa21a2b3a16 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2051,7 +2051,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg, skb->protocol = proto; skb->dev = dev; skb->priority = sk->sk_priority; - skb->mark = sk->sk_mark; + skb->mark = READ_ONCE(sk->sk_mark); skb->tstamp = sockc.transmit_time; skb_setup_tx_timestamp(skb, sockc.tsflags); @@ -2586,7 +2586,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, skb->protocol = proto; skb->dev = dev; skb->priority = po->sk.sk_priority; - skb->mark = po->sk.sk_mark; + skb->mark = READ_ONCE(po->sk.sk_mark); skb->tstamp = sockc->transmit_time; skb_setup_tx_timestamp(skb, sockc->tsflags); skb_zcopy_set_nouarg(skb, ph.raw); @@ -2988,7 +2988,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) goto out_unlock; sockcm_init(&sockc, sk); - sockc.mark = sk->sk_mark; + sockc.mark = READ_ONCE(sk->sk_mark); if (msg->msg_controllen) { err = sock_cmsg_send(sk, msg, &sockc); if (unlikely(err)) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index a7f887d91d89b..0c013d2b5d8f4 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -445,7 +445,7 @@ static void smc_copy_sock_settings(struct sock *nsk, struct sock *osk, nsk->sk_rcvbuf = osk->sk_rcvbuf; nsk->sk_sndtimeo = osk->sk_sndtimeo; nsk->sk_rcvtimeo = osk->sk_rcvtimeo; - nsk->sk_mark = osk->sk_mark; + nsk->sk_mark = READ_ONCE(osk->sk_mark); nsk->sk_priority = osk->sk_priority; nsk->sk_rcvlowat = osk->sk_rcvlowat; nsk->sk_bound_dev_if = osk->sk_bound_dev_if; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 31dca4ecb2c53..b89adb52a977e 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -505,7 +505,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, skb->dev = dev; skb->priority = xs->sk.sk_priority; - skb->mark = xs->sk.sk_mark; + skb->mark = READ_ONCE(xs->sk.sk_mark); skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr; skb->destructor = xsk_destruct_skb; diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index e7617c9959c31..d6b405782b636 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -2250,7 +2250,7 @@ static struct xfrm_policy *xfrm_sk_policy_lookup(const struct sock *sk, int dir, match = xfrm_selector_match(&pol->selector, fl, family); if (match) { - if ((sk->sk_mark & pol->mark.m) != pol->mark.v || + if ((READ_ONCE(sk->sk_mark) & pol->mark.m) != pol->mark.v || pol->if_id != if_id) { pol = NULL; goto out; From 11695c6e966b0ec7ed1d16777d294cef865a5c91 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:16 +0000 Subject: [PATCH 54/98] net: add missing data-race annotations around sk->sk_peek_off sk_getsockopt() runs locklessly, thus we need to annotate the read of sk->sk_peek_off. While we are at it, add corresponding annotations to sk_set_peek_off() and unix_set_peek_off(). Fixes: b9bb53f3836f ("sock: convert sk_peek_offset functions to WRITE_ONCE") Signed-off-by: Eric Dumazet Cc: Willem de Bruijn Signed-off-by: David S. 
Miller --- net/core/sock.c | 4 ++-- net/unix/af_unix.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index d831a3df2cefc..d57acaee42d4b 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1870,7 +1870,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, if (!sock->ops->set_peek_off) return -EOPNOTSUPP; - v.val = sk->sk_peek_off; + v.val = READ_ONCE(sk->sk_peek_off); break; case SO_NOFCS: v.val = sock_flag(sk, SOCK_NOFCS); @@ -3179,7 +3179,7 @@ EXPORT_SYMBOL(__sk_mem_reclaim); int sk_set_peek_off(struct sock *sk, int val) { - sk->sk_peek_off = val; + WRITE_ONCE(sk->sk_peek_off, val); return 0; } EXPORT_SYMBOL_GPL(sk_set_peek_off); diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 78585217f61a6..86930a8ed012b 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -790,7 +790,7 @@ static int unix_set_peek_off(struct sock *sk, int val) if (mutex_lock_interruptible(&u->iolock)) return -EINTR; - sk->sk_peek_off = val; + WRITE_ONCE(sk->sk_peek_off, val); mutex_unlock(&u->iolock); return 0; From e5f0d2dd3c2faa671711dac6d3ff3cef307bcfe3 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:17 +0000 Subject: [PATCH 55/98] net: add missing data-race annotation for sk_ll_usec In a prior commit I forgot that sk_getsockopt() reads sk->sk_ll_usec without holding a lock. Fixes: 0dbffbb5335a ("net: annotate data race around sk_ll_usec") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/sock.c b/net/core/sock.c index d57acaee42d4b..f11e19c7edfb4 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1900,7 +1900,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, #ifdef CONFIG_NET_RX_BUSY_POLL case SO_BUSY_POLL: - v.val = sk->sk_ll_usec; + v.val = READ_ONCE(sk->sk_ll_usec); break; case SO_PREFER_BUSY_POLL: v.val = READ_ONCE(sk->sk_prefer_busy_poll); From 8bf43be799d4b242ea552a14db10456446be843e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 28 Jul 2023 15:03:18 +0000 Subject: [PATCH 56/98] net: annotate data-races around sk->sk_priority sk_getsockopt() runs locklessly. This means sk->sk_priority can be read while other threads are changing its value. Other reads also happen without socket lock being held. Add missing annotations where needed. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet Signed-off-by: David S. 
Miller --- net/core/sock.c | 6 +++--- net/ipv4/ip_output.c | 4 ++-- net/ipv4/ip_sockglue.c | 2 +- net/ipv4/raw.c | 2 +- net/ipv4/tcp_ipv4.c | 2 +- net/ipv6/raw.c | 2 +- net/ipv6/tcp_ipv6.c | 3 ++- net/packet/af_packet.c | 6 +++--- 8 files changed, 14 insertions(+), 13 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index f11e19c7edfb4..6d4f28efe29a7 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -806,7 +806,7 @@ EXPORT_SYMBOL(sock_no_linger); void sock_set_priority(struct sock *sk, u32 priority) { lock_sock(sk); - sk->sk_priority = priority; + WRITE_ONCE(sk->sk_priority, priority); release_sock(sk); } EXPORT_SYMBOL(sock_set_priority); @@ -1216,7 +1216,7 @@ int sk_setsockopt(struct sock *sk, int level, int optname, if ((val >= 0 && val <= 6) || sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) || sockopt_ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) - sk->sk_priority = val; + WRITE_ONCE(sk->sk_priority, val); else ret = -EPERM; break; @@ -1685,7 +1685,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname, break; case SO_PRIORITY: - v.val = sk->sk_priority; + v.val = READ_ONCE(sk->sk_priority); break; case SO_LINGER: diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index bcdbf448324a9..54d2d3a2d850e 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -184,7 +184,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk, ip_options_build(skb, &opt->opt, daddr, rt); } - skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); if (!skb->mark) skb->mark = READ_ONCE(sk->sk_mark); @@ -528,7 +528,7 @@ int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl, skb_shinfo(skb)->gso_segs ?: 1); /* TODO : should we use skb->sk here instead of sk ? */ - skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); skb->mark = READ_ONCE(sk->sk_mark); res = ip_local_out(net, sk, skb); diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 8e97d8d4cc9d9..d41bce8927b2c 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -592,7 +592,7 @@ void __ip_sock_set_tos(struct sock *sk, int val) } if (inet_sk(sk)->tos != val) { inet_sk(sk)->tos = val; - sk->sk_priority = rt_tos2priority(val); + WRITE_ONCE(sk->sk_priority, rt_tos2priority(val)); sk_dst_reset(sk); } } diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 7782ff5e65399..cb381f5aa4643 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -348,7 +348,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, goto error; skb_reserve(skb, hlen); - skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; skb->tstamp = sockc->transmit_time; skb_dst_set(skb, &rt->dst); diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 894653be033a5..a59cc4b838611 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -933,7 +933,7 @@ static void tcp_v4_send_ack(const struct sock *sk, ctl_sk->sk_mark = (sk->sk_state == TCP_TIME_WAIT) ? inet_twsk(sk)->tw_mark : READ_ONCE(sk->sk_mark); ctl_sk->sk_priority = (sk->sk_state == TCP_TIME_WAIT) ? 
- inet_twsk(sk)->tw_priority : sk->sk_priority; + inet_twsk(sk)->tw_priority : READ_ONCE(sk->sk_priority); transmit_time = tcp_transmit_time(sk); ip_send_unicast_reply(ctl_sk, skb, &TCP_SKB_CB(skb)->header.h4.opt, diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 39b7d727ba407..49381f35b623c 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -614,7 +614,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length, skb_reserve(skb, hlen); skb->protocol = htons(ETH_P_IPV6); - skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; skb->tstamp = sockc->transmit_time; diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 3ec563742ac4f..6e86721e1cdbb 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1128,7 +1128,8 @@ static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, tcp_time_stamp_raw() + tcp_rsk(req)->ts_off, READ_ONCE(req->ts_recent), sk->sk_bound_dev_if, tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr, l3index), - ipv6_get_dsfield(ipv6_hdr(skb)), 0, sk->sk_priority, + ipv6_get_dsfield(ipv6_hdr(skb)), 0, + READ_ONCE(sk->sk_priority), READ_ONCE(tcp_rsk(req)->txhash)); } diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index d9aa21a2b3a16..a4631cb457a91 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2050,7 +2050,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg, skb->protocol = proto; skb->dev = dev; - skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); skb->mark = READ_ONCE(sk->sk_mark); skb->tstamp = sockc.transmit_time; @@ -2585,7 +2585,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, skb->protocol = proto; skb->dev = dev; - skb->priority = po->sk.sk_priority; + skb->priority = READ_ONCE(po->sk.sk_priority); skb->mark = READ_ONCE(po->sk.sk_mark); skb->tstamp = sockc->transmit_time; skb_setup_tx_timestamp(skb, sockc->tsflags); @@ -3061,7 +3061,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) skb->protocol = proto; skb->dev = dev; - skb->priority = sk->sk_priority; + skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc.mark; skb->tstamp = sockc.transmit_time; From e739718444f7bf2fa3d70d101761ad83056ca628 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 28 Jul 2023 17:07:05 -0700 Subject: [PATCH 57/98] net/sched: taprio: Limit TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME to INT_MAX. syzkaller found zero division error [0] in div_s64_rem() called from get_cycle_time_elapsed(), where sched->cycle_time is the divisor. We have tests in parse_taprio_schedule() so that cycle_time will never be 0, and actually cycle_time is not 0 in get_cycle_time_elapsed(). The problem is that the types of divisor are different; cycle_time is s64, but the argument of div_s64_rem() is s32. syzkaller fed this input and 0x100000000 is cast to s32 to be 0. @TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME={0xc, 0x8, 0x100000000} We use s64 for cycle_time to cast it to ktime_t, so let's keep it and set max for cycle_time. While at it, we prevent overflow in setup_txtime() and add another test in parse_taprio_schedule() to check if cycle_time overflows. Also, we add a new tdc test case for this issue. 
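The truncation described above can be reproduced in isolation. The following user-space sketch is not part of the patch; the value and the "'cycle_time' is too big" message come from the report and the fix, everything else is illustrative. It shows why an s64 cycle_time of 0x100000000 becomes a zero divisor once it passes through an s32 parameter such as the one div_s64_rem() takes, and why clamping to INT_MAX closes the hole:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	int64_t cycle_time = 0x100000000LL;    /* value syzkaller fed via TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME */
	int32_t divisor = (int32_t)cycle_time; /* what an s32 divisor parameter actually receives: 0 */

	printf("s64 cycle_time = %lld, s32 divisor = %d\n",
	       (long long)cycle_time, divisor);

	/* the fix rejects anything outside [0, INT_MAX] before it can reach a divide */
	if (cycle_time < 0 || cycle_time > INT_MAX)
		printf("'cycle_time' is too big\n");

	return 0;
}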
[0]: divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 PID: 103 Comm: kworker/1:3 Not tainted 6.5.0-rc1-00330-g60cc1f7d0605 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:div_s64_rem include/linux/math64.h:42 [inline] RIP: 0010:get_cycle_time_elapsed net/sched/sch_taprio.c:223 [inline] RIP: 0010:find_entry_to_transmit+0x252/0x7e0 net/sched/sch_taprio.c:344 Code: 3c 02 00 0f 85 5e 05 00 00 48 8b 4c 24 08 4d 8b bd 40 01 00 00 48 8b 7c 24 48 48 89 c8 4c 29 f8 48 63 f7 48 99 48 89 74 24 70 <48> f7 fe 48 29 d1 48 8d 04 0f 49 89 cc 48 89 44 24 20 49 8d 85 10 RSP: 0018:ffffc90000acf260 EFLAGS: 00010206 RAX: 177450e0347560cf RBX: 0000000000000000 RCX: 177450e0347560cf RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000100000000 RBP: 0000000000000056 R08: 0000000000000000 R09: ffffed10020a0934 R10: ffff8880105049a7 R11: ffff88806cf3a520 R12: ffff888010504800 R13: ffff88800c00d800 R14: ffff8880105049a0 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0edf84f0e8 CR3: 000000000d73c002 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: get_packet_txtime net/sched/sch_taprio.c:508 [inline] taprio_enqueue_one+0x900/0xff0 net/sched/sch_taprio.c:577 taprio_enqueue+0x378/0xae0 net/sched/sch_taprio.c:658 dev_qdisc_enqueue+0x46/0x170 net/core/dev.c:3732 __dev_xmit_skb net/core/dev.c:3821 [inline] __dev_queue_xmit+0x1b2f/0x3000 net/core/dev.c:4169 dev_queue_xmit include/linux/netdevice.h:3088 [inline] neigh_resolve_output net/core/neighbour.c:1552 [inline] neigh_resolve_output+0x4a7/0x780 net/core/neighbour.c:1532 neigh_output include/net/neighbour.h:544 [inline] ip6_finish_output2+0x924/0x17d0 net/ipv6/ip6_output.c:135 __ip6_finish_output+0x620/0xaa0 net/ipv6/ip6_output.c:196 ip6_finish_output net/ipv6/ip6_output.c:207 [inline] NF_HOOK_COND include/linux/netfilter.h:292 [inline] ip6_output+0x206/0x410 net/ipv6/ip6_output.c:228 dst_output include/net/dst.h:458 [inline] NF_HOOK.constprop.0+0xea/0x260 include/linux/netfilter.h:303 ndisc_send_skb+0x872/0xe80 net/ipv6/ndisc.c:508 ndisc_send_ns+0xb5/0x130 net/ipv6/ndisc.c:666 addrconf_dad_work+0xc14/0x13f0 net/ipv6/addrconf.c:4175 process_one_work+0x92c/0x13a0 kernel/workqueue.c:2597 worker_thread+0x60f/0x1240 kernel/workqueue.c:2748 kthread+0x2fe/0x3f0 kernel/kthread.c:389 ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308 Modules linked in: Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode") Reported-by: syzkaller Signed-off-by: Kuniyuki Iwashima Co-developed-by: Eric Dumazet Co-developed-by: Pedro Tammela Acked-by: Vinicius Costa Gomes Signed-off-by: David S. 
Miller --- net/sched/sch_taprio.c | 15 +++++++++-- .../tc-testing/tc-tests/qdiscs/taprio.json | 25 +++++++++++++++++++ 2 files changed, 38 insertions(+), 2 deletions(-) diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 717ae51d94a0a..8c9cfff7fd05c 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -1015,6 +1015,11 @@ static const struct nla_policy taprio_tc_policy[TCA_TAPRIO_TC_ENTRY_MAX + 1] = { TC_FP_PREEMPTIBLE), }; +static struct netlink_range_validation_signed taprio_cycle_time_range = { + .min = 0, + .max = INT_MAX, +}; + static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = { [TCA_TAPRIO_ATTR_PRIOMAP] = { .len = sizeof(struct tc_mqprio_qopt) @@ -1023,7 +1028,8 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = { [TCA_TAPRIO_ATTR_SCHED_BASE_TIME] = { .type = NLA_S64 }, [TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY] = { .type = NLA_NESTED }, [TCA_TAPRIO_ATTR_SCHED_CLOCKID] = { .type = NLA_S32 }, - [TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME] = { .type = NLA_S64 }, + [TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME] = + NLA_POLICY_FULL_RANGE_SIGNED(NLA_S64, &taprio_cycle_time_range), [TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 }, [TCA_TAPRIO_ATTR_FLAGS] = { .type = NLA_U32 }, [TCA_TAPRIO_ATTR_TXTIME_DELAY] = { .type = NLA_U32 }, @@ -1159,6 +1165,11 @@ static int parse_taprio_schedule(struct taprio_sched *q, struct nlattr **tb, return -EINVAL; } + if (cycle < 0 || cycle > INT_MAX) { + NL_SET_ERR_MSG(extack, "'cycle_time' is too big"); + return -EINVAL; + } + new->cycle_time = cycle; } @@ -1347,7 +1358,7 @@ static void setup_txtime(struct taprio_sched *q, struct sched_gate_list *sched, ktime_t base) { struct sched_entry *entry; - u32 interval = 0; + u64 interval = 0; list_for_each_entry(entry, &sched->entries, list) { entry->next_txtime = ktime_add_ns(base, interval); diff --git a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json index a44455372646a..08d4861c2e782 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json +++ b/tools/testing/selftests/tc-testing/tc-tests/qdiscs/taprio.json @@ -131,5 +131,30 @@ "teardown": [ "echo \"1\" > /sys/bus/netdevsim/del_device" ] + }, + { + "id": "3e1e", + "name": "Add taprio Qdisc with an invalid cycle-time", + "category": [ + "qdisc", + "taprio" + ], + "plugins": { + "requires": "nsPlugin" + }, + "setup": [ + "echo \"1 1 8\" > /sys/bus/netdevsim/new_device", + "$TC qdisc add dev $ETH root handle 1: taprio num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@0 1@0 base-time 1000000000 sched-entry S 01 300000 flags 0x1 clockid CLOCK_TAI cycle-time 4294967296 || /bin/true", + "$IP link set dev $ETH up", + "$IP addr add 10.10.10.10/24 dev $ETH" + ], + "cmdUnderTest": "/bin/true", + "expExitCode": "0", + "verifyCmd": "$TC qdisc show dev $ETH", + "matchPattern": "qdisc taprio 1: root refcnt", + "matchCount": "0", + "teardown": [ + "echo \"1\" > /sys/bus/netdevsim/del_device" + ] } ] From 8469c7f5472fe5f77fc31c8f10f23d5aad987231 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Rafa=C5=82=20Mi=C5=82ecki?= Date: Sat, 29 Jul 2023 13:10:45 +0200 Subject: [PATCH 58/98] dt-bindings: net: mediatek,net: fixup MAC binding MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 1. Use unevaluatedProperties It's needed to allow ethernet-controller.yaml properties work correctly. 2. Drop unneeded phy-handle/phy-mode 3. Don't require phy-handle Some SoCs may use fixed link. 
For in-kernel MT7621 DTS files this fixes following errors: arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@0: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+' From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@0: 'phy-handle' is a required property From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@1: 'fixed-link' does not match any of the regexes: 'pinctrl-[0-9]+' From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml arch/mips/boot/dts/ralink/mt7621-tplink-hc220-g5-v1.dtb: ethernet@1e100000: mac@1: 'phy-handle' is a required property From schema: Documentation/devicetree/bindings/net/mediatek,net.yaml Signed-off-by: RafaÅ‚ MiÅ‚ecki Reviewed-by: Krzysztof Kozlowski Signed-off-by: David S. Miller --- Documentation/devicetree/bindings/net/mediatek,net.yaml | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/Documentation/devicetree/bindings/net/mediatek,net.yaml b/Documentation/devicetree/bindings/net/mediatek,net.yaml index acb2b2ac4fe1e..31cc0c412805c 100644 --- a/Documentation/devicetree/bindings/net/mediatek,net.yaml +++ b/Documentation/devicetree/bindings/net/mediatek,net.yaml @@ -293,7 +293,7 @@ allOf: patternProperties: "^mac@[0-1]$": type: object - additionalProperties: false + unevaluatedProperties: false allOf: - $ref: ethernet-controller.yaml# description: @@ -305,14 +305,9 @@ patternProperties: reg: maxItems: 1 - phy-handle: true - - phy-mode: true - required: - reg - compatible - - phy-handle required: - compatible From 1e7417c188d0a83fb385ba2dbe35fd2563f2b6f3 Mon Sep 17 00:00:00 2001 From: Duoming Zhou Date: Wed, 26 Jul 2023 16:14:07 +0800 Subject: [PATCH 59/98] net: usb: lan78xx: reorder cleanup operations to avoid UAF bugs The timer dev->stat_monitor can schedule the delayed work dev->wq and the delayed work dev->wq can also arm the dev->stat_monitor timer. When the device is detaching, the net_device will be deallocated. but the net_device private data could still be dereferenced in delayed work or timer handler. As a result, the UAF bugs will happen. One racy situation is shown below: (Thread 1) | (Thread 2) lan78xx_stat_monitor() | ... | lan78xx_disconnect() lan78xx_defer_kevent() | ... ... | cancel_delayed_work_sync(&dev->wq); schedule_delayed_work() | ... (wait some time) | free_netdev(net); //free net_device lan78xx_delayedwork() | //use net_device private data | dev-> //use | Although we use cancel_delayed_work_sync() to cancel the delayed work in lan78xx_disconnect(), it could still be scheduled in timer handler lan78xx_stat_monitor(). Another racy situation is shown below: (Thread 1) | (Thread 2) lan78xx_delayedwork | mod_timer() | lan78xx_disconnect() | cancel_delayed_work_sync() (wait some time) | if (timer_pending(&dev->stat_monitor)) | del_timer_sync(&dev->stat_monitor); lan78xx_stat_monitor() | ... lan78xx_defer_kevent() | free_netdev(net); //free //use net_device private data| dev-> //use | Although we use del_timer_sync() to delete the timer, the function timer_pending() returns 0 when the timer is activated. As a result, the del_timer_sync() will not be executed and the timer could be re-armed. In order to mitigate this bug, We use timer_shutdown_sync() to shutdown the timer and then use cancel_delayed_work_sync() to cancel the delayed work. 
As a result, the net_device could be deallocated safely. What's more, the dev->flags is set to EVENT_DEV_DISCONNECT in lan78xx_disconnect(). But it could still be set to EVENT_STAT_UPDATE in lan78xx_stat_monitor(). So this patch put the set_bit() behind timer_shutdown_sync(). Fixes: 77dfff5bb7e2 ("lan78xx: Fix race condition in disconnect handling") Signed-off-by: Duoming Zhou Signed-off-by: David S. Miller --- drivers/net/usb/lan78xx.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index c458c030fadf6..59cde06aa7f60 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -4224,8 +4224,6 @@ static void lan78xx_disconnect(struct usb_interface *intf) if (!dev) return; - set_bit(EVENT_DEV_DISCONNECT, &dev->flags); - netif_napi_del(&dev->napi); udev = interface_to_usbdev(intf); @@ -4233,6 +4231,8 @@ static void lan78xx_disconnect(struct usb_interface *intf) unregister_netdev(net); + timer_shutdown_sync(&dev->stat_monitor); + set_bit(EVENT_DEV_DISCONNECT, &dev->flags); cancel_delayed_work_sync(&dev->wq); phydev = net->phydev; @@ -4247,9 +4247,6 @@ static void lan78xx_disconnect(struct usb_interface *intf) usb_scuttle_anchored_urbs(&dev->deferred); - if (timer_pending(&dev->stat_monitor)) - del_timer_sync(&dev->stat_monitor); - lan78xx_unbind(dev, intf); lan78xx_free_tx_resources(dev); From d4480c9bb9258db9ddf2e632f6ef81e96b41089c Mon Sep 17 00:00:00 2001 From: Martin Kohn Date: Thu, 27 Jul 2023 20:00:43 +0000 Subject: [PATCH 60/98] net: usb: qmi_wwan: add Quectel EM05GV2 Add support for Quectel EM05GV2 (G=global) with vendor ID 0x2c7c and product ID 0x030e Enabling DTR on this modem was necessary to ensure stable operation. Patch for usb: serial: option: is also in progress. T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=2c7c ProdID=030e Rev= 3.18 S: Manufacturer=Quectel S: Product=Quectel EM05-G C:* #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=87(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan E: Ad=89(I) Atr=03(Int.) 
MxPS= 8 Ivl=32ms E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Martin Kohn Link: https://lore.kernel.org/r/AM0PR04MB57648219DE893EE04FA6CC759701A@AM0PR04MB5764.eurprd04.prod.outlook.com Signed-off-by: Jakub Kicinski --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 417f7ea1fffa6..344af3c5c8366 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1423,6 +1423,7 @@ static const struct usb_device_id products[] = { {QMI_QUIRK_SET_DTR(0x2c7c, 0x0191, 4)}, /* Quectel EG91 */ {QMI_QUIRK_SET_DTR(0x2c7c, 0x0195, 4)}, /* Quectel EG95 */ {QMI_FIXED_INTF(0x2c7c, 0x0296, 4)}, /* Quectel BG96 */ + {QMI_QUIRK_SET_DTR(0x2c7c, 0x030e, 4)}, /* Quectel EM05GV2 */ {QMI_QUIRK_SET_DTR(0x2cb7, 0x0104, 4)}, /* Fibocom NL678 series */ {QMI_FIXED_INTF(0x0489, 0xe0b4, 0)}, /* Foxconn T77W968 LTE */ {QMI_FIXED_INTF(0x0489, 0xe0b5, 0)}, /* Foxconn T77W968 LTE with eSIM support*/ From 55c1528f9b97ff3b7efad73e8f79627fc2efb298 Mon Sep 17 00:00:00 2001 From: Edward Cree Date: Fri, 28 Jul 2023 17:55:28 +0100 Subject: [PATCH 61/98] sfc: fix field-spanning memcpy in selftest Add a struct_group for the whole packet body so we can copy it in one go without triggering FORTIFY_SOURCE complaints. Fixes: cf60ed469629 ("sfc: use padding to fix alignment in loopback test") Fixes: 30c24dd87f3f ("sfc: siena: use padding to fix alignment in loopback test") Fixes: 1186c6b31ee1 ("sfc: falcon: use padding to fix alignment in loopback test") Reviewed-by: Andy Moreton Tested-by: Andy Moreton Signed-off-by: Edward Cree Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230728165528.59070-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/sfc/falcon/selftest.c | 23 ++++++++++++---------- drivers/net/ethernet/sfc/selftest.c | 23 ++++++++++++---------- drivers/net/ethernet/sfc/siena/selftest.c | 23 ++++++++++++---------- 3 files changed, 39 insertions(+), 30 deletions(-) diff --git a/drivers/net/ethernet/sfc/falcon/selftest.c b/drivers/net/ethernet/sfc/falcon/selftest.c index 9e5ce2a13787a..cf1d67b6d86db 100644 --- a/drivers/net/ethernet/sfc/falcon/selftest.c +++ b/drivers/net/ethernet/sfc/falcon/selftest.c @@ -40,15 +40,16 @@ */ struct ef4_loopback_payload { char pad[2]; /* Ensures ip is 4-byte aligned */ - struct ethhdr header; - struct iphdr ip; - struct udphdr udp; - __be16 iteration; - char msg[64]; + struct_group_attr(packet, __packed, + struct ethhdr header; + struct iphdr ip; + struct udphdr udp; + __be16 iteration; + char msg[64]; + ); } __packed __aligned(4); -#define EF4_LOOPBACK_PAYLOAD_LEN (sizeof(struct ef4_loopback_payload) - \ - offsetof(struct ef4_loopback_payload, \ - header)) +#define EF4_LOOPBACK_PAYLOAD_LEN \ + sizeof_field(struct ef4_loopback_payload, packet) /* Loopback test source MAC address */ static const u8 payload_source[ETH_ALEN] __aligned(2) = { @@ -299,7 +300,7 @@ void ef4_loopback_rx_packet(struct ef4_nic *efx, payload = &state->payload; - memcpy(&received.header, buf_ptr, + memcpy(&received.packet, buf_ptr, min_t(int, pkt_len, EF4_LOOPBACK_PAYLOAD_LEN)); received.ip.saddr = payload->ip.saddr; if (state->offload_csum) @@ -370,7 +371,7 @@ void ef4_loopback_rx_packet(struct ef4_nic *efx, buf_ptr, pkt_len, 0); netif_err(efx, drv, efx->net_dev, "expected packet:\n"); print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 0x10, 1, - &state->payload.header, EF4_LOOPBACK_PAYLOAD_LEN, + &state->payload.packet, 
EF4_LOOPBACK_PAYLOAD_LEN, 0); } #endif @@ -440,6 +441,8 @@ static int ef4_begin_loopback(struct ef4_tx_queue *tx_queue) payload->ip.saddr = htonl(INADDR_LOOPBACK | (i << 2)); /* Strip off the leading padding */ skb_pull(skb, offsetof(struct ef4_loopback_payload, header)); + /* Strip off the trailing padding */ + skb_trim(skb, EF4_LOOPBACK_PAYLOAD_LEN); /* Ensure everything we've written is visible to the * interrupt handler. */ diff --git a/drivers/net/ethernet/sfc/selftest.c b/drivers/net/ethernet/sfc/selftest.c index 96d856b9043c4..19a0b8584afbb 100644 --- a/drivers/net/ethernet/sfc/selftest.c +++ b/drivers/net/ethernet/sfc/selftest.c @@ -43,15 +43,16 @@ */ struct efx_loopback_payload { char pad[2]; /* Ensures ip is 4-byte aligned */ - struct ethhdr header; - struct iphdr ip; - struct udphdr udp; - __be16 iteration; - char msg[64]; + struct_group_attr(packet, __packed, + struct ethhdr header; + struct iphdr ip; + struct udphdr udp; + __be16 iteration; + char msg[64]; + ); } __packed __aligned(4); -#define EFX_LOOPBACK_PAYLOAD_LEN (sizeof(struct efx_loopback_payload) - \ - offsetof(struct efx_loopback_payload, \ - header)) +#define EFX_LOOPBACK_PAYLOAD_LEN \ + sizeof_field(struct efx_loopback_payload, packet) /* Loopback test source MAC address */ static const u8 payload_source[ETH_ALEN] __aligned(2) = { @@ -297,7 +298,7 @@ void efx_loopback_rx_packet(struct efx_nic *efx, payload = &state->payload; - memcpy(&received.header, buf_ptr, + memcpy(&received.packet, buf_ptr, min_t(int, pkt_len, EFX_LOOPBACK_PAYLOAD_LEN)); received.ip.saddr = payload->ip.saddr; if (state->offload_csum) @@ -368,7 +369,7 @@ void efx_loopback_rx_packet(struct efx_nic *efx, buf_ptr, pkt_len, 0); netif_err(efx, drv, efx->net_dev, "expected packet:\n"); print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 0x10, 1, - &state->payload.header, EFX_LOOPBACK_PAYLOAD_LEN, + &state->payload.packet, EFX_LOOPBACK_PAYLOAD_LEN, 0); } #endif @@ -438,6 +439,8 @@ static int efx_begin_loopback(struct efx_tx_queue *tx_queue) payload->ip.saddr = htonl(INADDR_LOOPBACK | (i << 2)); /* Strip off the leading padding */ skb_pull(skb, offsetof(struct efx_loopback_payload, header)); + /* Strip off the trailing padding */ + skb_trim(skb, EFX_LOOPBACK_PAYLOAD_LEN); /* Ensure everything we've written is visible to the * interrupt handler. 
*/ diff --git a/drivers/net/ethernet/sfc/siena/selftest.c b/drivers/net/ethernet/sfc/siena/selftest.c index 111ac17194a56..b55fd33469727 100644 --- a/drivers/net/ethernet/sfc/siena/selftest.c +++ b/drivers/net/ethernet/sfc/siena/selftest.c @@ -43,15 +43,16 @@ */ struct efx_loopback_payload { char pad[2]; /* Ensures ip is 4-byte aligned */ - struct ethhdr header; - struct iphdr ip; - struct udphdr udp; - __be16 iteration; - char msg[64]; + struct_group_attr(packet, __packed, + struct ethhdr header; + struct iphdr ip; + struct udphdr udp; + __be16 iteration; + char msg[64]; + ); } __packed __aligned(4); -#define EFX_LOOPBACK_PAYLOAD_LEN (sizeof(struct efx_loopback_payload) - \ - offsetof(struct efx_loopback_payload, \ - header)) +#define EFX_LOOPBACK_PAYLOAD_LEN \ + sizeof_field(struct efx_loopback_payload, packet) /* Loopback test source MAC address */ static const u8 payload_source[ETH_ALEN] __aligned(2) = { @@ -297,7 +298,7 @@ void efx_siena_loopback_rx_packet(struct efx_nic *efx, payload = &state->payload; - memcpy(&received.header, buf_ptr, + memcpy(&received.packet, buf_ptr, min_t(int, pkt_len, EFX_LOOPBACK_PAYLOAD_LEN)); received.ip.saddr = payload->ip.saddr; if (state->offload_csum) @@ -368,7 +369,7 @@ void efx_siena_loopback_rx_packet(struct efx_nic *efx, buf_ptr, pkt_len, 0); netif_err(efx, drv, efx->net_dev, "expected packet:\n"); print_hex_dump(KERN_ERR, "", DUMP_PREFIX_OFFSET, 0x10, 1, - &state->payload.header, EFX_LOOPBACK_PAYLOAD_LEN, + &state->payload.packet, EFX_LOOPBACK_PAYLOAD_LEN, 0); } #endif @@ -438,6 +439,8 @@ static int efx_begin_loopback(struct efx_tx_queue *tx_queue) payload->ip.saddr = htonl(INADDR_LOOPBACK | (i << 2)); /* Strip off the leading padding */ skb_pull(skb, offsetof(struct efx_loopback_payload, header)); + /* Strip off the trailing padding */ + skb_trim(skb, EFX_LOOPBACK_PAYLOAD_LEN); /* Ensure everything we've written is visible to the * interrupt handler. */ From 4b31fd4d77ffa430d0b74ba1885ea0a41594f202 Mon Sep 17 00:00:00 2001 From: Rafal Rogalski Date: Fri, 28 Jul 2023 10:12:43 -0700 Subject: [PATCH 62/98] ice: Fix RDMA VSI removal during queue rebuild During qdisc create/delete, it is necessary to rebuild the queue of VSIs. An error occurred because the VSIs created by RDMA were still active. Added check if RDMA is active. If yes, it disallows qdisc changes and writes a message in the system logs. 
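The check boils down to the standard auxiliary-bus idiom: take the device lock on the auxiliary device and test whether a driver is currently bound to it. A minimal sketch of that idiom with a placeholder adev pointer (the authoritative change is the ice_setup_tc() hunk in the diff below, which additionally holds the PF's adev_mutex):

	/* sketch only: refuse reconfiguration while an auxiliary (RDMA) driver is bound */
	device_lock(&adev->dev);
	if (adev->dev.driver) {
		netdev_err(netdev, "Cannot change qdisc when RDMA is active\n");
		err = -EBUSY;
	}
	device_unlock(&adev->dev);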
Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Rafal Rogalski Signed-off-by: Mateusz Palczewski Signed-off-by: Kamil Maziarz Tested-by: Bharathi Sreenivas Signed-off-by: Tony Nguyen Reviewed-by: Leon Romanovsky Link: https://lore.kernel.org/r/20230728171243.2446101-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/intel/ice/ice_main.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index f02d44455772e..cf92c39467c8e 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -8813,6 +8813,7 @@ ice_setup_tc(struct net_device *netdev, enum tc_setup_type type, { struct ice_netdev_priv *np = netdev_priv(netdev); struct ice_pf *pf = np->vsi->back; + bool locked = false; int err; switch (type) { @@ -8822,10 +8823,27 @@ ice_setup_tc(struct net_device *netdev, enum tc_setup_type type, ice_setup_tc_block_cb, np, np, true); case TC_SETUP_QDISC_MQPRIO: + if (pf->adev) { + mutex_lock(&pf->adev_mutex); + device_lock(&pf->adev->dev); + locked = true; + if (pf->adev->dev.driver) { + netdev_err(netdev, "Cannot change qdisc when RDMA is active\n"); + err = -EBUSY; + goto adev_unlock; + } + } + /* setup traffic classifier for receive side */ mutex_lock(&pf->tc_mutex); err = ice_setup_tc_mqprio_qdisc(netdev, type_data); mutex_unlock(&pf->tc_mutex); + +adev_unlock: + if (locked) { + device_unlock(&pf->adev->dev); + mutex_unlock(&pf->adev_mutex); + } return err; default: return -EOPNOTSUPP; From 37b61cda9c1606cd8b6445d900ca9dc03185e8b6 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 28 Jul 2023 13:50:20 -0700 Subject: [PATCH 63/98] bnxt: don't handle XDP in netpoll Similarly to other recently fixed drivers make sure we don't try to access XDP or page pool APIs when NAPI budget is 0. NAPI budget of 0 may mean that we are in netpoll. This may result in running software IRQs in hard IRQ context, leading to deadlocks or crashes. To make sure bnapi->tx_pkts don't get wiped without handling the events, move clearing the field into the handler itself. Remember to clear tx_pkts after reset (bnxt_enable_napi()) as it's technically possible that netpoll will accumulate some tx_pkts and then a reset will happen, leaving tx_pkts out of sync with reality. 
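The convention relied on here is that a NAPI poll invoked with a budget of 0 indicates netpoll, i.e. we may be running in hard IRQ context, where the page_pool and XDP APIs must not be called. A sketch of a TX completion handler honouring that rule, with placeholder foo_* names (the real XDP-path handler is bnxt_tx_int_xdp() in the diff below):

	/* sketch: XDP tx completion keyed off the NAPI budget */
	static void foo_tx_int(struct foo_priv *priv, int budget)
	{
		if (!budget)		/* netpoll: leave the work queued for a real poll */
			return;
		/* ... unmap buffers, return pages to the page_pool ... */
		priv->tx_pkts = 0;	/* cleared only once the events were handled */
	}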
Fixes: 322b87ca55f2 ("bnxt_en: add page_pool support") Reviewed-by: Andy Gospodarek Reviewed-by: Michael Chan Link: https://lore.kernel.org/r/20230728205020.2784844-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 26 +++++++++++-------- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 +- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 8 +++++- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h | 2 +- 4 files changed, 24 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index e5b54e6025bec..06b238bef9dd8 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -633,12 +633,13 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_OK; } -static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts) +static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int budget) { struct bnxt_tx_ring_info *txr = bnapi->tx_ring; struct netdev_queue *txq = netdev_get_tx_queue(bp->dev, txr->txq_index); u16 cons = txr->tx_cons; struct pci_dev *pdev = bp->pdev; + int nr_pkts = bnapi->tx_pkts; int i; unsigned int tx_bytes = 0; @@ -688,6 +689,7 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts) dev_kfree_skb_any(skb); } + bnapi->tx_pkts = 0; WRITE_ONCE(txr->tx_cons, cons); __netif_txq_completed_wake(txq, nr_pkts, tx_bytes, @@ -2569,12 +2571,11 @@ static int __bnxt_poll_work(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, return rx_pkts; } -static void __bnxt_poll_work_done(struct bnxt *bp, struct bnxt_napi *bnapi) +static void __bnxt_poll_work_done(struct bnxt *bp, struct bnxt_napi *bnapi, + int budget) { - if (bnapi->tx_pkts) { - bnapi->tx_int(bp, bnapi, bnapi->tx_pkts); - bnapi->tx_pkts = 0; - } + if (bnapi->tx_pkts) + bnapi->tx_int(bp, bnapi, budget); if ((bnapi->events & BNXT_RX_EVENT) && !(bnapi->in_reset)) { struct bnxt_rx_ring_info *rxr = bnapi->rx_ring; @@ -2603,7 +2604,7 @@ static int bnxt_poll_work(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, */ bnxt_db_cq(bp, &cpr->cp_db, cpr->cp_raw_cons); - __bnxt_poll_work_done(bp, bnapi); + __bnxt_poll_work_done(bp, bnapi, budget); return rx_pkts; } @@ -2734,7 +2735,7 @@ static int __bnxt_poll_cqs(struct bnxt *bp, struct bnxt_napi *bnapi, int budget) } static void __bnxt_poll_cqs_done(struct bnxt *bp, struct bnxt_napi *bnapi, - u64 dbr_type) + u64 dbr_type, int budget) { struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring; int i; @@ -2750,7 +2751,7 @@ static void __bnxt_poll_cqs_done(struct bnxt *bp, struct bnxt_napi *bnapi, cpr2->had_work_done = 0; } } - __bnxt_poll_work_done(bp, bnapi); + __bnxt_poll_work_done(bp, bnapi, budget); } static int bnxt_poll_p5(struct napi_struct *napi, int budget) @@ -2780,7 +2781,8 @@ static int bnxt_poll_p5(struct napi_struct *napi, int budget) if (cpr->has_more_work) break; - __bnxt_poll_cqs_done(bp, bnapi, DBR_TYPE_CQ_ARMALL); + __bnxt_poll_cqs_done(bp, bnapi, DBR_TYPE_CQ_ARMALL, + budget); cpr->cp_raw_cons = raw_cons; if (napi_complete_done(napi, work_done)) BNXT_DB_NQ_ARM_P5(&cpr->cp_db, @@ -2810,7 +2812,7 @@ static int bnxt_poll_p5(struct napi_struct *napi, int budget) } raw_cons = NEXT_RAW_CMP(raw_cons); } - __bnxt_poll_cqs_done(bp, bnapi, DBR_TYPE_CQ); + __bnxt_poll_cqs_done(bp, bnapi, DBR_TYPE_CQ, budget); if (raw_cons != cpr->cp_raw_cons) { cpr->cp_raw_cons = raw_cons; BNXT_DB_NQ_P5(&cpr->cp_db, raw_cons); @@ -9429,6 +9431,8 @@ static void 
bnxt_enable_napi(struct bnxt *bp) cpr->sw_stats.rx.rx_resets++; bnapi->in_reset = false; + bnapi->tx_pkts = 0; + if (bnapi->rx_ring) { INIT_WORK(&cpr->dim.work, bnxt_dim_work); cpr->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index 080e73496066b..bb95c3dc5270f 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -1005,7 +1005,7 @@ struct bnxt_napi { struct bnxt_tx_ring_info *tx_ring; void (*tx_int)(struct bnxt *, struct bnxt_napi *, - int); + int budget); int tx_pkts; u8 events; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 4efa5fe6972b2..7f2f9a317d473 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -125,16 +125,20 @@ static void __bnxt_xmit_xdp_redirect(struct bnxt *bp, dma_unmap_len_set(tx_buf, len, 0); } -void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts) +void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int budget) { struct bnxt_tx_ring_info *txr = bnapi->tx_ring; struct bnxt_rx_ring_info *rxr = bnapi->rx_ring; bool rx_doorbell_needed = false; + int nr_pkts = bnapi->tx_pkts; struct bnxt_sw_tx_bd *tx_buf; u16 tx_cons = txr->tx_cons; u16 last_tx_cons = tx_cons; int i, j, frags; + if (!budget) + return; + for (i = 0; i < nr_pkts; i++) { tx_buf = &txr->tx_buf_ring[tx_cons]; @@ -161,6 +165,8 @@ void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts) } tx_cons = NEXT_TX(tx_cons); } + + bnapi->tx_pkts = 0; WRITE_ONCE(txr->tx_cons, tx_cons); if (rx_doorbell_needed) { tx_buf = &txr->tx_buf_ring[last_tx_cons]; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h index ea430d6961df3..5e412c5655ba5 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h @@ -16,7 +16,7 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp, struct bnxt_tx_ring_info *txr, dma_addr_t mapping, u32 len, struct xdp_buff *xdp); -void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int nr_pkts); +void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int budget); bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, struct xdp_buff xdp, struct page *page, u8 **data_ptr, unsigned int *len, u8 *event); From 611e1b016c7beceec5ae82ac62d4a7ca224c8f9d Mon Sep 17 00:00:00 2001 From: Michal Schmidt Date: Sat, 29 Jul 2023 17:15:16 +0200 Subject: [PATCH 64/98] octeon_ep: initialize mbox mutexes The two mbox-related mutexes are destroyed in octep_ctrl_mbox_uninit(), but the corresponding mutex_init calls were missing. A "DEBUG_LOCKS_WARN_ON(lock->magic != lock)" warning was emitted with CONFIG_DEBUG_MUTEXES on. Initialize the two mutexes in octep_ctrl_mbox_init(). 
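With CONFIG_DEBUG_MUTEXES, mutex_init() records the debug magic in the lock and the first mutex_lock() on an uninitialised mutex trips DEBUG_LOCKS_WARN_ON(lock->magic != lock). The expected pairing, as a sketch (the actual fix is simply the two mutex_init() calls added in the diff below):

	mutex_init(&mbox->h2fq_lock);	/* sets the debug magic */
	mutex_init(&mbox->f2hq_lock);
	/* ... mailbox in use ... */
	mutex_destroy(&mbox->h2fq_lock);
	mutex_destroy(&mbox->f2hq_lock);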
Fixes: 577f0d1b1c5f ("octeon_ep: add separate mailbox command and response queues") Signed-off-by: Michal Schmidt Reviewed-by: Leon Romanovsky Link: https://lore.kernel.org/r/20230729151516.24153-1-mschmidt@redhat.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/marvell/octeon_ep/octep_ctrl_mbox.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_ctrl_mbox.c b/drivers/net/ethernet/marvell/octeon_ep/octep_ctrl_mbox.c index 035ead7935c74..dab61cc1acb57 100644 --- a/drivers/net/ethernet/marvell/octeon_ep/octep_ctrl_mbox.c +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_ctrl_mbox.c @@ -98,6 +98,9 @@ int octep_ctrl_mbox_init(struct octep_ctrl_mbox *mbox) writeq(OCTEP_CTRL_MBOX_STATUS_INIT, OCTEP_CTRL_MBOX_INFO_HOST_STATUS(mbox->barmem)); + mutex_init(&mbox->h2fq_lock); + mutex_init(&mbox->f2hq_lock); + mbox->h2fq.sz = readl(OCTEP_CTRL_MBOX_H2FQ_SZ(mbox->barmem)); mbox->h2fq.hw_prod = OCTEP_CTRL_MBOX_H2FQ_PROD(mbox->barmem); mbox->h2fq.hw_cons = OCTEP_CTRL_MBOX_H2FQ_CONS(mbox->barmem); From 640a604585aa30f93e39b17d4d6ba69fcb1e66c9 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Sat, 29 Jul 2023 17:51:06 +0800 Subject: [PATCH 65/98] bpf, cpumap: Make sure kthread is running before map update returns The following warning was reported when running stress-mode enabled xdp_redirect_cpu with some RT threads: ------------[ cut here ]------------ WARNING: CPU: 4 PID: 65 at kernel/bpf/cpumap.c:135 CPU: 4 PID: 65 Comm: kworker/4:1 Not tainted 6.5.0-rc2+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Workqueue: events cpu_map_kthread_stop RIP: 0010:put_cpu_map_entry+0xda/0x220 ...... Call Trace: ? show_regs+0x65/0x70 ? __warn+0xa5/0x240 ...... ? put_cpu_map_entry+0xda/0x220 cpu_map_kthread_stop+0x41/0x60 process_one_work+0x6b0/0xb80 worker_thread+0x96/0x720 kthread+0x1a5/0x1f0 ret_from_fork+0x3a/0x70 ret_from_fork_asm+0x1b/0x30 The root cause is the same as commit 436901649731 ("bpf: cpumap: Fix memory leak in cpu_map_update_elem"). The kthread is stopped prematurely by kthread_stop() in cpu_map_kthread_stop(), and kthread() doesn't call cpu_map_kthread_run() at all but XDP program has already queued some frames or skbs into ptr_ring. So when __cpu_map_ring_cleanup() checks the ptr_ring, it will find it was not emptied and report a warning. An alternative fix is to use __cpu_map_ring_cleanup() to drop these pending frames or skbs when kthread_stop() returns -EINTR, but it may confuse the user, because these frames or skbs have been handled correctly by XDP program. So instead of dropping these frames or skbs, just make sure the per-cpu kthread is running before __cpu_map_entry_alloc() returns. After apply the fix, the error handle for kthread_stop() will be unnecessary because it will always return 0, so just remove it. 
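The handshake is the usual create-then-wait pattern built on a struct completion: the creator blocks until the kthread signals that its main loop has actually started, so a later kthread_stop() can never observe a thread that exits before doing any work. A generic sketch with placeholder names (the cpumap-specific version is in the diff below):

	/* sketch: guarantee the worker reached its run loop before returning */
	init_completion(&obj->running);
	obj->task = kthread_create(worker_fn, obj, "worker/%d", id);
	if (IS_ERR(obj->task))
		return PTR_ERR(obj->task);
	wake_up_process(obj->task);
	wait_for_completion(&obj->running);	/* worker_fn() calls complete(&obj->running) first */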
Fixes: 6710e1126934 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP") Signed-off-by: Hou Tao Reviewed-by: Pu Lehui Acked-by: Jesper Dangaard Brouer Link: https://lore.kernel.org/r/20230729095107.1722450-2-houtao@huaweicloud.com Signed-off-by: Martin KaFai Lau --- kernel/bpf/cpumap.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 6ae02be7a48e3..7eeb200251640 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include @@ -73,6 +74,7 @@ struct bpf_cpu_map_entry { struct rcu_head rcu; struct work_struct kthread_stop_wq; + struct completion kthread_running; }; struct bpf_cpu_map { @@ -153,7 +155,6 @@ static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu) static void cpu_map_kthread_stop(struct work_struct *work) { struct bpf_cpu_map_entry *rcpu; - int err; rcpu = container_of(work, struct bpf_cpu_map_entry, kthread_stop_wq); @@ -163,14 +164,7 @@ static void cpu_map_kthread_stop(struct work_struct *work) rcu_barrier(); /* kthread_stop will wake_up_process and wait for it to complete */ - err = kthread_stop(rcpu->kthread); - if (err) { - /* kthread_stop may be called before cpu_map_kthread_run - * is executed, so we need to release the memory related - * to rcpu. - */ - put_cpu_map_entry(rcpu); - } + kthread_stop(rcpu->kthread); } static void cpu_map_bpf_prog_run_skb(struct bpf_cpu_map_entry *rcpu, @@ -298,11 +292,11 @@ static int cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames, return nframes; } - static int cpu_map_kthread_run(void *data) { struct bpf_cpu_map_entry *rcpu = data; + complete(&rcpu->kthread_running); set_current_state(TASK_INTERRUPTIBLE); /* When kthread gives stop order, then rcpu have been disconnected @@ -467,6 +461,7 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value, goto free_ptr_ring; /* Setup kthread */ + init_completion(&rcpu->kthread_running); rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa, "cpumap/%d/map:%d", cpu, map->id); @@ -480,6 +475,12 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value, kthread_bind(rcpu->kthread, cpu); wake_up_process(rcpu->kthread); + /* Make sure kthread has been running, so kthread_stop() will not + * stop the kthread prematurely and all pending frames or skbs + * will be handled by the kthread before kthread_stop() returns. + */ + wait_for_completion(&rcpu->kthread_running); + return rcpu; free_prog: From 7c62b75cd1a792e14b037fa4f61f9b18914e7de1 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Sat, 29 Jul 2023 17:51:07 +0800 Subject: [PATCH 66/98] bpf, cpumap: Handle skb as well when clean up ptr_ring The following warning was reported when running xdp_redirect_cpu with both skb-mode and stress-mode enabled: ------------[ cut here ]------------ Incorrect XDP memory type (-2128176192) usage WARNING: CPU: 7 PID: 1442 at net/core/xdp.c:405 Modules linked in: CPU: 7 PID: 1442 Comm: kworker/7:0 Tainted: G 6.5.0-rc2+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Workqueue: events __cpu_map_entry_free RIP: 0010:__xdp_return+0x1e4/0x4a0 ...... Call Trace: ? show_regs+0x65/0x70 ? __warn+0xa5/0x240 ? __xdp_return+0x1e4/0x4a0 ...... xdp_return_frame+0x4d/0x150 __cpu_map_entry_free+0xf9/0x230 process_one_work+0x6b0/0xb80 worker_thread+0x96/0x720 kthread+0x1a5/0x1f0 ret_from_fork+0x3a/0x70 ret_from_fork_asm+0x1b/0x30 The reason for the warning is twofold. 
One is due to the kthread cpu_map_kthread_run() is stopped prematurely. Another one is __cpu_map_ring_cleanup() doesn't handle skb mode and treats skbs in ptr_ring as XDP frames. Prematurely-stopped kthread will be fixed by the preceding patch and ptr_ring will be empty when __cpu_map_ring_cleanup() is called. But as the comments in __cpu_map_ring_cleanup() said, handling and freeing skbs in ptr_ring as well to "catch any broken behaviour gracefully". Fixes: 11941f8a8536 ("bpf: cpumap: Implement generic cpumap") Signed-off-by: Hou Tao Acked-by: Jesper Dangaard Brouer Link: https://lore.kernel.org/r/20230729095107.1722450-3-houtao@huaweicloud.com Signed-off-by: Martin KaFai Lau --- kernel/bpf/cpumap.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c index 7eeb200251640..286ab3db0fde8 100644 --- a/kernel/bpf/cpumap.c +++ b/kernel/bpf/cpumap.c @@ -131,11 +131,17 @@ static void __cpu_map_ring_cleanup(struct ptr_ring *ring) * invoked cpu_map_kthread_stop(). Catch any broken behaviour * gracefully and warn once. */ - struct xdp_frame *xdpf; + void *ptr; - while ((xdpf = ptr_ring_consume(ring))) - if (WARN_ON_ONCE(xdpf)) - xdp_return_frame(xdpf); + while ((ptr = ptr_ring_consume(ring))) { + WARN_ON_ONCE(1); + if (unlikely(__ptr_test_bit(0, &ptr))) { + __ptr_clear_bit(0, &ptr); + kfree_skb(ptr); + continue; + } + xdp_return_frame(ptr); + } } static void put_cpu_map_entry(struct bpf_cpu_map_entry *rcpu) From 3044b16e7c6fe5d24b1cdbcf1bd0a9d92d1ebd81 Mon Sep 17 00:00:00 2001 From: valis Date: Sat, 29 Jul 2023 08:32:00 -0400 Subject: [PATCH 67/98] net/sched: cls_u32: No longer copy tcf_result on update to avoid use-after-free When u32_change() is called on an existing filter, the whole tcf_result struct is always copied into the new instance of the filter. This causes a problem when updating a filter bound to a class, as tcf_unbind_filter() is always called on the old instance in the success path, decreasing filter_cnt of the still referenced class and allowing it to be deleted, leading to a use-after-free. Fix this by no longer copying the tcf_result struct from the old filter. Fixes: de5df63228fc ("net: sched: cls_u32 changes to knode must appear atomic to readers") Reported-by: valis Reported-by: M A Ramdhan Signed-off-by: valis Signed-off-by: Jamal Hadi Salim Reviewed-by: Victor Nogueira Reviewed-by: Pedro Tammela Reviewed-by: M A Ramdhan Link: https://lore.kernel.org/r/20230729123202.72406-2-jhs@mojatatu.com Signed-off-by: Jakub Kicinski --- net/sched/cls_u32.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 907e58841fe80..da4c179a4d418 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -826,7 +826,6 @@ static struct tc_u_knode *u32_init_knode(struct net *net, struct tcf_proto *tp, new->ifindex = n->ifindex; new->fshift = n->fshift; - new->res = n->res; new->flags = n->flags; RCU_INIT_POINTER(new->ht_down, ht); From 76e42ae831991c828cffa8c37736ebfb831ad5ec Mon Sep 17 00:00:00 2001 From: valis Date: Sat, 29 Jul 2023 08:32:01 -0400 Subject: [PATCH 68/98] net/sched: cls_fw: No longer copy tcf_result on update to avoid use-after-free When fw_change() is called on an existing filter, the whole tcf_result struct is always copied into the new instance of the filter. 
This causes a problem when updating a filter bound to a class, as tcf_unbind_filter() is always called on the old instance in the success path, decreasing filter_cnt of the still referenced class and allowing it to be deleted, leading to a use-after-free. Fix this by no longer copying the tcf_result struct from the old filter. Fixes: e35a8ee5993b ("net: sched: fw use RCU") Reported-by: valis Reported-by: Bing-Jhong Billy Jheng Signed-off-by: valis Signed-off-by: Jamal Hadi Salim Reviewed-by: Victor Nogueira Reviewed-by: Pedro Tammela Reviewed-by: M A Ramdhan Link: https://lore.kernel.org/r/20230729123202.72406-3-jhs@mojatatu.com Signed-off-by: Jakub Kicinski --- net/sched/cls_fw.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c index 8641f80593179..c49d6af0e0480 100644 --- a/net/sched/cls_fw.c +++ b/net/sched/cls_fw.c @@ -267,7 +267,6 @@ static int fw_change(struct net *net, struct sk_buff *in_skb, return -ENOBUFS; fnew->id = f->id; - fnew->res = f->res; fnew->ifindex = f->ifindex; fnew->tp = f->tp; From b80b829e9e2c1b3f7aae34855e04d8f6ecaf13c8 Mon Sep 17 00:00:00 2001 From: valis Date: Sat, 29 Jul 2023 08:32:02 -0400 Subject: [PATCH 69/98] net/sched: cls_route: No longer copy tcf_result on update to avoid use-after-free When route4_change() is called on an existing filter, the whole tcf_result struct is always copied into the new instance of the filter. This causes a problem when updating a filter bound to a class, as tcf_unbind_filter() is always called on the old instance in the success path, decreasing filter_cnt of the still referenced class and allowing it to be deleted, leading to a use-after-free. Fix this by no longer copying the tcf_result struct from the old filter. Fixes: 1109c00547fc ("net: sched: RCU cls_route") Reported-by: valis Reported-by: Bing-Jhong Billy Jheng Signed-off-by: valis Signed-off-by: Jamal Hadi Salim Reviewed-by: Victor Nogueira Reviewed-by: Pedro Tammela Reviewed-by: M A Ramdhan Link: https://lore.kernel.org/r/20230729123202.72406-4-jhs@mojatatu.com Signed-off-by: Jakub Kicinski --- net/sched/cls_route.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c index d0c53724d3e86..1e20bbd687f1d 100644 --- a/net/sched/cls_route.c +++ b/net/sched/cls_route.c @@ -513,7 +513,6 @@ static int route4_change(struct net *net, struct sk_buff *in_skb, if (fold) { f->id = fold->id; f->iif = fold->iif; - f->res = fold->res; f->handle = fold->handle; f->tp = fold->tp; From 13d2618b48f15966d1adfe1ff6a1985f5eef40ba Mon Sep 17 00:00:00 2001 From: Tomas Glozar Date: Fri, 28 Jul 2023 08:44:11 +0200 Subject: [PATCH 70/98] bpf: sockmap: Remove preempt_disable in sock_map_sk_acquire Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC allocation later in sk_psock_init_link on PREEMPT_RT kernels, since GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist patchset notes for details). This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to BUG (sleeping function called from invalid context) on RT kernels. preempt_disable was introduced together with lock_sk and rcu_read_lock in commit 99ba2b5aba24e ("bpf: sockhash, disallow bpf_tcp_close and update in parallel"), probably to match disabled migration of BPF programs, and is no longer necessary. Remove preempt_disable to fix BUG in sock_map_update_common on RT. 
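The problematic nesting reduces to the following sketch: on PREEMPT_RT even a GFP_ATOMIC allocation may acquire a sleeping (rtmutex-based) lock inside the allocator, which is forbidden while preemption is disabled and triggers the "sleeping function called from invalid context" BUG:

	/* sketch of the invalid nesting on PREEMPT_RT */
	struct sk_psock_link *link;

	preempt_disable();
	link = kzalloc(sizeof(*link), GFP_ATOMIC);	/* may take a sleeping lock on RT */
	preempt_enable();

Since lock_sock() plus rcu_read_lock() already provide the required serialization, the preempt_disable()/preempt_enable() pair can simply go away.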
Signed-off-by: Tomas Glozar Reviewed-by: Jakub Sitnicki Link: https://lore.kernel.org/all/20200224140131.461979697@linutronix.de/ Fixes: 99ba2b5aba24 ("bpf: sockhash, disallow bpf_tcp_close and update in parallel") Reviewed-by: John Fastabend Link: https://lore.kernel.org/r/20230728064411.305576-1-tglozar@redhat.com Signed-off-by: Paolo Abeni --- net/core/sock_map.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 19538d6287144..08ab108206bf8 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -115,7 +115,6 @@ static void sock_map_sk_acquire(struct sock *sk) __acquires(&sk->sk_lock.slock) { lock_sock(sk); - preempt_disable(); rcu_read_lock(); } @@ -123,7 +122,6 @@ static void sock_map_sk_release(struct sock *sk) __releases(&sk->sk_lock.slock) { rcu_read_unlock(); - preempt_enable(); release_sock(sk); } From 1d7dd5aa35474e553b8671b58579e0749b560779 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 28 Jul 2023 16:13:02 -0700 Subject: [PATCH 71/98] wifi: ray_cs: Replace 1-element array with flexible array MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The trailing array member of struct tx_buf was defined as a 1-element array, but used as a flexible array. This was resulting in build warnings: In function 'fortify_memset_chk', inlined from 'memset_io' at /kisskb/src/arch/mips/include/asm/io.h:486:2, inlined from 'build_auth_frame' at /kisskb/src/drivers/net/wireless/legacy/ray_cs.c:2697:2: /kisskb/src/include/linux/fortify-string.h:493:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 493 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Replace it with an actual flexible array. Binary difference comparison shows a single change in output: │ drivers/net/wireless/legacy/ray_cs.c:883 │ lea 0x1c(%rbp),%r13d │ - cmp $0x7c3,%r13d │ + cmp $0x7c4,%r13d This is from: if (len + TX_HEADER_LENGTH > TX_BUF_SIZE) { specifically: #define TX_BUF_SIZE (2048 - sizeof(struct tx_msg)) This appears to have been originally buggy, so the change is correct. Reported-by: Geert Uytterhoeven Closes: https://lore.kernel.org/all/88f83d73-781d-bdc-126-aa629cb368c@linux-m68k.org Cc: Kalle Valo Cc: linux-wireless@vger.kernel.org Signed-off-by: Kees Cook Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230728231245.never.309-kees@kernel.org --- drivers/net/wireless/legacy/rayctl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/legacy/rayctl.h b/drivers/net/wireless/legacy/rayctl.h index 2b0f332043d79..1f3bde8ac73d4 100644 --- a/drivers/net/wireless/legacy/rayctl.h +++ b/drivers/net/wireless/legacy/rayctl.h @@ -577,7 +577,7 @@ struct tx_msg { struct tib_structure tib; struct phy_header phy; struct mac_header mac; - UCHAR var[1]; + UCHAR var[]; }; /****** ECF Receive Control Structure (RCS) Area at Shared RAM offset 0x0800 */ From ef45e8400f5bb66b03cc949f76c80e2a118447de Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Mon, 31 Jul 2023 10:42:32 +0300 Subject: [PATCH 72/98] net: ll_temac: fix error checking of irq_of_parse_and_map() Most kernel functions return negative error codes but some irq functions return zero on error. In this code irq_of_parse_and_map(), returns zero and platform_get_irq() returns negative error codes. We need to handle both cases appropriately. 
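In other words, the probe path has to normalise two different error conventions before using the value: zero from irq_of_parse_and_map() is just as fatal as a negative errno from platform_get_irq(). A sketch of the check with a generic irq variable (the hunks below apply it to rx_irq and tx_irq):

	/* sketch: treat both 0 and negative values as "no usable IRQ" */
	if (irq <= 0)
		return dev_err_probe(&pdev->dev, irq ? irq : -EINVAL,
				     "could not get DMA irq\n");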
Fixes: 8425c41d1ef7 ("net: ll_temac: Extend support to non-device-tree platforms") Signed-off-by: Dan Carpenter Acked-by: Esben Haabendal Reviewed-by: Yang Yingliang Reviewed-by: Harini Katakam Link: https://lore.kernel.org/r/3d0aef75-06e0-45a5-a2a6-2cc4738d4143@moroto.mountain Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/xilinx/ll_temac_main.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/xilinx/ll_temac_main.c b/drivers/net/ethernet/xilinx/ll_temac_main.c index e0ac1bcd9925c..49f303353ecb0 100644 --- a/drivers/net/ethernet/xilinx/ll_temac_main.c +++ b/drivers/net/ethernet/xilinx/ll_temac_main.c @@ -1567,12 +1567,16 @@ static int temac_probe(struct platform_device *pdev) } /* Error handle returned DMA RX and TX interrupts */ - if (lp->rx_irq < 0) - return dev_err_probe(&pdev->dev, lp->rx_irq, + if (lp->rx_irq <= 0) { + rc = lp->rx_irq ?: -EINVAL; + return dev_err_probe(&pdev->dev, rc, "could not get DMA RX irq\n"); - if (lp->tx_irq < 0) - return dev_err_probe(&pdev->dev, lp->tx_irq, + } + if (lp->tx_irq <= 0) { + rc = lp->tx_irq ?: -EINVAL; + return dev_err_probe(&pdev->dev, rc, "could not get DMA TX irq\n"); + } if (temac_np) { /* Retrieve the MAC address */ From b99225b4fe297d07400f9e2332ecd7347b224f8d Mon Sep 17 00:00:00 2001 From: Ross Maynard Date: Mon, 31 Jul 2023 15:42:04 +1000 Subject: [PATCH 73/98] USB: zaurus: Add ID for A-300/B-500/C-700 The SL-A300, B500/5600, and C700 devices no longer auto-load because of "usbnet: Remove over-broad module alias from zaurus." This patch adds IDs for those 3 devices. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217632 Fixes: 16adf5d07987 ("usbnet: Remove over-broad module alias from zaurus.") Signed-off-by: Ross Maynard Cc: stable@vger.kernel.org Acked-by: Greg Kroah-Hartman Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/69b5423b-2013-9fc9-9569-58e707d9bafb@bigpond.com Signed-off-by: Jakub Kicinski --- drivers/net/usb/cdc_ether.c | 21 +++++++++++++++++++++ drivers/net/usb/zaurus.c | 21 +++++++++++++++++++++ 2 files changed, 42 insertions(+) diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c index c00a89b24df96..6d61052353f07 100644 --- a/drivers/net/usb/cdc_ether.c +++ b/drivers/net/usb/cdc_ether.c @@ -618,9 +618,23 @@ static const struct usb_device_id products[] = { .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, .idVendor = 0x04DD, + .idProduct = 0x8005, /* A-300 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = 0, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, .idProduct = 0x8006, /* B-500/SL-5600 */ ZAURUS_MASTER_INTERFACE, .driver_info = 0, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, + .idProduct = 0x8006, /* B-500/SL-5600 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = 0, }, { .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, @@ -628,6 +642,13 @@ static const struct usb_device_id products[] = { .idProduct = 0x8007, /* C-700 */ ZAURUS_MASTER_INTERFACE, .driver_info = 0, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, + .idProduct = 0x8007, /* C-700 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = 0, }, { .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, diff --git a/drivers/net/usb/zaurus.c b/drivers/net/usb/zaurus.c index 7984f2157d222..df3617c4c44e8 100644 --- a/drivers/net/usb/zaurus.c +++ 
b/drivers/net/usb/zaurus.c @@ -289,9 +289,23 @@ static const struct usb_device_id products [] = { .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, .idVendor = 0x04DD, + .idProduct = 0x8005, /* A-300 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = (unsigned long)&bogus_mdlm_info, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, .idProduct = 0x8006, /* B-500/SL-5600 */ ZAURUS_MASTER_INTERFACE, .driver_info = ZAURUS_PXA_INFO, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, + .idProduct = 0x8006, /* B-500/SL-5600 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = (unsigned long)&bogus_mdlm_info, }, { .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, @@ -299,6 +313,13 @@ static const struct usb_device_id products [] = { .idProduct = 0x8007, /* C-700 */ ZAURUS_MASTER_INTERFACE, .driver_info = ZAURUS_PXA_INFO, +}, { + .match_flags = USB_DEVICE_ID_MATCH_INT_INFO + | USB_DEVICE_ID_MATCH_DEVICE, + .idVendor = 0x04DD, + .idProduct = 0x8007, /* C-700 */ + ZAURUS_FAKE_INTERFACE, + .driver_info = (unsigned long)&bogus_mdlm_info, }, { .match_flags = USB_DEVICE_ID_MATCH_INT_INFO | USB_DEVICE_ID_MATCH_DEVICE, From 0b6291ad1940c403734312d0e453e8dac9148f69 Mon Sep 17 00:00:00 2001 From: Yuanjun Gong Date: Mon, 31 Jul 2023 17:05:35 +0800 Subject: [PATCH 74/98] net: korina: handle clk prepare error in korina_probe() in korina_probe(), the return value of clk_prepare_enable() should be checked since it might fail. we can use devm_clk_get_optional_enabled() instead of devm_clk_get_optional() and clk_prepare_enable() to automatically handle the error. Fixes: e4cd854ec487 ("net: korina: Get mdio input clock via common clock framework") Signed-off-by: Yuanjun Gong Link: https://lore.kernel.org/r/20230731090535.21416-1-ruc_gongyuanjun@163.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/korina.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c index 2b9335cb4bb3a..8537578e1cf1d 100644 --- a/drivers/net/ethernet/korina.c +++ b/drivers/net/ethernet/korina.c @@ -1302,11 +1302,10 @@ static int korina_probe(struct platform_device *pdev) else if (of_get_ethdev_address(pdev->dev.of_node, dev) < 0) eth_hw_addr_random(dev); - clk = devm_clk_get_optional(&pdev->dev, "mdioclk"); + clk = devm_clk_get_optional_enabled(&pdev->dev, "mdioclk"); if (IS_ERR(clk)) return PTR_ERR(clk); if (clk) { - clk_prepare_enable(clk); lp->mii_clock_freq = clk_get_rate(clk); } else { lp->mii_clock_freq = 200000000; /* max possible input clk */ From f3bb7759a924713bc54d15f6d0d70733b5935fad Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Mon, 31 Jul 2023 11:48:32 +0100 Subject: [PATCH 75/98] net: netsec: Ignore 'phy-mode' on SynQuacer in DT mode As documented in acd7aaf51b20 ("netsec: ignore 'phy-mode' device property on ACPI systems") the SocioNext SynQuacer platform ships with firmware defining the PHY mode as RGMII even though the physical configuration of the PHY is for TX and RX delays. Since bbc4d71d63549bc ("net: phy: realtek: fix rtl8211e rx/tx delay config") this has caused misconfiguration of the PHY, rendering the network unusable. This was worked around for ACPI by ignoring the phy-mode property but the system is also used with DT. 
For DT instead if we're running on a SynQuacer force a working PHY mode, as well as the standard EDK2 firmware with DT there are also some of these systems that use u-boot and might not initialise the PHY if not netbooting. Newer firmware imagaes for at least EDK2 are available from Linaro so print a warning when doing this. Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver") Signed-off-by: Mark Brown Acked-by: Ard Biesheuvel Acked-by: Ilias Apalodimas Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20230731-synquacer-net-v3-1-944be5f06428@kernel.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/socionext/netsec.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c index 2d7347b71c41b..0dcd6a568b061 100644 --- a/drivers/net/ethernet/socionext/netsec.c +++ b/drivers/net/ethernet/socionext/netsec.c @@ -1851,6 +1851,17 @@ static int netsec_of_probe(struct platform_device *pdev, return err; } + /* + * SynQuacer is physically configured with TX and RX delays + * but the standard firmware claimed otherwise for a long + * time, ignore it. + */ + if (of_machine_is_compatible("socionext,developer-box") && + priv->phy_interface != PHY_INTERFACE_MODE_RGMII_ID) { + dev_warn(&pdev->dev, "Outdated firmware reports incorrect PHY mode, overriding\n"); + priv->phy_interface = PHY_INTERFACE_MODE_RGMII_ID; + } + priv->phy_np = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0); if (!priv->phy_np) { dev_err(&pdev->dev, "missing required property 'phy-handle'\n"); From 3ff1617450eceb290ac17120fc172815e09a93cf Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Mon, 31 Jul 2023 11:15:53 -0700 Subject: [PATCH 76/98] selftest: net: Assert on a proper value in so_incoming_cpu.c. Dan Carpenter reported an error spotted by Smatch. ./tools/testing/selftests/net/so_incoming_cpu.c:163 create_clients() error: uninitialized symbol 'ret'. The returned value of sched_setaffinity() should be checked with ASSERT_EQ(), but the value was not saved in a proper variable, resulting in an error above. Let's save the returned value of with sched_setaffinity(). Fixes: 6df96146b202 ("selftest: Add test for SO_INCOMING_CPU.") Reported-by: Dan Carpenter Closes: https://lore.kernel.org/linux-kselftest/fe376760-33b6-4fc9-88e8-178e809af1ac@moroto.mountain/ Signed-off-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230731181553.5392-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski --- tools/testing/selftests/net/so_incoming_cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/so_incoming_cpu.c b/tools/testing/selftests/net/so_incoming_cpu.c index 0e04f9fef9867..a148181641026 100644 --- a/tools/testing/selftests/net/so_incoming_cpu.c +++ b/tools/testing/selftests/net/so_incoming_cpu.c @@ -159,7 +159,7 @@ void create_clients(struct __test_metadata *_metadata, /* Make sure SYN will be processed on the i-th CPU * and finally distributed to the i-th listener. */ - sched_setaffinity(0, sizeof(cpu_set), &cpu_set); + ret = sched_setaffinity(0, sizeof(cpu_set), &cpu_set); ASSERT_EQ(ret, 0); for (j = 0; j < CLIENT_PER_SERVER; j++) { From f6974b4c2d8e1062b5a52228ee47293c15b4ee1e Mon Sep 17 00:00:00 2001 From: Somnath Kotur Date: Mon, 31 Jul 2023 07:20:42 -0700 Subject: [PATCH 77/98] bnxt_en: Fix page pool logic for page size >= 64K The RXBD length field on all bnxt chips is 16-bit and so we cannot support a full page when the native page size is 64K or greater. 
The non-XDP (non page pool) code path has logic to handle this but the XDP page pool code path does not handle this. Add the missing logic to use page_pool_dev_alloc_frag() to allocate 32K chunks if the page size is 64K or greater. Fixes: 9f4b28301ce6 ("bnxt: XDP multibuffer enablement") Link: https://lore.kernel.org/netdev/20230728231829.235716-2-michael.chan@broadcom.com/ Reviewed-by: Andy Gospodarek Signed-off-by: Somnath Kotur Signed-off-by: Michael Chan Link: https://lore.kernel.org/r/20230731142043.58855-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 42 ++++++++++++------- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 6 +-- 2 files changed, 29 insertions(+), 19 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 06b238bef9dd8..4d36de6915a09 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -699,17 +699,24 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int budget) static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping, struct bnxt_rx_ring_info *rxr, + unsigned int *offset, gfp_t gfp) { struct device *dev = &bp->pdev->dev; struct page *page; - page = page_pool_dev_alloc_pages(rxr->page_pool); + if (PAGE_SIZE > BNXT_RX_PAGE_SIZE) { + page = page_pool_dev_alloc_frag(rxr->page_pool, offset, + BNXT_RX_PAGE_SIZE); + } else { + page = page_pool_dev_alloc_pages(rxr->page_pool); + *offset = 0; + } if (!page) return NULL; - *mapping = dma_map_page_attrs(dev, page, 0, PAGE_SIZE, bp->rx_dir, - DMA_ATTR_WEAK_ORDERING); + *mapping = dma_map_page_attrs(dev, page, *offset, BNXT_RX_PAGE_SIZE, + bp->rx_dir, DMA_ATTR_WEAK_ORDERING); if (dma_mapping_error(dev, *mapping)) { page_pool_recycle_direct(rxr->page_pool, page); return NULL; @@ -749,15 +756,16 @@ int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, dma_addr_t mapping; if (BNXT_RX_PAGE_MODE(bp)) { + unsigned int offset; struct page *page = - __bnxt_alloc_rx_page(bp, &mapping, rxr, gfp); + __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp); if (!page) return -ENOMEM; mapping += bp->rx_dma_offset; rx_buf->data = page; - rx_buf->data_ptr = page_address(page) + bp->rx_offset; + rx_buf->data_ptr = page_address(page) + offset + bp->rx_offset; } else { u8 *data = __bnxt_alloc_rx_frag(bp, &mapping, gfp); @@ -817,7 +825,7 @@ static inline int bnxt_alloc_rx_page(struct bnxt *bp, unsigned int offset = 0; if (BNXT_RX_PAGE_MODE(bp)) { - page = __bnxt_alloc_rx_page(bp, &mapping, rxr, gfp); + page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp); if (!page) return -ENOMEM; @@ -964,15 +972,15 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp, return NULL; } dma_addr -= bp->rx_dma_offset; - dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, PAGE_SIZE, bp->rx_dir, - DMA_ATTR_WEAK_ORDERING); - skb = build_skb(page_address(page), PAGE_SIZE); + dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE, + bp->rx_dir, DMA_ATTR_WEAK_ORDERING); + skb = build_skb(data_ptr - bp->rx_offset, BNXT_RX_PAGE_SIZE); if (!skb) { page_pool_recycle_direct(rxr->page_pool, page); return NULL; } skb_mark_for_recycle(skb); - skb_reserve(skb, bp->rx_dma_offset); + skb_reserve(skb, bp->rx_offset); __skb_put(skb, len); return skb; @@ -998,8 +1006,8 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp, return NULL; } dma_addr -= bp->rx_dma_offset; - dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, PAGE_SIZE, 
bp->rx_dir, - DMA_ATTR_WEAK_ORDERING); + dma_unmap_page_attrs(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE, + bp->rx_dir, DMA_ATTR_WEAK_ORDERING); if (unlikely(!payload)) payload = eth_get_headlen(bp->dev, data_ptr, len); @@ -1012,7 +1020,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp, skb_mark_for_recycle(skb); off = (void *)data_ptr - page_address(page); - skb_add_rx_frag(skb, 0, page, off, len, PAGE_SIZE); + skb_add_rx_frag(skb, 0, page, off, len, BNXT_RX_PAGE_SIZE); memcpy(skb->data - NET_IP_ALIGN, data_ptr - NET_IP_ALIGN, payload + NET_IP_ALIGN); @@ -1143,7 +1151,7 @@ static struct sk_buff *bnxt_rx_agg_pages_skb(struct bnxt *bp, skb->data_len += total_frag_len; skb->len += total_frag_len; - skb->truesize += PAGE_SIZE * agg_bufs; + skb->truesize += BNXT_RX_PAGE_SIZE * agg_bufs; return skb; } @@ -2945,8 +2953,8 @@ static void bnxt_free_one_rx_ring_skbs(struct bnxt *bp, int ring_nr) rx_buf->data = NULL; if (BNXT_RX_PAGE_MODE(bp)) { mapping -= bp->rx_dma_offset; - dma_unmap_page_attrs(&pdev->dev, mapping, PAGE_SIZE, - bp->rx_dir, + dma_unmap_page_attrs(&pdev->dev, mapping, + BNXT_RX_PAGE_SIZE, bp->rx_dir, DMA_ATTR_WEAK_ORDERING); page_pool_recycle_direct(rxr->page_pool, data); } else { @@ -3215,6 +3223,8 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp, pp.napi = &rxr->bnapi->napi; pp.dev = &bp->pdev->dev; pp.dma_dir = DMA_BIDIRECTIONAL; + if (PAGE_SIZE > BNXT_RX_PAGE_SIZE) + pp.flags |= PP_FLAG_PAGE_FRAG; rxr->page_pool = page_pool_create(&pp); if (IS_ERR(rxr->page_pool)) { diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 7f2f9a317d473..fb43232310b2d 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -186,8 +186,8 @@ void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, u8 *data_ptr, unsigned int len, struct xdp_buff *xdp) { + u32 buflen = BNXT_RX_PAGE_SIZE; struct bnxt_sw_rx_bd *rx_buf; - u32 buflen = PAGE_SIZE; struct pci_dev *pdev; dma_addr_t mapping; u32 offset; @@ -303,7 +303,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, rx_buf = &rxr->rx_buf_ring[cons]; mapping = rx_buf->mapping - bp->rx_dma_offset; dma_unmap_page_attrs(&pdev->dev, mapping, - PAGE_SIZE, bp->rx_dir, + BNXT_RX_PAGE_SIZE, bp->rx_dir, DMA_ATTR_WEAK_ORDERING); /* if we are unable to allocate a new buffer, abort and reuse */ @@ -486,7 +486,7 @@ bnxt_xdp_build_skb(struct bnxt *bp, struct sk_buff *skb, u8 num_frags, } xdp_update_skb_shared_info(skb, num_frags, sinfo->xdp_frags_size, - PAGE_SIZE * sinfo->nr_frags, + BNXT_RX_PAGE_SIZE * sinfo->nr_frags, xdp_buff_is_frag_pfmemalloc(xdp)); return skb; } From 08450ea98ae98d5a35145b675b76db616046ea11 Mon Sep 17 00:00:00 2001 From: Michael Chan Date: Mon, 31 Jul 2023 07:20:43 -0700 Subject: [PATCH 78/98] bnxt_en: Fix max_mtu setting for multi-buf XDP The existing code does not allow the MTU to be set to the maximum even after an XDP program supporting multiple buffers is attached. Fix it to set the netdev->max_mtu to the maximum value if the attached XDP program supports mutiple buffers, regardless of the current MTU value. Also use a local variable dev instead of repeatedly using bp->dev. 
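Whether the attached program can handle multi-buffer frames is recorded at program load time in prog->aux->xdp_has_frags, so the upper MTU bound only needs to be derived from that flag and the hardware limit. A sketch with placeholder limit variables (the real logic is the bnxt_set_rx_skb_mode() hunk below):

	/* sketch: max_mtu follows the XDP program's multi-buffer capability */
	if (xdp_prog && xdp_prog->aux->xdp_has_frags)
		dev->max_mtu = hw_max_mtu;			/* frags allowed: full HW maximum */
	else
		dev->max_mtu = min_t(unsigned int, hw_max_mtu,
				     page_mode_mtu);		/* single buffer per packet */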
Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff") Reviewed-by: Somnath Kotur Reviewed-by: Ajit Khaparde Reviewed-by: Andy Gospodarek Signed-off-by: Michael Chan Link: https://lore.kernel.org/r/20230731142043.58855-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 4d36de6915a09..1eb490c48c52e 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -4001,26 +4001,29 @@ void bnxt_set_ring_params(struct bnxt *bp) */ int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode) { + struct net_device *dev = bp->dev; + if (page_mode) { bp->flags &= ~BNXT_FLAG_AGG_RINGS; bp->flags |= BNXT_FLAG_RX_PAGE_MODE; - if (bp->dev->mtu > BNXT_MAX_PAGE_MODE_MTU) { + if (bp->xdp_prog->aux->xdp_has_frags) + dev->max_mtu = min_t(u16, bp->max_mtu, BNXT_MAX_MTU); + else + dev->max_mtu = + min_t(u16, bp->max_mtu, BNXT_MAX_PAGE_MODE_MTU); + if (dev->mtu > BNXT_MAX_PAGE_MODE_MTU) { bp->flags |= BNXT_FLAG_JUMBO; bp->rx_skb_func = bnxt_rx_multi_page_skb; - bp->dev->max_mtu = - min_t(u16, bp->max_mtu, BNXT_MAX_MTU); } else { bp->flags |= BNXT_FLAG_NO_AGG_RINGS; bp->rx_skb_func = bnxt_rx_page_skb; - bp->dev->max_mtu = - min_t(u16, bp->max_mtu, BNXT_MAX_PAGE_MODE_MTU); } bp->rx_dir = DMA_BIDIRECTIONAL; /* Disable LRO or GRO_HW */ - netdev_update_features(bp->dev); + netdev_update_features(dev); } else { - bp->dev->max_mtu = bp->max_mtu; + dev->max_mtu = bp->max_mtu; bp->flags &= ~BNXT_FLAG_RX_PAGE_MODE; bp->rx_dir = DMA_FROM_DEVICE; bp->rx_skb_func = bnxt_rx_skb; From 31d49ba033095f6e8158c60f69714a500922e0c3 Mon Sep 17 00:00:00 2001 From: Lin Ma Date: Tue, 1 Aug 2023 09:32:48 +0800 Subject: [PATCH 79/98] net: dcb: choose correct policy to parse DCB_ATTR_BCN The dcbnl_bcn_setcfg uses erroneous policy to parse tb[DCB_ATTR_BCN], which is introduced in commit 859ee3c43812 ("DCB: Add support for DCB BCN"). Please see the comment in below code static int dcbnl_bcn_setcfg(...) { ... ret = nla_parse_nested_deprecated(..., dcbnl_pfc_up_nest, .. ) // !!! dcbnl_pfc_up_nest for attributes // DCB_PFC_UP_ATTR_0 to DCB_PFC_UP_ATTR_ALL in enum dcbnl_pfc_up_attrs ... for (i = DCB_BCN_ATTR_RP_0; i <= DCB_BCN_ATTR_RP_7; i++) { // !!! DCB_BCN_ATTR_RP_0 to DCB_BCN_ATTR_RP_7 in enum dcbnl_bcn_attrs ... value_byte = nla_get_u8(data[i]); ... } ... for (i = DCB_BCN_ATTR_BCNA_0; i <= DCB_BCN_ATTR_RI; i++) { // !!! DCB_BCN_ATTR_BCNA_0 to DCB_BCN_ATTR_RI in enum dcbnl_bcn_attrs ... value_int = nla_get_u32(data[i]); ... } ... } That is, the nla_parse_nested_deprecated uses dcbnl_pfc_up_nest attributes to parse nlattr defined in dcbnl_pfc_up_attrs. But the following access code fetch each nlattr as dcbnl_bcn_attrs attributes. By looking up the associated nla_policy for dcbnl_bcn_attrs. We can find the beginning part of these two policies are "same". static const struct nla_policy dcbnl_pfc_up_nest[...] = { [DCB_PFC_UP_ATTR_0] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_1] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_2] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_3] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_4] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_5] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_6] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_7] = {.type = NLA_U8}, [DCB_PFC_UP_ATTR_ALL] = {.type = NLA_FLAG}, }; static const struct nla_policy dcbnl_bcn_nest[...] 
= { [DCB_BCN_ATTR_RP_0] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_1] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_2] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_3] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_4] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_5] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_6] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_7] = {.type = NLA_U8}, [DCB_BCN_ATTR_RP_ALL] = {.type = NLA_FLAG}, // from here is somewhat different [DCB_BCN_ATTR_BCNA_0] = {.type = NLA_U32}, ... [DCB_BCN_ATTR_ALL] = {.type = NLA_FLAG}, }; Therefore, the current code is buggy and this nla_parse_nested_deprecated could overflow the dcbnl_pfc_up_nest and use the adjacent nla_policy to parse attributes from DCB_BCN_ATTR_BCNA_0. Hence use the correct policy dcbnl_bcn_nest to parse the nested tb[DCB_ATTR_BCN] TLV. Fixes: 859ee3c43812 ("DCB: Add support for DCB BCN") Signed-off-by: Lin Ma Reviewed-by: Simon Horman Link: https://lore.kernel.org/r/20230801013248.87240-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski --- net/dcb/dcbnl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c index c0c4381285759..2e6b8c8fd2ded 100644 --- a/net/dcb/dcbnl.c +++ b/net/dcb/dcbnl.c @@ -980,7 +980,7 @@ static int dcbnl_bcn_setcfg(struct net_device *netdev, struct nlmsghdr *nlh, return -EOPNOTSUPP; ret = nla_parse_nested_deprecated(data, DCB_BCN_ATTR_MAX, - tb[DCB_ATTR_BCN], dcbnl_pfc_up_nest, + tb[DCB_ATTR_BCN], dcbnl_bcn_nest, NULL); if (ret) return ret; From 9bc3047374d5bec163e83e743709e23753376f0c Mon Sep 17 00:00:00 2001 From: Laszlo Ersek Date: Mon, 31 Jul 2023 18:42:36 +0200 Subject: [PATCH 80/98] net: tun_chr_open(): set sk_uid from current_fsuid() Commit a096ccca6e50 initializes the "sk_uid" field in the protocol socket (struct sock) from the "/dev/net/tun" device node's owner UID. Per original commit 86741ec25462 ("net: core: Add a UID field to struct sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the userspace process that creates the socket. Commit 86741ec25462 mentions socket() and accept(); with "tun", the action that creates the socket is open("/dev/net/tun"). Therefore the device node's owner UID is irrelevant. In most cases, "/dev/net/tun" will be owned by root, so in practice, commit a096ccca6e50 has no observable effect: - before, "sk_uid" would be zero, due to undefined behavior (CVE-2023-1076), - after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root. What matters is the (fs)UID of the process performing the open(), so cache that in "sk_uid". Cc: Eric Dumazet Cc: Lorenzo Colitti Cc: Paolo Abeni Cc: Pietro Borrello Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Fixes: a096ccca6e50 ("tun: tun_chr_open(): correctly initialize socket uid") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435 Signed-off-by: Laszlo Ersek Signed-off-by: David S. 
Miller --- drivers/net/tun.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index d75456adc62ac..25f0191df00bf 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -3469,7 +3469,7 @@ static int tun_chr_open(struct inode *inode, struct file * file) tfile->socket.file = file; tfile->socket.ops = &tun_socket_ops; - sock_init_data_uid(&tfile->socket, &tfile->sk, inode->i_uid); + sock_init_data_uid(&tfile->socket, &tfile->sk, current_fsuid()); tfile->sk.sk_write_space = tun_sock_write_space; tfile->sk.sk_sndbuf = INT_MAX; From 5c9241f3ceab3257abe2923a59950db0dc8bb737 Mon Sep 17 00:00:00 2001 From: Laszlo Ersek Date: Mon, 31 Jul 2023 18:42:37 +0200 Subject: [PATCH 81/98] net: tap_open(): set sk_uid from current_fsuid() Commit 66b2c338adce initializes the "sk_uid" field in the protocol socket (struct sock) from the "/dev/tapX" device node's owner UID. Per original commit 86741ec25462 ("net: core: Add a UID field to struct sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the userspace process that creates the socket. Commit 86741ec25462 mentions socket() and accept(); with "tap", the action that creates the socket is open("/dev/tapX"). Therefore the device node's owner UID is irrelevant. In most cases, "/dev/tapX" will be owned by root, so in practice, commit 66b2c338adce has no observable effect: - before, "sk_uid" would be zero, due to undefined behavior (CVE-2023-1076), - after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root. What matters is the (fs)UID of the process performing the open(), so cache that in "sk_uid". Cc: Eric Dumazet Cc: Lorenzo Colitti Cc: Paolo Abeni Cc: Pietro Borrello Cc: netdev@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 66b2c338adce ("tap: tap_open(): correctly initialize socket uid") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435 Signed-off-by: Laszlo Ersek Signed-off-by: David S. Miller --- drivers/net/tap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 9137fb8c1c420..49d1d6acf95eb 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -534,7 +534,7 @@ static int tap_open(struct inode *inode, struct file *file) q->sock.state = SS_CONNECTED; q->sock.file = file; q->sock.ops = &tap_socket_ops; - sock_init_data_uid(&q->sock, &q->sk, inode->i_uid); + sock_init_data_uid(&q->sock, &q->sk, current_fsuid()); q->sk.sk_write_space = tap_sock_write_space; q->sk.sk_destruct = tap_sock_destruct; q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP; From 1cfef80d4c2b2c599189f36f36320b205d9447d9 Mon Sep 17 00:00:00 2001 From: Alexandra Winter Date: Tue, 1 Aug 2023 10:00:16 +0200 Subject: [PATCH 82/98] s390/qeth: Don't call dev_close/dev_open (DOWN/UP) dev_close() and dev_open() are issued to change the interface state to DOWN or UP (dev->flags IFF_UP). When the netdev is set DOWN it loses e.g its Ipv6 addresses and routes. We don't want this in cases of device recovery (triggered by hardware or software) or when the qeth device is set offline. Setting a qeth device offline or online and device recovery actions call netif_device_detach() and/or netif_device_attach(). That will reset or set the LOWER_UP indication i.e. change the dev->state Bit __LINK_STATE_PRESENT. That is enough to e.g. cause bond failovers, and still preserves the interface settings that are handled by the network stack. Don't call dev_open() nor dev_close() from the qeth device driver. Let the network stack handle this. 
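netif_device_detach() and netif_device_attach() only toggle the __LINK_STATE_PRESENT bit, so IFF_UP, IPv6 addresses and routes all survive the recovery. The generic shape of such a recovery path looks like this (a sketch, not the qeth code itself; qeth additionally kicks NAPI as shown in the diff below):

	/* sketch: temporary detach instead of dev_close()/dev_open() */
	netif_device_detach(dev);	/* clears __LINK_STATE_PRESENT, IFF_UP stays set */
	netif_carrier_off(dev);
	/* ... reset / reconfigure the hardware ... */
	netif_carrier_on(dev);
	netif_device_attach(dev);	/* wakes the queues, addresses and routes intact */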
Fixes: d4560150cb47 ("s390/qeth: call dev_close() during recovery") Signed-off-by: Alexandra Winter Reviewed-by: Wenjia Zhang Signed-off-by: David S. Miller --- drivers/s390/net/qeth_core.h | 1 - drivers/s390/net/qeth_core_main.c | 2 -- drivers/s390/net/qeth_l2_main.c | 9 ++++++--- drivers/s390/net/qeth_l3_main.c | 8 +++++--- 4 files changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 1d195429753dd..613eab7297046 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -716,7 +716,6 @@ struct qeth_card_info { u16 chid; u8 ids_valid:1; /* cssid,iid,chid */ u8 dev_addr_is_registered:1; - u8 open_when_online:1; u8 promisc_mode:1; u8 use_v1_blkt:1; u8 is_vm_nic:1; diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 1d5b207c2b9e9..cd783290bde5e 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -5373,8 +5373,6 @@ int qeth_set_offline(struct qeth_card *card, const struct qeth_discipline *disc, qeth_clear_ipacmd_list(card); rtnl_lock(); - card->info.open_when_online = card->dev->flags & IFF_UP; - dev_close(card->dev); netif_device_detach(card->dev); netif_carrier_off(card->dev); rtnl_unlock(); diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 9f13ed170a437..75910c0bcc2bc 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -2388,9 +2388,12 @@ static int qeth_l2_set_online(struct qeth_card *card, bool carrier_ok) qeth_enable_hw_features(dev); qeth_l2_enable_brport_features(card); - if (card->info.open_when_online) { - card->info.open_when_online = 0; - dev_open(dev, NULL); + if (netif_running(dev)) { + local_bh_disable(); + napi_schedule(&card->napi); + /* kick-start the NAPI softirq: */ + local_bh_enable(); + qeth_l2_set_rx_mode(dev); } rtnl_unlock(); } diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index af4e60d2917e9..b92a32b4b1141 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -2018,9 +2018,11 @@ static int qeth_l3_set_online(struct qeth_card *card, bool carrier_ok) netif_device_attach(dev); qeth_enable_hw_features(dev); - if (card->info.open_when_online) { - card->info.open_when_online = 0; - dev_open(dev, NULL); + if (netif_running(dev)) { + local_bh_disable(); + napi_schedule(&card->napi); + /* kick-start the NAPI softirq: */ + local_bh_enable(); } rtnl_unlock(); } From 30e0191b16e8a58e4620fa3e2839ddc7b9d4281c Mon Sep 17 00:00:00 2001 From: Yue Haibing Date: Tue, 1 Aug 2023 14:43:18 +0800 Subject: [PATCH 83/98] ip6mr: Fix skb_under_panic in ip6mr_cache_report() skbuff: skb_under_panic: text:ffffffff88771f69 len:56 put:-4 head:ffff88805f86a800 data:ffff887f5f86a850 tail:0x88 end:0x2c0 dev:pim6reg ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:192! 
invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 2 PID: 22968 Comm: kworker/2:11 Not tainted 6.5.0-rc3-00044-g0a8db05b571a #236 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:skb_panic+0x152/0x1d0 Call Trace: skb_push+0xc4/0xe0 ip6mr_cache_report+0xd69/0x19b0 reg_vif_xmit+0x406/0x690 dev_hard_start_xmit+0x17e/0x6e0 __dev_queue_xmit+0x2d6a/0x3d20 vlan_dev_hard_start_xmit+0x3ab/0x5c0 dev_hard_start_xmit+0x17e/0x6e0 __dev_queue_xmit+0x2d6a/0x3d20 neigh_connected_output+0x3ed/0x570 ip6_finish_output2+0x5b5/0x1950 ip6_finish_output+0x693/0x11c0 ip6_output+0x24b/0x880 NF_HOOK.constprop.0+0xfd/0x530 ndisc_send_skb+0x9db/0x1400 ndisc_send_rs+0x12a/0x6c0 addrconf_dad_completed+0x3c9/0xea0 addrconf_dad_work+0x849/0x1420 process_one_work+0xa22/0x16e0 worker_thread+0x679/0x10c0 ret_from_fork+0x28/0x60 ret_from_fork_asm+0x11/0x20 When a vlan device is set up on dev pim6reg, a DAD NS packet may be sent via reg_vif_xmit(): reg_vif_xmit() ip6mr_cache_report() skb_push(skb, -skb_network_offset(pkt));//skb_network_offset(pkt) is 4 And skb_push() is declared as: void *skb_push(struct sk_buff *skb, unsigned int len); skb->data -= len; //0xffff88805f86a84c - 0xfffffffc = 0xffff887f5f86a850 skb->data is set to 0xffff887f5f86a850, which is an invalid memory address, so skb_push() fails. Fixes: 14fb64e1f449 ("[IPV6] MROUTE: Support PIM-SM (SSM).") Signed-off-by: Yue Haibing Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller --- net/ipv6/ip6mr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index cc3d5ad172573..67a3b8f6e72b1 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -1073,7 +1073,7 @@ static int ip6mr_cache_report(const struct mr_table *mrt, struct sk_buff *pkt, And all this only to mangle msg->im6_msgtype and to set msg->im6_mbz to "mbz" :-) */ - skb_push(skb, -skb_network_offset(pkt)); + __skb_pull(skb, skb_network_offset(pkt)); skb_push(skb, sizeof(*msg)); skb_reset_transport_header(skb); From 0756384fb1bd38adb2ebcfd1307422f433a1d772 Mon Sep 17 00:00:00 2001 From: Benjamin Poirier Date: Mon, 31 Jul 2023 16:02:08 -0400 Subject: [PATCH 84/98] vxlan: Fix nexthop hash size The nexthop code expects a 31 bit hash, such as what is returned by fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash returned by skb_get_hash() can lead to problems related to the fact that 'int hash' is a negative number when the MSB is set. In the case of hash threshold nexthop groups, nexthop_select_path_hthr() will disproportionately select the first nexthop group entry.
In the case of resilient nexthop groups, nexthop_select_path_res() may do an out of bounds access in nh_buckets[], for example: hash = -912054133 num_nh_buckets = 2 bucket_index = 65535 which leads to the following panic: BUG: unable to handle page fault for address: ffffc900025910c8 PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0 Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Call Trace: ? __die+0x23/0x70 ? page_fault_oops+0x1ee/0x5c0 ? __pfx_is_prefetch.constprop.0+0x10/0x10 ? __pfx_page_fault_oops+0x10/0x10 ? search_bpf_extables+0xfe/0x1c0 ? fixup_exception+0x3b/0x470 ? exc_page_fault+0xf6/0x110 ? asm_exc_page_fault+0x26/0x30 ? nexthop_select_path+0x197/0xbf0 ? nexthop_select_path+0x197/0xbf0 ? lock_is_held_type+0xe7/0x140 vxlan_xmit+0x5b2/0x2340 ? __lock_acquire+0x92b/0x3370 ? __pfx_vxlan_xmit+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_register_lock_class+0x10/0x10 ? skb_network_protocol+0xce/0x2d0 ? dev_hard_start_xmit+0xca/0x350 ? __pfx_vxlan_xmit+0x10/0x10 dev_hard_start_xmit+0xca/0x350 __dev_queue_xmit+0x513/0x1e20 ? __pfx___dev_queue_xmit+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? mark_held_locks+0x44/0x90 ? skb_push+0x4c/0x80 ? eth_header+0x81/0xe0 ? __pfx_eth_header+0x10/0x10 ? neigh_resolve_output+0x215/0x310 ? ip6_finish_output2+0x2ba/0xc90 ip6_finish_output2+0x2ba/0xc90 ? lock_release+0x236/0x3e0 ? ip6_mtu+0xbb/0x240 ? __pfx_ip6_finish_output2+0x10/0x10 ? find_held_lock+0x83/0xa0 ? lock_is_held_type+0xe7/0x140 ip6_finish_output+0x1ee/0x780 ip6_output+0x138/0x460 ? __pfx_ip6_output+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx_ip6_finish_output+0x10/0x10 NF_HOOK.constprop.0+0xc0/0x420 ? __pfx_NF_HOOK.constprop.0+0x10/0x10 ? ndisc_send_skb+0x2c0/0x960 ? __pfx_lock_release+0x10/0x10 ? __local_bh_enable_ip+0x93/0x110 ? lock_is_held_type+0xe7/0x140 ndisc_send_skb+0x4be/0x960 ? __pfx_ndisc_send_skb+0x10/0x10 ? mark_held_locks+0x65/0x90 ? find_held_lock+0x83/0xa0 ndisc_send_ns+0xb0/0x110 ? __pfx_ndisc_send_ns+0x10/0x10 addrconf_dad_work+0x631/0x8e0 ? lock_acquire+0x180/0x3f0 ? __pfx_addrconf_dad_work+0x10/0x10 ? mark_held_locks+0x24/0x90 process_one_work+0x582/0x9c0 ? __pfx_process_one_work+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? mark_held_locks+0x24/0x90 worker_thread+0x93/0x630 ? __kthread_parkme+0xdc/0x100 ? __pfx_worker_thread+0x10/0x10 kthread+0x1a5/0x1e0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 RIP: 0000:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. 
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: CR2: ffffc900025910c8 ---[ end trace 0000000000000000 ]--- RIP: 0010:nexthop_select_path+0x197/0xbf0 Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85 RSP: 0018:ffff88810c36f260 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77 RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8 RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219 R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0 R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900 FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0 PKRU: 55555554 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fix this problem by ensuring the MSB of hash is 0 using a right shift - the same approach used in fib_multipath_hash() and rt6_multipath_hash(). Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Benjamin Poirier Reviewed-by: Ido Schimmel Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- include/net/vxlan.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/net/vxlan.h b/include/net/vxlan.h index 1648240c96680..6a9f8a5f387ce 100644 --- a/include/net/vxlan.h +++ b/include/net/vxlan.h @@ -556,12 +556,12 @@ static inline void vxlan_flag_attr_error(int attrtype, } static inline bool vxlan_fdb_nh_path_select(struct nexthop *nh, - int hash, + u32 hash, struct vxlan_rdst *rdst) { struct fib_nh_common *nhc; - nhc = nexthop_path_fdb_result(nh, hash); + nhc = nexthop_path_fdb_result(nh, hash >> 1); if (unlikely(!nhc)) return false; From 16e455a465fca91907af0108f3d013150386df30 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Sat, 29 Jul 2023 16:05:00 +0200 Subject: [PATCH 85/98] wifi: brcmfmac: Fix field-spanning write in brcmf_scan_params_v2_to_v1() Using brcmfmac with 6.5-rc3 on a brcmfmac43241b4-sdio triggers a backtrace caused by the following field-spanning warning: memcpy: detected field-spanning write (size 120) of single field "¶ms_le->channel_list[0]" at drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:1072 (size 2) The driver still works after this warning. The warning was introduced by the new field-spanning write checks which were enabled recently. Fix this by replacing the channel_list[1] declaration at the end of the struct with a flexible array declaration. Most users of struct brcmf_scan_params_le calculate the size to alloc using the size of the non flex-array part of the struct + needed extra space, so they do not care about sizeof(struct brcmf_scan_params_le). 
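For illustration, the usual allocation pattern for such callers could be sketched as below (params_le stands for any heap-allocated request and n_channels is an invented count; this is a generic sketch, not code from the driver):

        struct brcmf_scan_params_le *params_le;
        size_t len;

        /* room for the fixed portion plus n_channels chanspecs appended at
         * the flexible channel_list[] tail
         */
        len = struct_size(params_le, channel_list, n_channels);
        params_le = kzalloc(len, GFP_KERNEL);
        if (!params_le)
                return -ENOMEM;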
brcmf_notify_escan_complete() however uses the struct on the stack, expecting there to be room for at least 1 entry in the channel-list to store the special -1 abort channel-id. To make this work use an anonymous union with a padding member added + the actual channel_list flexible array. Cc: Kees Cook Signed-off-by: Hans de Goede Reviewed-by: Kees Cook Reviewed-by: Franky Lin Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20230729140500.27892-1-hdegoede@redhat.com --- .../net/wireless/broadcom/brcm80211/brcmfmac/fwil_types.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwil_types.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwil_types.h index 792adaf880b44..bece26741d3a3 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwil_types.h +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwil_types.h @@ -398,7 +398,12 @@ struct brcmf_scan_params_le { * fixed parameter portion is assumed, otherwise * ssid in the fixed portion is ignored */ - __le16 channel_list[1]; /* list of chanspecs */ + union { + __le16 padding; /* Reserve space for at least 1 entry for abort + * which uses an on stack brcmf_scan_params_le + */ + DECLARE_FLEX_ARRAY(__le16, channel_list); /* chanspecs */ + }; }; struct brcmf_scan_params_v2_le { From 618d28a535a0582617465d14e05f3881736a2962 Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Mon, 31 Jul 2023 14:58:40 +0300 Subject: [PATCH 86/98] net/mlx5: fs_core: Make find_closest_ft more generic As find_closest_ft_recursive is called to find the closest FT, the first parameter of find_closest_ft can be changed from fs_prio to fs_node. Thus this function is extended to find the closest FT for the nodes of any type, not only prios, but also the sub namespaces. Signed-off-by: Jianbo Liu Signed-off-by: Leon Romanovsky Link: https://lore.kernel.org/r/d3962c2b443ec8dde7a740dc742a1f052d5e256c.1690803944.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski --- .../net/ethernet/mellanox/mlx5/core/fs_core.c | 29 +++++++++---------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index b4eb27e7f28b1..221723694d442 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -905,18 +905,17 @@ static struct mlx5_flow_table *find_closest_ft_recursive(struct fs_node *root, return ft; } -/* If reverse is false then return the first flow table in next priority of - * prio in the tree, else return the last flow table in the previous priority - * of prio in the tree. +/* If reverse is false then return the first flow table next to the passed node + * in the tree, else return the last flow table before the node in the tree. 
*/ -static struct mlx5_flow_table *find_closest_ft(struct fs_prio *prio, bool reverse) +static struct mlx5_flow_table *find_closest_ft(struct fs_node *node, bool reverse) { struct mlx5_flow_table *ft = NULL; struct fs_node *curr_node; struct fs_node *parent; - parent = prio->node.parent; - curr_node = &prio->node; + parent = node->parent; + curr_node = node; while (!ft && parent) { ft = find_closest_ft_recursive(parent, &curr_node->list, reverse); curr_node = parent; @@ -926,15 +925,15 @@ static struct mlx5_flow_table *find_closest_ft(struct fs_prio *prio, bool revers } /* Assuming all the tree is locked by mutex chain lock */ -static struct mlx5_flow_table *find_next_chained_ft(struct fs_prio *prio) +static struct mlx5_flow_table *find_next_chained_ft(struct fs_node *node) { - return find_closest_ft(prio, false); + return find_closest_ft(node, false); } /* Assuming all the tree is locked by mutex chain lock */ -static struct mlx5_flow_table *find_prev_chained_ft(struct fs_prio *prio) +static struct mlx5_flow_table *find_prev_chained_ft(struct fs_node *node) { - return find_closest_ft(prio, true); + return find_closest_ft(node, true); } static struct mlx5_flow_table *find_next_fwd_ft(struct mlx5_flow_table *ft, @@ -946,7 +945,7 @@ static struct mlx5_flow_table *find_next_fwd_ft(struct mlx5_flow_table *ft, next_ns = flow_act->action & MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_NS; fs_get_obj(prio, next_ns ? ft->ns->node.parent : ft->node.parent); - return find_next_chained_ft(prio); + return find_next_chained_ft(&prio->node); } static int connect_fts_in_prio(struct mlx5_core_dev *dev, @@ -977,7 +976,7 @@ static int connect_prev_fts(struct mlx5_core_dev *dev, { struct mlx5_flow_table *prev_ft; - prev_ft = find_prev_chained_ft(prio); + prev_ft = find_prev_chained_ft(&prio->node); if (prev_ft) { struct fs_prio *prev_prio; @@ -1123,7 +1122,7 @@ static int connect_flow_table(struct mlx5_core_dev *dev, struct mlx5_flow_table if (err) return err; - next_ft = first_ft ? first_ft : find_next_chained_ft(prio); + next_ft = first_ft ? first_ft : find_next_chained_ft(&prio->node); err = connect_fwd_rules(dev, ft, next_ft); if (err) return err; @@ -1198,7 +1197,7 @@ static struct mlx5_flow_table *__mlx5_create_flow_table(struct mlx5_flow_namespa tree_init_node(&ft->node, del_hw_flow_table, del_sw_flow_table); next_ft = unmanaged ? ft_attr->next_ft : - find_next_chained_ft(fs_prio); + find_next_chained_ft(&fs_prio->node); ft->def_miss_action = ns->def_miss_action; ft->ns = ns; err = root->cmds->create_flow_table(root, ft, ft_attr, next_ft); @@ -2201,7 +2200,7 @@ static struct mlx5_flow_table *find_next_ft(struct mlx5_flow_table *ft) if (!list_is_last(&ft->node.list, &prio->node.children)) return list_next_entry(ft, node.list); - return find_next_chained_ft(prio); + return find_next_chained_ft(&prio->node); } static int update_root_ft_destroy(struct mlx5_flow_table *ft) From c635ca45a7a2023904a1f851e99319af7b87017d Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Mon, 31 Jul 2023 14:58:41 +0300 Subject: [PATCH 87/98] net/mlx5: fs_core: Skip the FTs in the same FS_TYPE_PRIO_CHAINS fs_prio In the cited commit, new type of FS_TYPE_PRIO_CHAINS fs_prio was added to support multiple parallel namespaces for multi-chains. And we skip all the flow tables under the fs_node of this type unconditionally, when searching for the next or previous flow table to connect for a new table. 
As this search function is also used to find the new root table when the old one is being deleted, it will skip the entire FS_TYPE_PRIO_CHAINS fs_node next to the old root. However, the new root table should be chosen from it if there is any table in it. Fix it by skipping only the flow tables in the same FS_TYPE_PRIO_CHAINS fs_node when finding the closest FT for an fs_node. In addition, complete the connection from the FTs of the previous priority of prio, because there can be multiple previous FTs once this fs_prio type is introduced. Also, the next FT should be chosen from the first flow table next to the prio in the same FS_TYPE_PRIO_CHAINS fs_prio, if this prio is the first child. Fixes: 328edb499f99 ("net/mlx5: Split FDB fast path prio to multiple namespaces") Signed-off-by: Jianbo Liu Reviewed-by: Paul Blakey Signed-off-by: Leon Romanovsky Link: https://lore.kernel.org/r/7a95754df479e722038996c97c97b062b372591f.1690803944.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski --- .../net/ethernet/mellanox/mlx5/core/fs_core.c | 80 +++++++++++++++++-- 1 file changed, 72 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c index 221723694d442..6b069fa411c5a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c @@ -889,7 +889,7 @@ static struct mlx5_flow_table *find_closest_ft_recursive(struct fs_node *root, struct fs_node *iter = list_entry(start, struct fs_node, list); struct mlx5_flow_table *ft = NULL; - if (!root || root->type == FS_TYPE_PRIO_CHAINS) + if (!root) return NULL; list_for_each_advance_continue(iter, &root->children, reverse) { @@ -905,19 +905,42 @@ static struct mlx5_flow_table *find_closest_ft_recursive(struct fs_node *root, return ft; } +static struct fs_node *find_prio_chains_parent(struct fs_node *parent, + struct fs_node **child) +{ + struct fs_node *node = NULL; + + while (parent && parent->type != FS_TYPE_PRIO_CHAINS) { + node = parent; + parent = parent->parent; + } + + if (child) + *child = node; + + return parent; +} + /* If reverse is false then return the first flow table next to the passed node * in the tree, else return the last flow table before the node in the tree. * If skip is true, skip the flow tables in the same prio_chains prio.
*/ -static struct mlx5_flow_table *find_closest_ft(struct fs_node *node, bool reverse) +static struct mlx5_flow_table *find_closest_ft(struct fs_node *node, bool reverse, + bool skip) { + struct fs_node *prio_chains_parent = NULL; struct mlx5_flow_table *ft = NULL; struct fs_node *curr_node; struct fs_node *parent; + if (skip) + prio_chains_parent = find_prio_chains_parent(node, NULL); parent = node->parent; curr_node = node; while (!ft && parent) { - ft = find_closest_ft_recursive(parent, &curr_node->list, reverse); + if (parent != prio_chains_parent) + ft = find_closest_ft_recursive(parent, &curr_node->list, + reverse); curr_node = parent; parent = curr_node->parent; } @@ -927,13 +950,13 @@ static struct mlx5_flow_table *find_closest_ft(struct fs_node *node, bool revers /* Assuming all the tree is locked by mutex chain lock */ static struct mlx5_flow_table *find_next_chained_ft(struct fs_node *node) { - return find_closest_ft(node, false); + return find_closest_ft(node, false, true); } /* Assuming all the tree is locked by mutex chain lock */ static struct mlx5_flow_table *find_prev_chained_ft(struct fs_node *node) { - return find_closest_ft(node, true); + return find_closest_ft(node, true, true); } static struct mlx5_flow_table *find_next_fwd_ft(struct mlx5_flow_table *ft, @@ -969,21 +992,55 @@ static int connect_fts_in_prio(struct mlx5_core_dev *dev, return 0; } +static struct mlx5_flow_table *find_closet_ft_prio_chains(struct fs_node *node, + struct fs_node *parent, + struct fs_node **child, + bool reverse) +{ + struct mlx5_flow_table *ft; + + ft = find_closest_ft(node, reverse, false); + + if (ft && parent == find_prio_chains_parent(&ft->node, child)) + return ft; + + return NULL; +} + /* Connect flow tables from previous priority of prio to ft */ static int connect_prev_fts(struct mlx5_core_dev *dev, struct mlx5_flow_table *ft, struct fs_prio *prio) { + struct fs_node *prio_parent, *parent = NULL, *child, *node; struct mlx5_flow_table *prev_ft; + int err = 0; + + prio_parent = find_prio_chains_parent(&prio->node, &child); + + /* return directly if not under the first sub ns of prio_chains prio */ + if (prio_parent && !list_is_first(&child->list, &prio_parent->children)) + return 0; prev_ft = find_prev_chained_ft(&prio->node); - if (prev_ft) { + while (prev_ft) { struct fs_prio *prev_prio; fs_get_obj(prev_prio, prev_ft->node.parent); - return connect_fts_in_prio(dev, prev_prio, ft); + err = connect_fts_in_prio(dev, prev_prio, ft); + if (err) + break; + + if (!parent) { + parent = find_prio_chains_parent(&prev_prio->node, &child); + if (!parent) + break; + } + + node = child; + prev_ft = find_closet_ft_prio_chains(node, parent, &child, true); } - return 0; + return err; } static int update_root_ft_create(struct mlx5_flow_table *ft, struct fs_prio @@ -2194,12 +2251,19 @@ EXPORT_SYMBOL(mlx5_del_flow_rules); /* Assuming prio->node.children(flow tables) is sorted by level */ static struct mlx5_flow_table *find_next_ft(struct mlx5_flow_table *ft) { + struct fs_node *prio_parent, *child; struct fs_prio *prio; fs_get_obj(prio, ft->node.parent); if (!list_is_last(&ft->node.list, &prio->node.children)) return list_next_entry(ft, node.list); + + prio_parent = find_prio_chains_parent(&prio->node, &child); + + if (prio_parent && list_is_first(&child->list, &prio_parent->children)) + return find_closest_ft(&prio->node, false, false); + return find_next_chained_ft(&prio->node); } From 62da08331f1a2bef9d0148613133ce8e640a2f8d Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Mon, 31 Jul 2023 
14:58:42 +0300 Subject: [PATCH 88/98] net/mlx5e: Set proper IPsec source port in L4 selector Fix typo in setup_fte_upper_proto_match() where destination UDP port was used instead of source port. Fixes: a7385187a386 ("net/mlx5e: IPsec, support upper protocol selector field offload") Signed-off-by: Leon Romanovsky Link: https://lore.kernel.org/r/ffc024a4d192113103f392b0502688366ca88c1f.1690803944.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c index dbe87bf89c0dd..832d36be4a17b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c @@ -808,9 +808,9 @@ static void setup_fte_upper_proto_match(struct mlx5_flow_spec *spec, struct upsp } if (upspec->sport) { - MLX5_SET(fte_match_set_lyr_2_4, spec->match_criteria, udp_dport, + MLX5_SET(fte_match_set_lyr_2_4, spec->match_criteria, udp_sport, upspec->sport_mask); - MLX5_SET(fte_match_set_lyr_2_4, spec->match_value, udp_dport, upspec->sport); + MLX5_SET(fte_match_set_lyr_2_4, spec->match_value, udp_sport, upspec->sport); } } From 0f71c9caf26726efea674646f566984e735cc3b9 Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 1 Aug 2023 16:48:53 +0100 Subject: [PATCH 89/98] udp: Fix __ip_append_data()'s handling of MSG_SPLICE_PAGES __ip_append_data() can get into an infinite loop when asked to splice into a partially-built UDP message that has more than the frag-limit data and up to the MTU limit. Something like: pipe(pfd); sfd = socket(AF_INET, SOCK_DGRAM, 0); connect(sfd, ...); send(sfd, buffer, 8161, MSG_CONFIRM|MSG_MORE); write(pfd[1], buffer, 8); splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0); where the amount of data given to send() is dependent on the MTU size (in this instance an interface with an MTU of 8192). The problem is that the calculation of the amount to copy in __ip_append_data() goes negative in two places, and, in the second place, this gets subtracted from the length remaining, thereby increasing it. This happens when pagedlen > 0 (which happens for MSG_ZEROCOPY and MSG_SPLICE_PAGES), because the terms in: copy = datalen - transhdrlen - fraggap - pagedlen; then mostly cancel when pagedlen is substituted for, leaving just -fraggap. This causes: length -= copy + transhdrlen; to increase the length to more than the amount of data in msg->msg_iter, which causes skb_splice_from_iter() to be unable to fill the request and it returns less than 'copied' - which means that length never gets to 0 and we never exit the loop. Fix this by: (1) Insert a note about the dodgy calculation of 'copy'. (2) If MSG_SPLICE_PAGES, clear copy if it is negative from the above equation, so that 'offset' isn't regressed and 'length' isn't increased, which will mean that length and thus copy should match the amount left in the iterator. (3) When handling MSG_SPLICE_PAGES, give a warning and return -EIO if we're asked to splice more than is in the iterator. It might be better to not give the warning or even just give a 'short' write. [!] Note that this ought to also affect MSG_ZEROCOPY, but MSG_ZEROCOPY avoids the problem by simply assuming that everything asked for got copied, not just the amount that was in the iterator. This is a potential bug for the future. 
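To spell out the cancellation described above (this is only a restatement of the analysis in terms of the identifiers used in __ip_append_data(), not additional kernel code):

        copy = datalen - transhdrlen - fraggap - pagedlen;

        /* With pagedlen > 0 (MSG_ZEROCOPY, MSG_SPLICE_PAGES) the
         * datalen - transhdrlen - pagedlen terms cancel out, leaving
         *
         *      copy == -fraggap        (negative whenever fraggap > 0)
         *
         * so the subsequent
         *
         *      length -= copy + transhdrlen;
         *
         * increases length by fraggap.  For MSG_SPLICE_PAGES the loop can
         * then never drain msg->msg_iter, hence the infinite loop.
         */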
Fixes: 7ac7c987850c ("udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES") Reported-by: syzbot+f527b971b4bdc8e79f9e@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/000000000000881d0606004541d1@google.com/ Signed-off-by: David Howells cc: David Ahern cc: Jens Axboe Reviewed-by: Willem de Bruijn Link: https://lore.kernel.org/r/1420063.1690904933@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski --- net/ipv4/ip_output.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 54d2d3a2d850e..6ba1a0fafbaab 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1158,10 +1158,15 @@ static int __ip_append_data(struct sock *sk, } copy = datalen - transhdrlen - fraggap - pagedlen; + /* [!] NOTE: copy will be negative if pagedlen>0 + * because then the equation reduces to -fraggap. + */ if (copy > 0 && getfrag(from, data + transhdrlen, offset, copy, fraggap, skb) < 0) { err = -EFAULT; kfree_skb(skb); goto error; + } else if (flags & MSG_SPLICE_PAGES) { + copy = 0; } offset += copy; @@ -1209,6 +1214,10 @@ static int __ip_append_data(struct sock *sk, } else if (flags & MSG_SPLICE_PAGES) { struct msghdr *msg = from; + err = -EIO; + if (WARN_ON_ONCE(copy > msg->msg_iter.count)) + goto error; + err = skb_splice_from_iter(skb, &msg->msg_iter, copy, sk->sk_allocation); if (err < 0) From b755c25fbcd568821a3bb0e0d5c2daa5fcb00bba Mon Sep 17 00:00:00 2001 From: Jonas Gorski Date: Wed, 2 Aug 2023 11:23:56 +0200 Subject: [PATCH 90/98] prestera: fix fallback to previous version on same major version When both the supported and the previous version have the same major version, and the firmware files are missing, the driver ends up in a loop requesting the same (previous) version over and over again: [ 76.327413] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version [ 76.339802] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version [ 76.352162] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version [ 76.364502] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version [ 76.376848] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version [ 76.389183] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version [ 76.401522] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version [ 76.413860] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version [ 76.426199] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version ... Fix this by inverting the check so that we only fall back if we aren't yet at the previous version, and by also checking the minor version. This also catches the case where both versions are the same, as it was after commit bb5dbf2cc64d ("net: marvell: prestera: add firmware v4.0 support").
With this fix applied: [ 88.499622] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version [ 88.511995] Prestera DX 0000:01:00.0: failed to request previous firmware: mrvl/prestera/mvsw_prestera_fw-v4.0.img [ 88.522403] Prestera DX: probe of 0000:01:00.0 failed with error -2 Fixes: 47f26018a414 ("net: marvell: prestera: try to load previous fw version") Signed-off-by: Jonas Gorski Acked-by: Elad Nachman Reviewed-by: Jesse Brandeburg Acked-by: Taras Chornyi Link: https://lore.kernel.org/r/20230802092357.163944-1-jonas.gorski@bisdn.de Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/marvell/prestera/prestera_pci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/prestera/prestera_pci.c b/drivers/net/ethernet/marvell/prestera/prestera_pci.c index f328d957b2db7..35857dc19542f 100644 --- a/drivers/net/ethernet/marvell/prestera/prestera_pci.c +++ b/drivers/net/ethernet/marvell/prestera/prestera_pci.c @@ -727,7 +727,8 @@ static int prestera_fw_get(struct prestera_fw *fw) err = request_firmware_direct(&fw->bin, fw_path, fw->dev.dev); if (err) { - if (ver_maj == PRESTERA_SUPP_FW_MAJ_VER) { + if (ver_maj != PRESTERA_PREV_FW_MAJ_VER || + ver_min != PRESTERA_PREV_FW_MIN_VER) { ver_maj = PRESTERA_PREV_FW_MAJ_VER; ver_min = PRESTERA_PREV_FW_MIN_VER; From e6638094d7af6c7b9dcca05ad009e79e31b4f670 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Aug 2023 13:14:55 +0000 Subject: [PATCH 91/98] tcp_metrics: fix addr_same() helper Because v4 and v6 families use separate inetpeer trees (respectively net->ipv4.peers and net->ipv6.peers), inetpeer_addr_cmp(a, b) assumes a & b share the same family. tcp_metrics use a common hash table, where entries can have different families. We must therefore make sure to not call inetpeer_addr_cmp() if the families do not match. Fixes: d39d14ffa24c ("net: Add helper function to compare inetpeer addresses") Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230802131500.1478140-2-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp_metrics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 82f4575f9cd90..c4daf0aa2d4d9 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -78,7 +78,7 @@ static void tcp_metric_set(struct tcp_metrics_block *tm, static bool addr_same(const struct inetpeer_addr *a, const struct inetpeer_addr *b) { - return inetpeer_addr_cmp(a, b) == 0; + return (a->family == b->family) && !inetpeer_addr_cmp(a, b); } struct tcpm_hash_bucket { From 949ad62a5d5311d36fce2e14fe5fed3f936da51c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Aug 2023 13:14:56 +0000 Subject: [PATCH 92/98] tcp_metrics: annotate data-races around tm->tcpm_stamp tm->tcpm_stamp can be read or written locklessly. Add needed READ_ONCE()/WRITE_ONCE() to document this. Also constify tcpm_check_stamp() dst argument. 
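The annotation pattern applied here, and in the following tcp_metrics patches, is roughly the following sketch (illustrative fragment, not a new API):

        /* writer side (still serialized by the existing locking) */
        WRITE_ONCE(tm->tcpm_stamp, jiffies);

        /* lockless reader side */
        unsigned long limit = READ_ONCE(tm->tcpm_stamp) + TCP_METRICS_TIMEOUT;

        if (time_after(jiffies, limit))
                /* the entry is stale, refresh it from the dst */;

        /* Plain loads/stores here would let the compiler tear, refetch or
         * cache the value; the _ONCE() accessors mark the lockless accesses
         * without adding any locking.
         */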
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.") Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230802131500.1478140-3-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp_metrics.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index c4daf0aa2d4d9..8386165887963 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -97,7 +97,7 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm, u32 msval; u32 val; - tm->tcpm_stamp = jiffies; + WRITE_ONCE(tm->tcpm_stamp, jiffies); val = 0; if (dst_metric_locked(dst, RTAX_RTT)) @@ -131,9 +131,15 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm, #define TCP_METRICS_TIMEOUT (60 * 60 * HZ) -static void tcpm_check_stamp(struct tcp_metrics_block *tm, struct dst_entry *dst) +static void tcpm_check_stamp(struct tcp_metrics_block *tm, + const struct dst_entry *dst) { - if (tm && unlikely(time_after(jiffies, tm->tcpm_stamp + TCP_METRICS_TIMEOUT))) + unsigned long limit; + + if (!tm) + return; + limit = READ_ONCE(tm->tcpm_stamp) + TCP_METRICS_TIMEOUT; + if (unlikely(time_after(jiffies, limit))) tcpm_suck_dst(tm, dst, false); } @@ -174,7 +180,8 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst, oldest = deref_locked(tcp_metrics_hash[hash].chain); for (tm = deref_locked(oldest->tcpm_next); tm; tm = deref_locked(tm->tcpm_next)) { - if (time_before(tm->tcpm_stamp, oldest->tcpm_stamp)) + if (time_before(READ_ONCE(tm->tcpm_stamp), + READ_ONCE(oldest->tcpm_stamp))) oldest = tm; } tm = oldest; @@ -434,7 +441,7 @@ void tcp_update_metrics(struct sock *sk) tp->reordering); } } - tm->tcpm_stamp = jiffies; + WRITE_ONCE(tm->tcpm_stamp, jiffies); out_unlock: rcu_read_unlock(); } @@ -647,7 +654,7 @@ static int tcp_metrics_fill_info(struct sk_buff *msg, } if (nla_put_msecs(msg, TCP_METRICS_ATTR_AGE, - jiffies - tm->tcpm_stamp, + jiffies - READ_ONCE(tm->tcpm_stamp), TCP_METRICS_ATTR_PAD) < 0) goto nla_put_failure; From 285ce119a3c6c4502585936650143e54c8692788 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Aug 2023 13:14:57 +0000 Subject: [PATCH 93/98] tcp_metrics: annotate data-races around tm->tcpm_lock tm->tcpm_lock can be read or written locklessly. Add needed READ_ONCE()/WRITE_ONCE() to document this. 
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.") Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230802131500.1478140-4-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp_metrics.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 8386165887963..131fa30049691 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -59,7 +59,8 @@ static inline struct net *tm_net(struct tcp_metrics_block *tm) static bool tcp_metric_locked(struct tcp_metrics_block *tm, enum tcp_metric_index idx) { - return tm->tcpm_lock & (1 << idx); + /* Paired with WRITE_ONCE() in tcpm_suck_dst() */ + return READ_ONCE(tm->tcpm_lock) & (1 << idx); } static u32 tcp_metric_get(struct tcp_metrics_block *tm, @@ -110,7 +111,8 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm, val |= 1 << TCP_METRIC_CWND; if (dst_metric_locked(dst, RTAX_REORDERING)) val |= 1 << TCP_METRIC_REORDERING; - tm->tcpm_lock = val; + /* Paired with READ_ONCE() in tcp_metric_locked() */ + WRITE_ONCE(tm->tcpm_lock, val); msval = dst_metric_raw(dst, RTAX_RTT); tm->tcpm_vals[TCP_METRIC_RTT] = msval * USEC_PER_MSEC; From 8c4d04f6b443869d25e59822f7cec88d647028a9 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Aug 2023 13:14:58 +0000 Subject: [PATCH 94/98] tcp_metrics: annotate data-races around tm->tcpm_vals[] tm->tcpm_vals[] values can be read or written locklessly. Add needed READ_ONCE()/WRITE_ONCE() to document this, and force use of tcp_metric_get() and tcp_metric_set() Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.") Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Reviewed-by: Kuniyuki Iwashima Signed-off-by: Jakub Kicinski --- net/ipv4/tcp_metrics.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 131fa30049691..fd4ab7a51cef2 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -63,17 +63,19 @@ static bool tcp_metric_locked(struct tcp_metrics_block *tm, return READ_ONCE(tm->tcpm_lock) & (1 << idx); } -static u32 tcp_metric_get(struct tcp_metrics_block *tm, +static u32 tcp_metric_get(const struct tcp_metrics_block *tm, enum tcp_metric_index idx) { - return tm->tcpm_vals[idx]; + /* Paired with WRITE_ONCE() in tcp_metric_set() */ + return READ_ONCE(tm->tcpm_vals[idx]); } static void tcp_metric_set(struct tcp_metrics_block *tm, enum tcp_metric_index idx, u32 val) { - tm->tcpm_vals[idx] = val; + /* Paired with READ_ONCE() in tcp_metric_get() */ + WRITE_ONCE(tm->tcpm_vals[idx], val); } static bool addr_same(const struct inetpeer_addr *a, @@ -115,13 +117,16 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm, WRITE_ONCE(tm->tcpm_lock, val); msval = dst_metric_raw(dst, RTAX_RTT); - tm->tcpm_vals[TCP_METRIC_RTT] = msval * USEC_PER_MSEC; + tcp_metric_set(tm, TCP_METRIC_RTT, msval * USEC_PER_MSEC); msval = dst_metric_raw(dst, RTAX_RTTVAR); - tm->tcpm_vals[TCP_METRIC_RTTVAR] = msval * USEC_PER_MSEC; - tm->tcpm_vals[TCP_METRIC_SSTHRESH] = dst_metric_raw(dst, RTAX_SSTHRESH); - tm->tcpm_vals[TCP_METRIC_CWND] = dst_metric_raw(dst, RTAX_CWND); - tm->tcpm_vals[TCP_METRIC_REORDERING] = dst_metric_raw(dst, RTAX_REORDERING); + tcp_metric_set(tm, TCP_METRIC_RTTVAR, msval * USEC_PER_MSEC); + tcp_metric_set(tm, TCP_METRIC_SSTHRESH, + dst_metric_raw(dst, RTAX_SSTHRESH)); + tcp_metric_set(tm, TCP_METRIC_CWND, + dst_metric_raw(dst, 
RTAX_CWND)); + tcp_metric_set(tm, TCP_METRIC_REORDERING, + dst_metric_raw(dst, RTAX_REORDERING)); if (fastopen_clear) { tm->tcpm_fastopen.mss = 0; tm->tcpm_fastopen.syn_loss = 0; @@ -667,7 +672,7 @@ static int tcp_metrics_fill_info(struct sk_buff *msg, if (!nest) goto nla_put_failure; for (i = 0; i < TCP_METRIC_MAX_KERNEL + 1; i++) { - u32 val = tm->tcpm_vals[i]; + u32 val = tcp_metric_get(tm, i); if (!val) continue; From d5d986ce42c71a7562d32c4e21e026b0f87befec Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Aug 2023 13:14:59 +0000 Subject: [PATCH 95/98] tcp_metrics: annotate data-races around tm->tcpm_net tm->tcpm_net can be read or written locklessly. Instead of changing write_pnet() and read_pnet() and potentially hurt performance, add the needed READ_ONCE()/WRITE_ONCE() in tm_net() and tcpm_new(). Fixes: 849e8a0ca8d5 ("tcp_metrics: Add a field tcpm_net and verify it matches on lookup") Signed-off-by: Eric Dumazet Reviewed-by: David Ahern Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230802131500.1478140-6-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp_metrics.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index fd4ab7a51cef2..4fd274836a48f 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -40,7 +40,7 @@ struct tcp_fastopen_metrics { struct tcp_metrics_block { struct tcp_metrics_block __rcu *tcpm_next; - possible_net_t tcpm_net; + struct net *tcpm_net; struct inetpeer_addr tcpm_saddr; struct inetpeer_addr tcpm_daddr; unsigned long tcpm_stamp; @@ -51,9 +51,10 @@ struct tcp_metrics_block { struct rcu_head rcu_head; }; -static inline struct net *tm_net(struct tcp_metrics_block *tm) +static inline struct net *tm_net(const struct tcp_metrics_block *tm) { - return read_pnet(&tm->tcpm_net); + /* Paired with the WRITE_ONCE() in tcpm_new() */ + return READ_ONCE(tm->tcpm_net); } static bool tcp_metric_locked(struct tcp_metrics_block *tm, @@ -197,7 +198,9 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst, if (!tm) goto out_unlock; } - write_pnet(&tm->tcpm_net, net); + /* Paired with the READ_ONCE() in tm_net() */ + WRITE_ONCE(tm->tcpm_net, net); + tm->tcpm_saddr = *saddr; tm->tcpm_daddr = *daddr; From ddf251fa2bc1d3699eec0bae6ed0bc373b8fda79 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 2 Aug 2023 13:15:00 +0000 Subject: [PATCH 96/98] tcp_metrics: fix data-race in tcpm_suck_dst() vs fastopen Whenever tcpm_new() reclaims an old entry, tcpm_suck_dst() would overwrite data that could be read from tcp_fastopen_cache_get() or tcp_metrics_fill_info(). We need to acquire fastopen_seqlock to maintain consistency. For newly allocated objects, tcpm_new() can switch to kzalloc() to avoid an extra fastopen_seqlock acquisition. 
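Schematically, the seqlock discipline this restores is sketched below (the writer side reflects what tcpm_suck_dst() does on reclaim; the reader side condenses the existing tcp_fastopen_cache_get() loop):

        unsigned int seq;

        /* writer: reset the Fast Open state of a reclaimed entry */
        write_seqlock(&fastopen_seqlock);
        tm->tcpm_fastopen.mss = 0;
        tm->tcpm_fastopen.syn_loss = 0;
        /* ... cookie, try_exp ... */
        write_sequnlock(&fastopen_seqlock);

        /* reader, as in tcp_fastopen_cache_get(): retry if a writer raced */
        do {
                seq = read_seqbegin(&fastopen_seqlock);
                /* copy tm->tcpm_fastopen out to the caller */
        } while (read_seqretry(&fastopen_seqlock, seq));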
Fixes: 1fe4c481ba63 ("net-tcp: Fast Open client - cookie cache") Signed-off-by: Eric Dumazet Cc: Yuchung Cheng Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230802131500.1478140-7-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp_metrics.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 4fd274836a48f..99ac5efe244d3 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -93,6 +93,7 @@ static struct tcpm_hash_bucket *tcp_metrics_hash __read_mostly; static unsigned int tcp_metrics_hash_log __read_mostly; static DEFINE_SPINLOCK(tcp_metrics_lock); +static DEFINE_SEQLOCK(fastopen_seqlock); static void tcpm_suck_dst(struct tcp_metrics_block *tm, const struct dst_entry *dst, @@ -129,11 +130,13 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm, tcp_metric_set(tm, TCP_METRIC_REORDERING, dst_metric_raw(dst, RTAX_REORDERING)); if (fastopen_clear) { + write_seqlock(&fastopen_seqlock); tm->tcpm_fastopen.mss = 0; tm->tcpm_fastopen.syn_loss = 0; tm->tcpm_fastopen.try_exp = 0; tm->tcpm_fastopen.cookie.exp = false; tm->tcpm_fastopen.cookie.len = 0; + write_sequnlock(&fastopen_seqlock); } } @@ -194,7 +197,7 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst, } tm = oldest; } else { - tm = kmalloc(sizeof(*tm), GFP_ATOMIC); + tm = kzalloc(sizeof(*tm), GFP_ATOMIC); if (!tm) goto out_unlock; } @@ -204,7 +207,7 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst, tm->tcpm_saddr = *saddr; tm->tcpm_daddr = *daddr; - tcpm_suck_dst(tm, dst, true); + tcpm_suck_dst(tm, dst, reclaim); if (likely(!reclaim)) { tm->tcpm_next = tcp_metrics_hash[hash].chain; @@ -556,8 +559,6 @@ bool tcp_peer_is_proven(struct request_sock *req, struct dst_entry *dst) return ret; } -static DEFINE_SEQLOCK(fastopen_seqlock); - void tcp_fastopen_cache_get(struct sock *sk, u16 *mss, struct tcp_fastopen_cookie *cookie) { From 3c50c8b240390907c9a33c86d25d850520db6dfa Mon Sep 17 00:00:00 2001 From: Stefano Garzarella Date: Thu, 3 Aug 2023 10:54:54 +0200 Subject: [PATCH 97/98] test/vsock: remove vsock_perf executable on `make clean` We forgot to add vsock_perf to the rm command in the `clean` target, so now we have a left over after `make clean` in tools/testing/vsock. Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility") Cc: AVKrasnov@sberdevices.ru Signed-off-by: Stefano Garzarella Reviewed-by: Simon Horman Tested-by: Simon Horman # build-tested Link: https://lore.kernel.org/r/20230803085454.30897-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski --- tools/testing/vsock/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile index 43a254f0e14dd..21a98ba565ab5 100644 --- a/tools/testing/vsock/Makefile +++ b/tools/testing/vsock/Makefile @@ -8,5 +8,5 @@ vsock_perf: vsock_perf.o CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE .PHONY: all test clean clean: - ${RM} *.o *.d vsock_test vsock_diag_test + ${RM} *.o *.d vsock_test vsock_diag_test vsock_perf -include *.d From 0765c5f293357ee43eca72e27c3547f9d99ac355 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 2 Aug 2023 11:28:43 -0700 Subject: [PATCH 98/98] MAINTAINERS: update TUN/TAP maintainers Willem and Jason have agreed to take over the maintainer duties for TUN/TAP, thank you! 
There's an existing entry for TUN/TAP which only covers the user mode Linux implementation. Since we haven't heard from Maxim on the list for almost a decade, extend that entry and take it over, rather than adding a new one. Acked-by: Willem de Bruijn Acked-by: Jason Wang Link: https://lore.kernel.org/r/20230802182843.4193099-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- MAINTAINERS | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 1085dfb357776..8f93c48f69319 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -21670,11 +21670,14 @@ S: Orphan F: drivers/net/ethernet/dec/tulip/ TUN/TAP driver -M: Maxim Krasnyansky +M: Willem de Bruijn +M: Jason Wang S: Maintained W: http://vtun.sourceforge.net/tun F: Documentation/networking/tuntap.rst F: arch/um/os-Linux/drivers/ +F: drivers/net/tap.c +F: drivers/net/tun.c TURBOCHANNEL SUBSYSTEM M: "Maciej W. Rozycki"