From ac9c5e92dd15b9927e7355ccf79df76a58b44344 Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Thu, 15 Dec 2022 16:51:55 +0000 Subject: [PATCH 1/6] iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY even betterer Although it's vanishingly unlikely that anyone would integrate an SMMU within a coherent interconnect without also making the pagetable walk interface coherent, the same effect happens if a coherent SMMU fails to advertise CTTW correctly. This turns out to be the case on some popular NXP SoCs, where VFIO started failing the IOMMU_CAP_CACHE_COHERENCY test, even though IOMMU_CACHE *was* previously achieving the desired effect anyway thanks to the underlying integration. While those SoCs stand to gain some more general benefits from a firmware update to override CTTW correctly in DT/ACPI, it's also easy to work around this in Linux as well, to avoid imposing too much on affected users - since the upstream client devices *are* correctly marked as coherent, we can trivially infer their coherent paths through the SMMU as well. Reported-by: Vladimir Oltean Fixes: df198b37e72c ("iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY better") Signed-off-by: Robin Murphy Tested-by: Vladimir Oltean Link: https://lore.kernel.org/r/d6dc41952961e5c7b21acac08a8bf1eb0f69e124.1671123115.git.robin.murphy@arm.com Signed-off-by: Will Deacon --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 719fbca1fe52a..443bc517277ba 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1316,8 +1316,14 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap) switch (cap) { case IOMMU_CAP_CACHE_COHERENCY: - /* Assume that a coherent TCU implies coherent TBUs */ - return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK; + /* + * It's overwhelmingly the case in practice that when the pagetable + * walk interface is connected to a coherent interconnect, all the + * translation interfaces are too. Furthermore if the device is + * natively coherent, then its translation interface must also be. + */ + return cfg->smmu->features & ARM_SMMU_FEAT_COHERENT_WALK || + device_get_dma_attr(dev) == DEV_DMA_COHERENT; case IOMMU_CAP_NOEXEC: return true; default: From ce31e6ca68bd7639bd3e5ef97be215031842bbab Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 15 Dec 2022 16:12:50 +0200 Subject: [PATCH 2/6] iommu/arm-smmu: Don't unregister on shutdown Michael Walle says he noticed the following stack trace while performing a shutdown with "reboot -f". He suggests he got "lucky" and just hit the correct spot for the reboot while there was a packet transmission in flight. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000098 CPU: 0 PID: 23 Comm: kworker/0:1 Not tainted 6.1.0-rc5-00088-gf3600ff8e322 #1930 Hardware name: Kontron KBox A-230-LS (DT) pc : iommu_get_dma_domain+0x14/0x20 lr : iommu_dma_map_page+0x9c/0x254 Call trace: iommu_get_dma_domain+0x14/0x20 dma_map_page_attrs+0x1ec/0x250 enetc_start_xmit+0x14c/0x10b0 enetc_xmit+0x60/0xdc dev_hard_start_xmit+0xb8/0x210 sch_direct_xmit+0x11c/0x420 __dev_queue_xmit+0x354/0xb20 ip6_finish_output2+0x280/0x5b0 __ip6_finish_output+0x15c/0x270 ip6_output+0x78/0x15c NF_HOOK.constprop.0+0x50/0xd0 mld_sendpack+0x1bc/0x320 mld_ifc_work+0x1d8/0x4dc process_one_work+0x1e8/0x460 worker_thread+0x178/0x534 kthread+0xe0/0xe4 ret_from_fork+0x10/0x20 Code: d503201f f9416800 d503233f d50323bf (f9404c00) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt This appears to be reproducible when the board has a fixed IP address, is ping flooded from another host, and "reboot -f" is used. The following is one more manifestation of the issue: $ reboot -f kvm: exiting hardware virtualization cfg80211: failed to load regulatory.db arm-smmu 5000000.iommu: disabling translation sdhci-esdhc 2140000.mmc: Removing from iommu group 11 sdhci-esdhc 2150000.mmc: Removing from iommu group 12 fsl-edma 22c0000.dma-controller: Removing from iommu group 17 dwc3 3100000.usb: Removing from iommu group 9 dwc3 3110000.usb: Removing from iommu group 10 ahci-qoriq 3200000.sata: Removing from iommu group 2 fsl-qdma 8380000.dma-controller: Removing from iommu group 20 platform f080000.display: Removing from iommu group 0 etnaviv-gpu f0c0000.gpu: Removing from iommu group 1 etnaviv etnaviv: Removing from iommu group 1 caam_jr 8010000.jr: Removing from iommu group 13 caam_jr 8020000.jr: Removing from iommu group 14 caam_jr 8030000.jr: Removing from iommu group 15 caam_jr 8040000.jr: Removing from iommu group 16 fsl_enetc 0000:00:00.0: Removing from iommu group 4 arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications arm-smmu 5000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000429, GFSYNR2 0x00000000 fsl_enetc 0000:00:00.1: Removing from iommu group 5 arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications arm-smmu 5000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000429, GFSYNR2 0x00000000 arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications arm-smmu 5000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000429, GFSYNR2 0x00000000 fsl_enetc 0000:00:00.2: Removing from iommu group 6 fsl_enetc_mdio 0000:00:00.3: Removing from iommu group 8 mscc_felix 0000:00:00.5: Removing from iommu group 3 fsl_enetc 0000:00:00.6: Removing from iommu group 7 pcieport 0001:00:00.0: Removing from iommu group 18 arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications arm-smmu 5000000.iommu: GFSR 0x00000002, GFSYNR0 0x00000000, GFSYNR1 0x00000429, GFSYNR2 0x00000000 pcieport 0002:00:00.0: Removing from iommu group 19 Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8 pc : iommu_get_dma_domain+0x14/0x20 lr : iommu_dma_unmap_page+0x38/0xe0 Call trace: iommu_get_dma_domain+0x14/0x20 dma_unmap_page_attrs+0x38/0x1d0 enetc_unmap_tx_buff.isra.0+0x6c/0x80 enetc_poll+0x170/0x910 __napi_poll+0x40/0x1e0 net_rx_action+0x164/0x37c __do_softirq+0x128/0x368 run_ksoftirqd+0x68/0x90 smpboot_thread_fn+0x14c/0x190 Code: d503201f f9416800 d503233f d50323bf (f9405400) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- The problem seems to be that iommu_group_remove_device() is allowed to run with no coordination whatsoever with the shutdown procedure of the enetc PCI device. In fact, it almost seems as if it implies that the pci_driver :: shutdown() method is mandatory if DMA is used with an IOMMU, otherwise this is inevitable. That was never the case; shutdown methods are optional in device drivers. This is the call stack that leads to iommu_group_remove_device() during reboot: kernel_restart -> device_shutdown -> platform_shutdown -> arm_smmu_device_shutdown -> arm_smmu_device_remove -> iommu_device_unregister -> bus_for_each_dev -> remove_iommu_group -> iommu_release_device -> iommu_group_remove_device I don't know much about the arm_smmu driver, but arm_smmu_device_shutdown() invoking arm_smmu_device_remove() looks suspicious, since it causes the IOMMU device to unregister and that's where everything starts to unravel. It forces all other devices which depend on IOMMU groups to also point their ->shutdown() to ->remove(), which will make reboot slower overall. There are 2 moments relevant to this behavior. First was commit b06c076ea962 ("Revert "iommu/arm-smmu: Make arm-smmu explicitly non-modular"") when arm_smmu_device_shutdown() was made to run the exact same thing as arm_smmu_device_remove(). Prior to that, there was no iommu_device_unregister() call in arm_smmu_device_shutdown(). However, that was benign until commit 57365a04c921 ("iommu: Move bus setup to IOMMU device registration"), which made iommu_device_unregister() call remove_iommu_group(). Restore the old shutdown behavior by making remove() call shutdown(), but shutdown() does not call the remove() specific bits. Fixes: 57365a04c921 ("iommu: Move bus setup to IOMMU device registration") Reported-by: Michael Walle Tested-by: Michael Walle # on kontron-sl28 Signed-off-by: Vladimir Oltean Link: https://lore.kernel.org/r/20221215141251.3688780-1-vladimir.oltean@nxp.com Signed-off-by: Will Deacon --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 443bc517277ba..2ff7a72cf3772 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -2191,19 +2191,16 @@ static int arm_smmu_device_probe(struct platform_device *pdev) return 0; } -static int arm_smmu_device_remove(struct platform_device *pdev) +static void arm_smmu_device_shutdown(struct platform_device *pdev) { struct arm_smmu_device *smmu = platform_get_drvdata(pdev); if (!smmu) - return -ENODEV; + return; if (!bitmap_empty(smmu->context_map, ARM_SMMU_MAX_CBS)) dev_notice(&pdev->dev, "disabling translation\n"); - iommu_device_unregister(&smmu->iommu); - iommu_device_sysfs_remove(&smmu->iommu); - arm_smmu_rpm_get(smmu); /* Turn the thing off */ arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sCR0, ARM_SMMU_sCR0_CLIENTPD); @@ -2215,12 +2212,21 @@ static int arm_smmu_device_remove(struct platform_device *pdev) clk_bulk_disable(smmu->num_clks, smmu->clks); clk_bulk_unprepare(smmu->num_clks, smmu->clks); - return 0; } -static void arm_smmu_device_shutdown(struct platform_device *pdev) +static int arm_smmu_device_remove(struct platform_device *pdev) { - arm_smmu_device_remove(pdev); + struct arm_smmu_device *smmu = platform_get_drvdata(pdev); + + if (!smmu) + return -ENODEV; + + iommu_device_unregister(&smmu->iommu); + iommu_device_sysfs_remove(&smmu->iommu); + + arm_smmu_device_shutdown(pdev); + + return 0; } static int __maybe_unused arm_smmu_runtime_resume(struct device *dev) From 32ea2c57dc216b6ad8125fa680d31daa5d421c95 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 15 Dec 2022 16:12:51 +0200 Subject: [PATCH 3/6] iommu/arm-smmu-v3: Don't unregister on shutdown Similar to SMMUv2, this driver calls iommu_device_unregister() from the shutdown path, which removes the IOMMU groups with no coordination whatsoever with their users - shutdown methods are optional in device drivers. This can lead to NULL pointer dereferences in those drivers' DMA API calls, or worse. Instead of calling the full arm_smmu_device_remove() from arm_smmu_device_shutdown(), let's pick only the relevant function call - arm_smmu_device_disable() - more or less the reverse of arm_smmu_device_reset() - and call just that from the shutdown path. Fixes: 57365a04c921 ("iommu: Move bus setup to IOMMU device registration") Suggested-by: Robin Murphy Signed-off-by: Vladimir Oltean Link: https://lore.kernel.org/r/20221215141251.3688780-2-vladimir.oltean@nxp.com Signed-off-by: Will Deacon --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index ab160198edd6b..f2425b0f0cd62 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3858,7 +3858,9 @@ static int arm_smmu_device_remove(struct platform_device *pdev) static void arm_smmu_device_shutdown(struct platform_device *pdev) { - arm_smmu_device_remove(pdev); + struct arm_smmu_device *smmu = platform_get_drvdata(pdev); + + arm_smmu_device_disable(smmu); } static const struct of_device_id arm_smmu_of_match[] = { From a6a9a5da68084d5eac6ef85ad09df7fc9c971de4 Mon Sep 17 00:00:00 2001 From: Miaoqian Lin Date: Fri, 30 Dec 2022 12:31:00 +0400 Subject: [PATCH 4/6] iommu: Fix refcount leak in iommu_device_claim_dma_owner iommu_group_get() returns the group with the reference incremented. Move iommu_group_get() after owner check to fix the refcount leak. Fixes: 89395ccedbc1 ("iommu: Add device-centric DMA ownership interfaces") Signed-off-by: Miaoqian Lin Reviewed-by: Lu Baolu Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20221230083100.1489569-1-linmq006@gmail.com [ joro: Remove *group = NULL initialization ] Signed-off-by: Joerg Roedel --- drivers/iommu/iommu.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index de91dd88705bd..5f6a85aea501e 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -3185,14 +3185,16 @@ EXPORT_SYMBOL_GPL(iommu_group_claim_dma_owner); */ int iommu_device_claim_dma_owner(struct device *dev, void *owner) { - struct iommu_group *group = iommu_group_get(dev); + struct iommu_group *group; int ret = 0; - if (!group) - return -ENODEV; if (WARN_ON(!owner)) return -EINVAL; + group = iommu_group_get(dev); + if (!group) + return -ENODEV; + mutex_lock(&group->mutex); if (group->owner_cnt) { if (group->owner != owner) { From dcdb3ba7e2a8caae7bfefd603bc22fd0ce9a389c Mon Sep 17 00:00:00 2001 From: Yunfei Wang Date: Wed, 11 Jan 2023 14:38:00 +0800 Subject: [PATCH 5/6] iommu/iova: Fix alloc iova overflows issue In __alloc_and_insert_iova_range, there is an issue that retry_pfn overflows. The value of iovad->anchor.pfn_hi is ~0UL, then when iovad->cached_node is iovad->anchor, curr_iova->pfn_hi + 1 will overflow. As a result, if the retry logic is executed, low_pfn is updated to 0, and then new_pfn < low_pfn returns false to make the allocation successful. This issue occurs in the following two situations: 1. The first iova size exceeds the domain size. When initializing iova domain, iovad->cached_node is assigned as iovad->anchor. For example, the iova domain size is 10M, start_pfn is 0x1_F000_0000, and the iova size allocated for the first time is 11M. The following is the log information, new->pfn_lo is smaller than iovad->cached_node. Example log as follows: [ 223.798112][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range start_pfn:0x1f0000,retry_pfn:0x0,size:0xb00,limit_pfn:0x1f0a00 [ 223.799590][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range success start_pfn:0x1f0000,new->pfn_lo:0x1efe00,new->pfn_hi:0x1f08ff 2. The node with the largest iova->pfn_lo value in the iova domain is deleted, iovad->cached_node will be updated to iovad->anchor, and then the alloc iova size exceeds the maximum iova size that can be allocated in the domain. After judging that retry_pfn is less than limit_pfn, call retry_pfn+1 to fix the overflow issue. Signed-off-by: jianjiao zeng Signed-off-by: Yunfei Wang Cc: # 5.15.* Fixes: 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search fails") Acked-by: Robin Murphy Link: https://lore.kernel.org/r/20230111063801.25107-1-yf.wang@mediatek.com Signed-off-by: Joerg Roedel --- drivers/iommu/iova.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index a44ad92fc5eb7..fe452ce466429 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -197,7 +197,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad, curr = __get_cached_rbnode(iovad, limit_pfn); curr_iova = to_iova(curr); - retry_pfn = curr_iova->pfn_hi + 1; + retry_pfn = curr_iova->pfn_hi; retry: do { @@ -211,7 +211,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad, if (high_pfn < size || new_pfn < low_pfn) { if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) { high_pfn = limit_pfn; - low_pfn = retry_pfn; + low_pfn = retry_pfn + 1; curr = iova_find_limit(iovad, limit_pfn); curr_iova = to_iova(curr); goto retry; From 142e821f68cf5da79ce722cb9c1323afae30e185 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Mon, 19 Dec 2022 19:06:22 +0100 Subject: [PATCH 6/6] iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe() A clk, prepared and enabled in mtk_iommu_v1_hw_init(), is not released in the error handling path of mtk_iommu_v1_probe(). Add the corresponding clk_disable_unprepare(), as already done in the remove function. Fixes: b17336c55d89 ("iommu/mediatek: add support for mtk iommu generation one HW") Signed-off-by: Christophe JAILLET Reviewed-by: Yong Wu Reviewed-by: AngeloGioacchino Del Regno Reviewed-by: Matthias Brugger Link: https://lore.kernel.org/r/593e7b7d97c6e064b29716b091a9d4fd122241fb.1671473163.git.christophe.jaillet@wanadoo.fr Signed-off-by: Joerg Roedel --- drivers/iommu/mtk_iommu_v1.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index 69682ee068d2b..ca581ff1c7696 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -683,7 +683,7 @@ static int mtk_iommu_v1_probe(struct platform_device *pdev) ret = iommu_device_sysfs_add(&data->iommu, &pdev->dev, NULL, dev_name(&pdev->dev)); if (ret) - return ret; + goto out_clk_unprepare; ret = iommu_device_register(&data->iommu, &mtk_iommu_v1_ops, dev); if (ret) @@ -698,6 +698,8 @@ static int mtk_iommu_v1_probe(struct platform_device *pdev) iommu_device_unregister(&data->iommu); out_sysfs_remove: iommu_device_sysfs_remove(&data->iommu); +out_clk_unprepare: + clk_disable_unprepare(data->bclk); return ret; }