Skip to content

Suggestion for MPI branch and tag names #1

Closed
donald opened this issue Feb 25, 2022 · 0 comments
Closed

Suggestion for MPI branch and tag names #1

donald opened this issue Feb 25, 2022 · 0 comments

Comments

@donald
Copy link
Contributor

donald commented Feb 25, 2022

  • From Linus we have the branch master and tags like v5.10 , v5.17-rc5.
  • From Greg we have branches like linux-5.10 and tags like v5.10.93

We want branches on top of the some of these specific commits identified by their tags. These branches should include that tagged upstream version and our mpi specific patches. So I've made the branches linux-5.17-rc5-mpi and linux-5.10.93-mpi as examples. We use branches (and not tags) here, because we might want to evolve our mpi-patches on top of a specific upstream version.

To reference a specific version on our branches, we also need tags here. These are called (for example) v5.17-rc5-mpi.0 or v5.10.93-mpi-0. If we ever change our patches on top of v5.10.93, we could call it v5.10.93-mpi.1. These versions can be referenced from a bee file.

I've also included the defconfig into this repository (as one mpi-specific commit) with the filename config-mpi. So a mpi-tag references a source tree with a included kernel configuration. We can evolve the configuration in config-mpi file in this repository and no longer need to include it in the bee file.

donald pushed a commit that referenced this issue Feb 28, 2022
commit 0764db9 upstream.

Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:

  LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
          WARNING: possible circular locking dependency detected
          5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
          ------------------------------------------------------
          mmap1/202299 is trying to acquire lock:
          00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
          but task is already holding lock:
          00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
          which lock already depends on the new lock.
          the existing dependency chain (in reverse order) is:
          -> #1 (&sighand->siglock){-.-.}-{2:2}:
                 __lock_acquire+0x604/0xbd8
                 lock_acquire.part.0+0xe2/0x238
                 lock_acquire+0xb0/0x200
                 _raw_spin_lock_irqsave+0x6a/0xd8
                 __lock_task_sighand+0x90/0x190
                 cgroup_freeze_task+0x2e/0x90
                 cgroup_migrate_execute+0x11c/0x608
                 cgroup_update_dfl_csses+0x246/0x270
                 cgroup_subtree_control_write+0x238/0x518
                 kernfs_fop_write_iter+0x13e/0x1e0
                 new_sync_write+0x100/0x190
                 vfs_write+0x22c/0x2d8
                 ksys_write+0x6c/0xf8
                 __do_syscall+0x1da/0x208
                 system_call+0x82/0xb0
          -> #0 (css_set_lock){..-.}-{2:2}:
                 check_prev_add+0xe0/0xed8
                 validate_chain+0x736/0xb20
                 __lock_acquire+0x604/0xbd8
                 lock_acquire.part.0+0xe2/0x238
                 lock_acquire+0xb0/0x200
                 _raw_spin_lock_irqsave+0x6a/0xd8
                 obj_cgroup_release+0x4a/0xe0
                 percpu_ref_put_many.constprop.0+0x150/0x168
                 drain_obj_stock+0x94/0xe8
                 refill_obj_stock+0x94/0x278
                 obj_cgroup_charge+0x164/0x1d8
                 kmem_cache_alloc+0xac/0x528
                 __sigqueue_alloc+0x150/0x308
                 __send_signal+0x260/0x550
                 send_signal+0x7e/0x348
                 force_sig_info_to_task+0x104/0x180
                 force_sig_fault+0x48/0x58
                 __do_pgm_check+0x120/0x1f0
                 pgm_check_handler+0x11e/0x180
          other info that might help us debug this:
           Possible unsafe locking scenario:
                 CPU0                    CPU1
                 ----                    ----
            lock(&sighand->siglock);
                                         lock(css_set_lock);
                                         lock(&sighand->siglock);
            lock(css_set_lock);
           *** DEADLOCK ***
          2 locks held by mmap1/202299:
           #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
           #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
          stack backtrace:
          CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
          Hardware name: IBM 3906 M04 704 (LPAR)
          Call Trace:
            dump_stack_lvl+0x76/0x98
            check_noncircular+0x136/0x158
            check_prev_add+0xe0/0xed8
            validate_chain+0x736/0xb20
            __lock_acquire+0x604/0xbd8
            lock_acquire.part.0+0xe2/0x238
            lock_acquire+0xb0/0x200
            _raw_spin_lock_irqsave+0x6a/0xd8
            obj_cgroup_release+0x4a/0xe0
            percpu_ref_put_many.constprop.0+0x150/0x168
            drain_obj_stock+0x94/0xe8
            refill_obj_stock+0x94/0x278
            obj_cgroup_charge+0x164/0x1d8
            kmem_cache_alloc+0xac/0x528
            __sigqueue_alloc+0x150/0x308
            __send_signal+0x260/0x550
            send_signal+0x7e/0x348
            force_sig_info_to_task+0x104/0x180
            force_sig_fault+0x48/0x58
            __do_pgm_check+0x120/0x1f0
            pgm_check_handler+0x11e/0x180
          INFO: lockdep is turned off.

In this example a slab allocation from __send_signal() caused a
refilling and draining of a percpu objcg stock, resulted in a releasing
of another non-related objcg.  Objcg release path requires taking the
css_set_lock, which is used to synchronize objcg lists.

This can create a circular dependency with the sighandler lock, which is
taken with the locked css_set_lock by the freezer code (to freeze a
task).

In general it seems that using css_set_lock to synchronize objcg lists
makes any slab allocations and deallocation with the locked css_set_lock
and any intervened locks risky.

To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.

Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
Fixes: bf4f059 ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Feb 28, 2022
commit 24d7275 upstream.

The syzbot reported the below BUG:

  kernel BUG at include/linux/page-flags.h:785!
  invalid opcode: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 4392 Comm: syz-executor560 Not tainted 5.16.0-rc6-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline]
  RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744
  Call Trace:
    page_mapcount include/linux/mm.h:837 [inline]
    smaps_account+0x470/0xb10 fs/proc/task_mmu.c:466
    smaps_pte_entry fs/proc/task_mmu.c:538 [inline]
    smaps_pte_range+0x611/0x1250 fs/proc/task_mmu.c:601
    walk_pmd_range mm/pagewalk.c:128 [inline]
    walk_pud_range mm/pagewalk.c:205 [inline]
    walk_p4d_range mm/pagewalk.c:240 [inline]
    walk_pgd_range mm/pagewalk.c:277 [inline]
    __walk_page_range+0xe23/0x1ea0 mm/pagewalk.c:379
    walk_page_vma+0x277/0x350 mm/pagewalk.c:530
    smap_gather_stats.part.0+0x148/0x260 fs/proc/task_mmu.c:768
    smap_gather_stats fs/proc/task_mmu.c:741 [inline]
    show_smap+0xc6/0x440 fs/proc/task_mmu.c:822
    seq_read_iter+0xbb0/0x1240 fs/seq_file.c:272
    seq_read+0x3e0/0x5b0 fs/seq_file.c:162
    vfs_read+0x1b5/0x600 fs/read_write.c:479
    ksys_read+0x12d/0x250 fs/read_write.c:619
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x44/0xae

The reproducer was trying to read /proc/$PID/smaps when calling
MADV_FREE at the mean time.  MADV_FREE may split THPs if it is called
for partial THP.  It may trigger the below race:

           CPU A                         CPU B
           -----                         -----
  smaps walk:                      MADV_FREE:
  page_mapcount()
    PageCompound()
                                   split_huge_page()
    page = compound_head(page)
    PageDoubleMap(page)

When calling PageDoubleMap() this page is not a tail page of THP anymore
so the BUG is triggered.

This could be fixed by elevated refcount of the page before calling
mapcount, but that would prevent it from counting migration entries, and
it seems overkilling because the race just could happen when PMD is
split so all PTE entries of tail pages are actually migration entries,
and smaps_account() does treat migration entries as mapcount == 1 as
Kirill pointed out.

Add a new parameter for smaps_account() to tell this entry is migration
entry then skip calling page_mapcount().  Don't skip getting mapcount
for device private entries since they do track references with mapcount.

Pagemap also has the similar issue although it was not reported.  Fixed
it as well.

[shy828301@gmail.com: v4]
  Link: https://lkml.kernel.org/r/20220203182641.824731-1-shy828301@gmail.com
[nathan@kernel.org: avoid unused variable warning in pagemap_pmd_range()]
  Link: https://lkml.kernel.org/r/20220207171049.1102239-1-nathan@kernel.org
Link: https://lkml.kernel.org/r/20220120202805.3369-1-shy828301@gmail.com
Fixes: e9b61f1 ("thp: reintroduce split_huge_page()")
Signed-off-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: syzbot+1f52b3a18d5633fa7f82@syzkaller.appspotmail.com
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Feb 28, 2022
[ Upstream commit 17da2d5 ]

As reported:

[  256.104522] ======================================================
[  256.113783] WARNING: possible circular locking dependency detected
[  256.120093] 5.16.0-rc6-yocto-standard+ #99 Not tainted
[  256.125362] ------------------------------------------------------
[  256.131673] intel-speed-sel/844 is trying to acquire lock:
[  256.137290] ffffffffc036f0d0 (punit_misc_dev_lock){+.+.}-{3:3}, at: isst_if_open+0x18/0x90 [isst_if_common]
[  256.147171]
[  256.147171] but task is already holding lock:
[  256.153135] ffffffff8ee7cb50 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x2a/0x170
[  256.160407]
[  256.160407] which lock already depends on the new lock.
[  256.160407]
[  256.168712]
[  256.168712] the existing dependency chain (in reverse order) is:
[  256.176327]
[  256.176327] -> #1 (misc_mtx){+.+.}-{3:3}:
[  256.181946]        lock_acquire+0x1e6/0x330
[  256.186265]        __mutex_lock+0x9b/0x9b0
[  256.190497]        mutex_lock_nested+0x1b/0x20
[  256.195075]        misc_register+0x32/0x1a0
[  256.199390]        isst_if_cdev_register+0x65/0x180 [isst_if_common]
[  256.205878]        isst_if_probe+0x144/0x16e [isst_if_mmio]
...
[  256.241976]
[  256.241976] -> #0 (punit_misc_dev_lock){+.+.}-{3:3}:
[  256.248552]        validate_chain+0xbc6/0x1750
[  256.253131]        __lock_acquire+0x88c/0xc10
[  256.257618]        lock_acquire+0x1e6/0x330
[  256.261933]        __mutex_lock+0x9b/0x9b0
[  256.266165]        mutex_lock_nested+0x1b/0x20
[  256.270739]        isst_if_open+0x18/0x90 [isst_if_common]
[  256.276356]        misc_open+0x100/0x170
[  256.280409]        chrdev_open+0xa5/0x1e0
...

The call sequence suggested that misc_device /dev file can be opened
before misc device is yet to be registered, which is done only once.

Here punit_misc_dev_lock was used as common lock, to protect the
registration by multiple ISST HW drivers, one time setup, prevent
duplicate registry of misc device and prevent load/unload when device
is open.

We can split into locks:
- One which just prevent duplicate call to misc_register() and one
time setup. Also never call again if the misc_register() failed or
required one time setup is failed. This lock is not shared with
any misc device callbacks.

- The other lock protects registry, load and unload of HW drivers.

Sequence in isst_if_cdev_register()
- Register callbacks under punit_misc_dev_open_lock
- Call isst_misc_reg() which registers misc_device on the first
registry which is under punit_misc_dev_reg_lock, which is not
shared with callbacks.

Sequence in isst_if_cdev_unregister
Just opposite of isst_if_cdev_register

Reported-and-tested-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20220112022521.54669-1-srinivas.pandruvada@linux.intel.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Feb 28, 2022
…ction

commit ddc204b upstream.

I was made aware of the following lockdep splat:

[ 2516.308763] =====================================================
[ 2516.309085] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 2516.309433] 5.14.0-51.el9.aarch64+debug #1 Not tainted
[ 2516.309703] -----------------------------------------------------
[ 2516.310149] stress-ng/153663 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 2516.310512] ffff0000e422b198 (&newf->file_lock){+.+.}-{2:2}, at: fd_install+0x368/0x4f0
[ 2516.310944]
               and this task is already holding:
[ 2516.311248] ffff0000c08140d8 (&sighand->siglock){-.-.}-{2:2}, at: copy_process+0x1e2c/0x3e80
[ 2516.311804] which would create a new lock dependency:
[ 2516.312066]  (&sighand->siglock){-.-.}-{2:2} -> (&newf->file_lock){+.+.}-{2:2}
[ 2516.312446]
               but this new dependency connects a HARDIRQ-irq-safe lock:
[ 2516.312983]  (&sighand->siglock){-.-.}-{2:2}
   :
[ 2516.330700]  Possible interrupt unsafe locking scenario:

[ 2516.331075]        CPU0                    CPU1
[ 2516.331328]        ----                    ----
[ 2516.331580]   lock(&newf->file_lock);
[ 2516.331790]                                local_irq_disable();
[ 2516.332231]                                lock(&sighand->siglock);
[ 2516.332579]                                lock(&newf->file_lock);
[ 2516.332922]   <Interrupt>
[ 2516.333069]     lock(&sighand->siglock);
[ 2516.333291]
                *** DEADLOCK ***
[ 2516.389845]
               stack backtrace:
[ 2516.390101] CPU: 3 PID: 153663 Comm: stress-ng Kdump: loaded Not tainted 5.14.0-51.el9.aarch64+debug #1
[ 2516.390756] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 2516.391155] Call trace:
[ 2516.391302]  dump_backtrace+0x0/0x3e0
[ 2516.391518]  show_stack+0x24/0x30
[ 2516.391717]  dump_stack_lvl+0x9c/0xd8
[ 2516.391938]  dump_stack+0x1c/0x38
[ 2516.392247]  print_bad_irq_dependency+0x620/0x710
[ 2516.392525]  check_irq_usage+0x4fc/0x86c
[ 2516.392756]  check_prev_add+0x180/0x1d90
[ 2516.392988]  validate_chain+0x8e0/0xee0
[ 2516.393215]  __lock_acquire+0x97c/0x1e40
[ 2516.393449]  lock_acquire.part.0+0x240/0x570
[ 2516.393814]  lock_acquire+0x90/0xb4
[ 2516.394021]  _raw_spin_lock+0xe8/0x154
[ 2516.394244]  fd_install+0x368/0x4f0
[ 2516.394451]  copy_process+0x1f5c/0x3e80
[ 2516.394678]  kernel_clone+0x134/0x660
[ 2516.394895]  __do_sys_clone3+0x130/0x1f4
[ 2516.395128]  __arm64_sys_clone3+0x5c/0x7c
[ 2516.395478]  invoke_syscall.constprop.0+0x78/0x1f0
[ 2516.395762]  el0_svc_common.constprop.0+0x22c/0x2c4
[ 2516.396050]  do_el0_svc+0xb0/0x10c
[ 2516.396252]  el0_svc+0x24/0x34
[ 2516.396436]  el0t_64_sync_handler+0xa4/0x12c
[ 2516.396688]  el0t_64_sync+0x198/0x19c
[ 2517.491197] NET: Registered PF_ATMPVC protocol family
[ 2517.491524] NET: Registered PF_ATMSVC protocol family
[ 2591.991877] sched: RT throttling activated

One way to solve this problem is to move the fd_install() call out of
the sighand->siglock critical section.

Before commit 6fd2fe4 ("copy_process(): don't use ksys_close()
on cleanups"), the pidfd installation was done without holding both
the task_list lock and the sighand->siglock. Obviously, holding these
two locks are not really needed to protect the fd_install() call.
So move the fd_install() call down to after the releases of both locks.

Link: https://lore.kernel.org/r/20220208163912.1084752-1-longman@redhat.com
Fixes: 6fd2fe4 ("copy_process(): don't use ksys_close() on cleanups")
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Feb 28, 2022
Compressed length can be corrupted to be a lot larger than memory
we have allocated for buffer.
This will cause memcpy in copy_compressed_segment to write outside
of allocated memory.

This mostly results in stuck read syscall but sometimes when using
btrfs send can get #GP

  kernel: general protection fault, probably for non-canonical address 0x841551d5c1000: 0000 [#1] PREEMPT SMP NOPTI
  kernel: CPU: 17 PID: 264 Comm: kworker/u256:7 Tainted: P           OE     5.17.0-rc2-1 #12
  kernel: Workqueue: btrfs-endio btrfs_work_helper [btrfs]
  kernel: RIP: 0010:lzo_decompress_bio (./include/linux/fortify-string.h:225 fs/btrfs/lzo.c:322 fs/btrfs/lzo.c:394) btrfs
  Code starting with the faulting instruction
  ===========================================
     0:*  48 8b 06                mov    (%rsi),%rax              <-- trapping instruction
     3:   48 8d 79 08             lea    0x8(%rcx),%rdi
     7:   48 83 e7 f8             and    $0xfffffffffffffff8,%rdi
     b:   48 89 01                mov    %rax,(%rcx)
     e:   44 89 f0                mov    %r14d,%eax
    11:   48 8b 54 06 f8          mov    -0x8(%rsi,%rax,1),%rdx
  kernel: RSP: 0018:ffffb110812efd50 EFLAGS: 00010212
  kernel: RAX: 0000000000001000 RBX: 000000009ca264c8 RCX: ffff98996e6d8ff8
  kernel: RDX: 0000000000000064 RSI: 000841551d5c1000 RDI: ffffffff9500435d
  kernel: RBP: ffff989a3be856c0 R08: 0000000000000000 R09: 0000000000000000
  kernel: R10: 0000000000000000 R11: 0000000000001000 R12: ffff98996e6d8000
  kernel: R13: 0000000000000008 R14: 0000000000001000 R15: 000841551d5c1000
  kernel: FS:  0000000000000000(0000) GS:ffff98a09d640000(0000) knlGS:0000000000000000
  kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  kernel: CR2: 00001e9f984d9ea8 CR3: 000000014971a000 CR4: 00000000003506e0
  kernel: Call Trace:
  kernel:  <TASK>
  kernel: end_compressed_bio_read (fs/btrfs/compression.c:104 fs/btrfs/compression.c:1363 fs/btrfs/compression.c:323) btrfs
  kernel: end_workqueue_fn (fs/btrfs/disk-io.c:1923) btrfs
  kernel: btrfs_work_helper (fs/btrfs/async-thread.c:326) btrfs
  kernel: process_one_work (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:212 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2312)
  kernel: worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2455)
  kernel: ? process_one_work (kernel/workqueue.c:2397)
  kernel: kthread (kernel/kthread.c:377)
  kernel: ? kthread_complete_and_exit (kernel/kthread.c:332)
  kernel: ret_from_fork (arch/x86/entry/entry_64.S:301)
  kernel:  </TASK>

CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Dāvis Mosāns <davispuh@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
donald pushed a commit that referenced this issue Feb 28, 2022
The trace_hardirqs_{on,off}() require the caller to setup frame pointer
properly. This because these two functions use macro 'CALLER_ADDR1' (aka.
__builtin_return_address(1)) to acquire caller info. If the $fp is used
for other purpose, the code generated this macro (as below) could trigger
memory access fault.

   0xffffffff8011510e <+80>:    ld      a1,-16(s0)
   0xffffffff80115112 <+84>:    ld      s2,-8(a1)  # <-- paging fault here

The oops message during booting if compiled with 'irqoff' tracer enabled:
[    0.039615][    T0] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000f8
[    0.041925][    T0] Oops [#1]
[    0.042063][    T0] Modules linked in:
[    0.042864][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc1-00233-g9a20c48d1ed2 #29
[    0.043568][    T0] Hardware name: riscv-virtio,qemu (DT)
[    0.044343][    T0] epc : trace_hardirqs_on+0x56/0xe2
[    0.044601][    T0]  ra : restore_all+0x12/0x6e
[    0.044721][    T0] epc : ffffffff80126a5c ra : ffffffff80003b94 sp : ffffffff81403db0
[    0.044801][    T0]  gp : ffffffff8163acd8 tp : ffffffff81414880 t0 : 0000000000000020
[    0.044882][    T0]  t1 : 0098968000000000 t2 : 0000000000000000 s0 : ffffffff81403de0
[    0.044967][    T0]  s1 : 0000000000000000 a0 : 0000000000000001 a1 : 0000000000000100
[    0.045046][    T0]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[    0.045124][    T0]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000054494d45
[    0.045210][    T0]  s2 : ffffffff80003b94 s3 : ffffffff81a8f1b0 s4 : ffffffff80e27b50
[    0.045289][    T0]  s5 : ffffffff81414880 s6 : ffffffff8160fa00 s7 : 00000000800120e8
[    0.045389][    T0]  s8 : 0000000080013100 s9 : 000000000000007f s10: 0000000000000000
[    0.045474][    T0]  s11: 0000000000000000 t3 : 7fffffffffffffff t4 : 0000000000000000
[    0.045548][    T0]  t5 : 0000000000000000 t6 : ffffffff814aa368
[    0.045620][    T0] status: 0000000200000100 badaddr: 00000000000000f8 cause: 000000000000000d
[    0.046402][    T0] [<ffffffff80003b94>] restore_all+0x12/0x6e

This because the $fp(aka. $s0) register is not used as frame pointer in the
assembly entry code.

	resume_kernel:
		REG_L s0, TASK_TI_PREEMPT_COUNT(tp)
		bnez s0, restore_all
		REG_L s0, TASK_TI_FLAGS(tp)
                andi s0, s0, _TIF_NEED_RESCHED
                beqz s0, restore_all
                call preempt_schedule_irq
                j restore_all

To fix above issue, here we add one extra level wrapper for function
trace_hardirqs_{on,off}() so they can be safely called by low level entry
code.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Fixes: 3c46979 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
@donald donald closed this as completed in a8cd285 Mar 3, 2022
donald pushed a commit that referenced this issue Mar 3, 2022
commit 0c98294 upstream.

while mounting the crafted image, out-of-bounds access happens:

  [350.429619] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
  [350.429636] index 1048096 is out of range for type 'page *[16]'
  [350.429650] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 #1
  [350.429652] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
  [350.429653] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
  [350.429772] Call Trace:
  [350.429774]  <TASK>
  [350.429776]  dump_stack_lvl+0x47/0x5c
  [350.429780]  ubsan_epilogue+0x5/0x50
  [350.429786]  __ubsan_handle_out_of_bounds+0x66/0x70
  [350.429791]  btrfs_get_16+0xfd/0x120 [btrfs]
  [350.429832]  check_leaf+0x754/0x1a40 [btrfs]
  [350.429874]  ? filemap_read+0x34a/0x390
  [350.429878]  ? load_balance+0x175/0xfc0
  [350.429881]  validate_extent_buffer+0x244/0x310 [btrfs]
  [350.429911]  btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
  [350.429935]  end_bio_extent_readpage+0x3af/0x850 [btrfs]
  [350.429969]  ? newidle_balance+0x259/0x480
  [350.429972]  end_workqueue_fn+0x29/0x40 [btrfs]
  [350.429995]  btrfs_work_helper+0x71/0x330 [btrfs]
  [350.430030]  ? __schedule+0x2fb/0xa40
  [350.430033]  process_one_work+0x1f6/0x400
  [350.430035]  ? process_one_work+0x400/0x400
  [350.430036]  worker_thread+0x2d/0x3d0
  [350.430037]  ? process_one_work+0x400/0x400
  [350.430038]  kthread+0x165/0x190
  [350.430041]  ? set_kthread_struct+0x40/0x40
  [350.430043]  ret_from_fork+0x1f/0x30
  [350.430047]  </TASK>
  [350.430077] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2

check_leaf() is checking the leaf:

  corrupt leaf: root=4 block=29396992 slot=1, bad key order, prev (16140901064495857664 1 0) current (1 204 12582912)
  leaf 29396992 items 6 free space 3565 generation 6 owner DEV_TREE
  leaf 29396992 flags 0x1(WRITTEN) backref revision 1
  fs uuid a62e00e8-e94e-4200-8217-12444de93c2e
  chunk uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8
	  item 0 key (16140901064495857664 INODE_ITEM 0) itemoff 3955 itemsize 40
		  generation 0 transid 0 size 0 nbytes 17592186044416
		  block group 0 mode 52667 links 33 uid 0 gid 2104132511 rdev 94223634821136
		  sequence 100305 flags 0x2409000(none)
		  atime 0.0 (1970-01-01 08:00:00)
		  ctime 2973280098083405823.4294967295 (-269783007-01-01 21:37:03)
		  mtime 18446744071572723616.4026825121 (1902-04-16 12:40:00)
		  otime 9249929404488876031.4294967295 (622322949-04-16 04:25:58)
	  item 1 key (1 DEV_EXTENT 12582912) itemoff 3907 itemsize 48
		  dev extent chunk_tree 3
		  chunk_objectid 256 chunk_offset 12582912 length 8388608
		  chunk_tree_uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8

The corrupted leaf of device tree has an inode item. The leaf passed
checksum and others checks in validate_extent_buffer until check_leaf_item().
Because of the key type BTRFS_INODE_ITEM, check_inode_item() is called even we
are in the device tree. Since the
item offset + sizeof(struct btrfs_inode_item) > eb->len, out-of-bounds access
is triggered.

The item end vs leaf boundary check has been done before
check_leaf_item(), so fix it by checking item size in check_inode_item()
before access of the inode item in extent buffer.

Other check functions except check_dev_item() in check_leaf_item()
have their item size checks.
The commit for check_dev_item() is followed.

No regression observed during running fstests.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
CC: stable@vger.kernel.org # 5.10+
CC: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Su Yue <l@damenly.su>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Mar 3, 2022
commit 1b5f517 upstream.

If an attempt is made to a sensor with a thermal zone and it fails,
the call to devm_thermal_zone_of_sensor_register() may return -ENODEV.
This may result in crashes similar to the following.

Unable to handle kernel NULL pointer dereference at virtual address 00000000000003cd
...
Internal error: Oops: 96000021 [#1] PREEMPT SMP
...
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mutex_lock+0x18/0x60
lr : thermal_zone_device_update+0x40/0x2e0
sp : ffff800014c4fc60
x29: ffff800014c4fc60 x28: ffff365ee3f6e000 x27: ffffdde218426790
x26: ffff365ee3f6e000 x25: 0000000000000000 x24: ffff365ee3f6e000
x23: ffffdde218426870 x22: ffff365ee3f6e000 x21: 00000000000003cd
x20: ffff365ee8bf3308 x19: ffffffffffffffed x18: 0000000000000000
x17: ffffdde21842689c x16: ffffdde1cb7a0b7c x15: 0000000000000040
x14: ffffdde21a4889a0 x13: 0000000000000228 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : 0000000001120000 x7 : 0000000000000001 x6 : 0000000000000000
x5 : 0068000878e20f07 x4 : 0000000000000000 x3 : 00000000000003cd
x2 : ffff365ee3f6e000 x1 : 0000000000000000 x0 : 00000000000003cd
Call trace:
 mutex_lock+0x18/0x60
 hwmon_notify_event+0xfc/0x110
 0xffffdde1cb7a0a90
 0xffffdde1cb7a0b7c
 irq_thread_fn+0x2c/0xa0
 irq_thread+0x134/0x240
 kthread+0x178/0x190
 ret_from_fork+0x10/0x20
Code: d503201f d503201f d2800001 aa0103e4 (c8e47c02)

Jon Hunter reports that the exact call sequence is:

hwmon_notify_event()
  --> hwmon_thermal_notify()
    --> thermal_zone_device_update()
      --> update_temperature()
        --> mutex_lock()

The hwmon core needs to handle all errors returned from calls
to devm_thermal_zone_of_sensor_register(). If the call fails
with -ENODEV, report that the sensor was not attached to a
thermal zone  but continue to register the hwmon device.

Reported-by: Jon Hunter <jonathanh@nvidia.com>
Cc: Dmitry Osipenko <digetx@gmail.com>
Fixes: 1597b37 ("hwmon: Add notification support")
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Mar 3, 2022
commit 22e2100 upstream.

The trace_hardirqs_{on,off}() require the caller to setup frame pointer
properly. This because these two functions use macro 'CALLER_ADDR1' (aka.
__builtin_return_address(1)) to acquire caller info. If the $fp is used
for other purpose, the code generated this macro (as below) could trigger
memory access fault.

   0xffffffff8011510e <+80>:    ld      a1,-16(s0)
   0xffffffff80115112 <+84>:    ld      s2,-8(a1)  # <-- paging fault here

The oops message during booting if compiled with 'irqoff' tracer enabled:
[    0.039615][    T0] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000f8
[    0.041925][    T0] Oops [#1]
[    0.042063][    T0] Modules linked in:
[    0.042864][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc1-00233-g9a20c48d1ed2 #29
[    0.043568][    T0] Hardware name: riscv-virtio,qemu (DT)
[    0.044343][    T0] epc : trace_hardirqs_on+0x56/0xe2
[    0.044601][    T0]  ra : restore_all+0x12/0x6e
[    0.044721][    T0] epc : ffffffff80126a5c ra : ffffffff80003b94 sp : ffffffff81403db0
[    0.044801][    T0]  gp : ffffffff8163acd8 tp : ffffffff81414880 t0 : 0000000000000020
[    0.044882][    T0]  t1 : 0098968000000000 t2 : 0000000000000000 s0 : ffffffff81403de0
[    0.044967][    T0]  s1 : 0000000000000000 a0 : 0000000000000001 a1 : 0000000000000100
[    0.045046][    T0]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[    0.045124][    T0]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000054494d45
[    0.045210][    T0]  s2 : ffffffff80003b94 s3 : ffffffff81a8f1b0 s4 : ffffffff80e27b50
[    0.045289][    T0]  s5 : ffffffff81414880 s6 : ffffffff8160fa00 s7 : 00000000800120e8
[    0.045389][    T0]  s8 : 0000000080013100 s9 : 000000000000007f s10: 0000000000000000
[    0.045474][    T0]  s11: 0000000000000000 t3 : 7fffffffffffffff t4 : 0000000000000000
[    0.045548][    T0]  t5 : 0000000000000000 t6 : ffffffff814aa368
[    0.045620][    T0] status: 0000000200000100 badaddr: 00000000000000f8 cause: 000000000000000d
[    0.046402][    T0] [<ffffffff80003b94>] restore_all+0x12/0x6e

This because the $fp(aka. $s0) register is not used as frame pointer in the
assembly entry code.

	resume_kernel:
		REG_L s0, TASK_TI_PREEMPT_COUNT(tp)
		bnez s0, restore_all
		REG_L s0, TASK_TI_FLAGS(tp)
                andi s0, s0, _TIF_NEED_RESCHED
                beqz s0, restore_all
                call preempt_schedule_irq
                j restore_all

To fix above issue, here we add one extra level wrapper for function
trace_hardirqs_{on,off}() so they can be safely called by low level entry
code.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Fixes: 3c46979 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Mar 4, 2022
…arge

Syzkaller with UBSAN uncovered a scenario where a large number of
DATA_FIN retransmits caused a shift-out-of-bounds in the DATA_FIN
timeout calculation:

================================================================================
UBSAN: shift-out-of-bounds in net/mptcp/protocol.c:470:29
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 1 PID: 13059 Comm: kworker/1:0 Not tainted 5.17.0-rc2-00630-g5fbf21c90c60 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Workqueue: events mptcp_worker
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
 __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e lib/ubsan.c:330
 mptcp_set_datafin_timeout net/mptcp/protocol.c:470 [inline]
 __mptcp_retrans.cold+0x72/0x77 net/mptcp/protocol.c:2445
 mptcp_worker+0x58a/0xa70 net/mptcp/protocol.c:2528
 process_one_work+0x9df/0x16d0 kernel/workqueue.c:2307
 worker_thread+0x95/0xe10 kernel/workqueue.c:2454
 kthread+0x2f4/0x3b0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
================================================================================

This change limits the maximum timeout by limiting the size of the
shift, which keeps all intermediate values in-bounds.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/259
Fixes: 6477dd3 ("mptcp: Retransmit DATA_FIN")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
donald pushed a commit that referenced this issue Mar 4, 2022
xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
delete queues. Since d7dac08
("net-sysfs: update the queue counts in the unregistration path"),
unregister_netdev() indirectly sets real_num_tx_queues to 0. Those two
facts together means, that xennet_destroy_queues() called from
xennet_remove() cannot do its job, because it's called after
unregister_netdev(). This results in kfree-ing queues that are still
linked in napi, which ultimately crashes:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 1 PID: 52 Comm: xenwatch Tainted: G        W         5.16.10-1.32.fc32.qubes.x86_64+ #226
    RIP: 0010:free_netdev+0xa3/0x1a0
    Code: ff 48 89 df e8 2e e9 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 ed c1 66 ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
    RSP: 0000:ffffc90000bcfd00 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff88800edad000 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: ffffc90000bcfc30 RDI: 00000000ffffffff
    RBP: fffffffffffffea0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800edad050
    R13: ffff8880065f8f88 R14: 0000000000000000 R15: ffff8880066c6680
    FS:  0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000000e998c006 CR4: 00000000003706e0
    Call Trace:
     <TASK>
     xennet_remove+0x13d/0x300 [xen_netfront]
     xenbus_dev_remove+0x6d/0xf0
     __device_release_driver+0x17a/0x240
     device_release_driver+0x24/0x30
     bus_remove_device+0xd8/0x140
     device_del+0x18b/0x410
     ? _raw_spin_unlock+0x16/0x30
     ? klist_iter_exit+0x14/0x20
     ? xenbus_dev_request_and_reply+0x80/0x80
     device_unregister+0x13/0x60
     xenbus_dev_changed+0x18e/0x1f0
     xenwatch_thread+0xc0/0x1a0
     ? do_wait_intr_irq+0xa0/0xa0
     kthread+0x16b/0x190
     ? set_kthread_struct+0x40/0x40
     ret_from_fork+0x22/0x30
     </TASK>

Fix this by calling xennet_destroy_queues() from xennet_uninit(),
when real_num_tx_queues is still available. This ensures that queues are
destroyed when real_num_tx_queues is set to 0, regardless of how
unregister_netdev() was called.

Originally reported at
https://github.com/QubesOS/qubes-issues/issues/7257

Fixes: d7dac08 ("net-sysfs: update the queue counts in the unregistration path")
Cc: stable@vger.kernel.org
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
donald pushed a commit that referenced this issue Mar 4, 2022
When "debugfs=off" is used on the kernel command line, iwiwifi's
mvm module uses an invalid/unchecked debugfs_dir pointer and causes
a BUG:

 BUG: kernel NULL pointer dereference, address: 000000000000004f
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP
 CPU: 1 PID: 503 Comm: modprobe Tainted: G        W         5.17.0-rc5 #7
 Hardware name: Dell Inc. Inspiron 15 5510/076F7Y, BIOS 2.4.1 11/05/2021
 RIP: 0010:iwl_mvm_dbgfs_register+0x692/0x700 [iwlmvm]
 Code: 69 a0 be 80 01 00 00 48 c7 c7 50 73 6a a0 e8 95 cf ee e0 48 8b 83 b0 1e 00 00 48 c7 c2 54 73 6a a0 be 64 00 00 00 48 8d 7d 8c <48> 8b 48 50 e8 15 22 07 e1 48 8b 43 28 48 8d 55 8c 48 c7 c7 5f 73
 RSP: 0018:ffffc90000a0ba68 EFLAGS: 00010246
 RAX: ffffffffffffffff RBX: ffff88817d6e3328 RCX: ffff88817d6e3328
 RDX: ffffffffa06a7354 RSI: 0000000000000064 RDI: ffffc90000a0ba6c
 RBP: ffffc90000a0bae0 R08: ffffffff824e4880 R09: ffffffffa069d620
 R10: ffffc90000a0ba00 R11: ffffffffffffffff R12: 0000000000000000
 R13: ffffc90000a0bb28 R14: ffff88817d6e3328 R15: ffff88817d6e3320
 FS:  00007f64dd92d740(0000) GS:ffff88847f640000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000000000004f CR3: 000000016fc79001 CR4: 0000000000770ee0
 PKRU: 55555554
 Call Trace:
  <TASK>
  ? iwl_mvm_mac_setup_register+0xbdc/0xda0 [iwlmvm]
  iwl_mvm_start_post_nvm+0x71/0x100 [iwlmvm]
  iwl_op_mode_mvm_start+0xab8/0xb30 [iwlmvm]
  _iwl_op_mode_start+0x6f/0xd0 [iwlwifi]
  iwl_opmode_register+0x6a/0xe0 [iwlwifi]
  ? 0xffffffffa0231000
  iwl_mvm_init+0x35/0x1000 [iwlmvm]
  ? 0xffffffffa0231000
  do_one_initcall+0x5a/0x1b0
  ? kmem_cache_alloc+0x1e5/0x2f0
  ? do_init_module+0x1e/0x220
  do_init_module+0x48/0x220
  load_module+0x2602/0x2bc0
  ? __kernel_read+0x145/0x2e0
  ? kernel_read_file+0x229/0x290
  __do_sys_finit_module+0xc5/0x130
  ? __do_sys_finit_module+0xc5/0x130
  __x64_sys_finit_module+0x13/0x20
  do_syscall_64+0x38/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f64dda564dd
 Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1b 29 0f 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffdba393f88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f64dda564dd
 RDX: 0000000000000000 RSI: 00005575399e2ab2 RDI: 0000000000000001
 RBP: 000055753a91c5e0 R08: 0000000000000000 R09: 0000000000000002
 R10: 0000000000000001 R11: 0000000000000246 R12: 00005575399e2ab2
 R13: 000055753a91ceb0 R14: 0000000000000000 R15: 000055753a923018
  </TASK>
 Modules linked in: btintel(+) btmtk bluetooth vfat snd_hda_codec_hdmi fat snd_hda_codec_realtek snd_hda_codec_generic iwlmvm(+) snd_sof_pci_intel_tgl mac80211 snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence soundwire_bus snd_sof_intel_hda snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core btrfs snd_compress snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec raid6_pq iwlwifi snd_hda_core snd_pcm snd_timer snd soundcore cfg80211 intel_ish_ipc(+) thunderbolt rfkill intel_ishtp ucsi_acpi wmi i2c_hid_acpi i2c_hid evdev
 CR2: 000000000000004f
 ---[ end trace 0000000000000000 ]---

Check the debugfs_dir pointer for an error before using it.

Fixes: 8c082a9 ("iwlwifi: mvm: simplify iwl_mvm_dbgfs_register")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: linux-wireless@vger.kernel.org
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220223030630.23241-1-rdunlap@infradead.org
[change to make both conditional]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
donald pushed a commit that referenced this issue Mar 4, 2022
Ido Schimmel says:

====================
selftests: mlxsw: A couple of fixes

Patch #1 fixes a breakage due to a change in iproute2 output. The real
problem is not iproute2, but the fact that the check was not strict
enough. Fixed by using JSON output instead. Targeting at net so that the
test will pass as part of old and new kernels regardless of iproute2
version.

Patch #2 fixes an issue uncovered by the first one.
====================

Link: https://lore.kernel.org/r/20220302161447.217447-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
@donald donald reopened this Mar 4, 2022
donald pushed a commit that referenced this issue Mar 5, 2022
When enabling VMD and IOMMU scalable mode, the following kernel panic
call trace/kernel log is shown in Eagle Stream platform (Sapphire Rapids
CPU) during booting:

pci 0000:59:00.5: Adding to iommu group 42
...
vmd 0000:59:00.5: PCI host bridge to bus 10000:80
pci 10000:80:01.0: [8086:352a] type 01 class 0x060400
pci 10000:80:01.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:01.0: enabling Extended Tags
pci 10000:80:01.0: PME# supported from D0 D3hot D3cold
pci 10000:80:01.0: DMAR: Setup RID2PASID failed
pci 10000:80:01.0: Failed to add to iommu group 42: -16
pci 10000:80:03.0: [8086:352b] type 01 class 0x060400
pci 10000:80:03.0: reg 0x10: [mem 0x00000000-0x0001ffff 64bit]
pci 10000:80:03.0: enabling Extended Tags
pci 10000:80:03.0: PME# supported from D0 D3hot D3cold
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.17.0-rc3+ #7
Hardware name: Lenovo ThinkSystem SR650V3/SB27A86647, BIOS ESE101Y-1.00 01/13/2022
Workqueue: events work_for_cpu_fn
RIP: 0010:__list_add_valid.cold+0x26/0x3f
Code: 9a 4a ab ff 4c 89 c1 48 c7 c7 40 0c d9 9e e8 b9 b1 fe ff 0f
      0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 f0 0c d9 9e e8 a2 b1
      fe ff <0f> 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 98 0c d9
      9e e8 8b b1 fe
RSP: 0000:ff5ad434865b3a40 EFLAGS: 00010246
RAX: 0000000000000058 RBX: ff4d61160b74b880 RCX: ff4d61255e1fffa8
RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd34f20
RBP: ff4d611d8e245c00 R08: 0000000000000000 R09: ff5ad434865b3888
R10: ff5ad434865b3880 R11: ff4d61257fdc6fe8 R12: ff4d61160b74b8a0
R13: ff4d61160b74b8a0 R14: ff4d611d8e245c10 R15: ff4d611d8001ba70
FS:  0000000000000000(0000) GS:ff4d611d5ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ff4d611fa1401000 CR3: 0000000aa0210001 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 intel_pasid_alloc_table+0x9c/0x1d0
 dmar_insert_one_dev_info+0x423/0x540
 ? device_to_iommu+0x12d/0x2f0
 intel_iommu_attach_device+0x116/0x290
 __iommu_attach_device+0x1a/0x90
 iommu_group_add_device+0x190/0x2c0
 __iommu_probe_device+0x13e/0x250
 iommu_probe_device+0x24/0x150
 iommu_bus_notifier+0x69/0x90
 blocking_notifier_call_chain+0x5a/0x80
 device_add+0x3db/0x7b0
 ? arch_memremap_can_ram_remap+0x19/0x50
 ? memremap+0x75/0x140
 pci_device_add+0x193/0x1d0
 pci_scan_single_device+0xb9/0xf0
 pci_scan_slot+0x4c/0x110
 pci_scan_child_bus_extend+0x3a/0x290
 vmd_enable_domain.constprop.0+0x63e/0x820
 vmd_probe+0x163/0x190
 local_pci_probe+0x42/0x80
 work_for_cpu_fn+0x13/0x20
 process_one_work+0x1e2/0x3b0
 worker_thread+0x1c4/0x3a0
 ? rescuer_thread+0x370/0x370
 kthread+0xc7/0xf0
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
...
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---

The following 'lspci' output shows devices '10000:80:*' are subdevices of
the VMD device 0000:59:00.5:

  $ lspci
  ...
  0000:59:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 20)
  ...
  10000:80:01.0 PCI bridge: Intel Corporation Device 352a (rev 03)
  10000:80:03.0 PCI bridge: Intel Corporation Device 352b (rev 03)
  10000:80:05.0 PCI bridge: Intel Corporation Device 352c (rev 03)
  10000:80:07.0 PCI bridge: Intel Corporation Device 352d (rev 03)
  10000:81:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
  10000:82:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]

The symptom 'list_add double add' is caused by the following failure
message:

  pci 10000:80:01.0: DMAR: Setup RID2PASID failed
  pci 10000:80:01.0: Failed to add to iommu group 42: -16
  pci 10000:80:03.0: [8086:352b] type 01 class 0x060400

Device 10000:80:01.0 is the subdevice of the VMD device 0000:59:00.5,
so invoking intel_pasid_alloc_table() gets the pasid_table of the VMD
device 0000:59:00.5. Here is call path:

  intel_pasid_alloc_table
    pci_for_each_dma_alias
     get_alias_pasid_table
       search_pasid_table

pci_real_dma_dev() in pci_for_each_dma_alias() gets the real dma device
which is the VMD device 0000:59:00.5. However, pte of the VMD device
0000:59:00.5 has been configured during this message "pci 0000:59:00.5:
Adding to iommu group 42". So, the status -EBUSY is returned when
configuring pasid entry for device 10000:80:01.0.

It then invokes dmar_remove_one_dev_info() to release
'struct device_domain_info *' from iommu_devinfo_cache. But, the pasid
table is not released because of the following statement in
__dmar_remove_one_dev_info():

	if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
		...
		intel_pasid_free_table(info->dev);
        }

The subsequent dmar_insert_one_dev_info() operation of device
10000:80:03.0 allocates 'struct device_domain_info *' from
iommu_devinfo_cache. The allocated address is the same address that
is released previously for device 10000:80:01.0. Finally, invoking
device_attach_pasid_table() causes the issue.

`git bisect` points to the offending commit 474dd1c ("iommu/vt-d:
Fix clearing real DMA device's scalable-mode context entries"), which
releases the pasid table if the device is not the subdevice by
checking the returned status of dev_is_real_dma_subdevice().
Reverting the offending commit can work around the issue.

The solution is to prevent from allocating pasid table if those
devices are subdevices of the VMD device.

Fixes: 474dd1c ("iommu/vt-d: Fix clearing real DMA device's scalable-mode context entries")
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/20220216091307.703-1-adrianhuang0701@gmail.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220221053348.262724-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
donald pushed a commit that referenced this issue Mar 5, 2022
This driver, like several others, uses a chained IRQ for each GPIO bank,
and forwards .irq_set_wake to the GPIO bank's upstream IRQ. As a result,
a call to irq_set_irq_wake() needs to lock both the upstream and
downstream irq_desc's. Lockdep considers this to be a possible deadlock
when the irq_desc's share lockdep classes, which they do by default:

 ============================================
 WARNING: possible recursive locking detected
 5.17.0-rc3-00394-gc849047c2473 #1 Not tainted
 --------------------------------------------
 init/307 is trying to acquire lock:
 c2dfe27c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0

 but task is already holding lock:
 c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&irq_desc_lock_class);
   lock(&irq_desc_lock_class);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 4 locks held by init/307:
  #0: c1f29f18 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot+0x90/0x23c
  #1: c20f7760 (&dev->mutex){....}-{3:3}, at: device_shutdown+0xf4/0x224
  #2: c2e804d8 (&dev->mutex){....}-{3:3}, at: device_shutdown+0x104/0x224
  #3: c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0

 stack backtrace:
 CPU: 0 PID: 307 Comm: init Not tainted 5.17.0-rc3-00394-gc849047c2473 #1
 Hardware name: Allwinner sun8i Family
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x68/0x90
  dump_stack_lvl from __lock_acquire+0x1680/0x31a0
  __lock_acquire from lock_acquire+0x148/0x3dc
  lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
  _raw_spin_lock_irqsave from __irq_get_desc_lock+0x58/0xa0
  __irq_get_desc_lock from irq_set_irq_wake+0x2c/0x19c
  irq_set_irq_wake from irq_set_irq_wake+0x13c/0x19c
    [tail call from sunxi_pinctrl_irq_set_wake]
  irq_set_irq_wake from gpio_keys_suspend+0x80/0x1a4
  gpio_keys_suspend from gpio_keys_shutdown+0x10/0x2c
  gpio_keys_shutdown from device_shutdown+0x180/0x224
  device_shutdown from __do_sys_reboot+0x134/0x23c
  __do_sys_reboot from ret_fast_syscall+0x0/0x1c

However, this can never deadlock because the upstream and downstream
IRQs are never the same (nor do they even involve the same irqchip).

Silence this erroneous lockdep splat by applying what appears to be the
usual fix of moving the GPIO IRQs to separate lockdep classes.

Fixes: a59c99d ("pinctrl: sunxi: Forward calls to irq_set_irq_wake")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20220216040037.22730-1-samuel@sholland.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
donald pushed a commit that referenced this issue Mar 7, 2022
…_transaction()

We are seeing crashes similar to the following trace:

[38.969182] WARNING: CPU: 20 PID: 2105 at fs/btrfs/relocation.c:4070 btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
[38.973556] CPU: 20 PID: 2105 Comm: btrfs Not tainted 5.17.0-rc4 #54
[38.974580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[38.976539] RIP: 0010:btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
[38.980336] RSP: 0000:ffffb0dd42e03c20 EFLAGS: 00010206
[38.981218] RAX: ffff96cfc4ede800 RBX: ffff96cfc3ce0000 RCX: 000000000002ca14
[38.982560] RDX: 0000000000000000 RSI: 4cfd109a0bcb5d7f RDI: ffff96cfc3ce0360
[38.983619] RBP: ffff96cfc309c000 R08: 0000000000000000 R09: 0000000000000000
[38.984678] R10: ffff96cec0000001 R11: ffffe84c80000000 R12: ffff96cfc4ede800
[38.985735] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96cfc3ce0360
[38.987146] FS:  00007f11c15218c0(0000) GS:ffff96d6dfb00000(0000) knlGS:0000000000000000
[38.988662] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[38.989398] CR2: 00007ffc922c8e60 CR3: 00000001147a6001 CR4: 0000000000370ee0
[38.990279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[38.991219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[38.992528] Call Trace:
[38.992854]  <TASK>
[38.993148]  btrfs_relocate_chunk+0x27/0xe0 [btrfs]
[38.993941]  btrfs_balance+0x78e/0xea0 [btrfs]
[38.994801]  ? vsnprintf+0x33c/0x520
[38.995368]  ? __kmalloc_track_caller+0x351/0x440
[38.996198]  btrfs_ioctl_balance+0x2b9/0x3a0 [btrfs]
[38.997084]  btrfs_ioctl+0x11b0/0x2da0 [btrfs]
[38.997867]  ? mod_objcg_state+0xee/0x340
[38.998552]  ? seq_release+0x24/0x30
[38.999184]  ? proc_nr_files+0x30/0x30
[38.999654]  ? call_rcu+0xc8/0x2f0
[39.000228]  ? __x64_sys_ioctl+0x84/0xc0
[39.000872]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[39.001973]  __x64_sys_ioctl+0x84/0xc0
[39.002566]  do_syscall_64+0x3a/0x80
[39.003011]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[39.003735] RIP: 0033:0x7f11c166959b
[39.007324] RSP: 002b:00007fff2543e998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[39.008521] RAX: ffffffffffffffda RBX: 00007f11c1521698 RCX: 00007f11c166959b
[39.009833] RDX: 00007fff2543ea40 RSI: 00000000c4009420 RDI: 0000000000000003
[39.011270] RBP: 0000000000000003 R08: 0000000000000013 R09: 00007f11c16f94e0
[39.012581] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff25440df3
[39.014046] R13: 0000000000000000 R14: 00007fff2543ea40 R15: 0000000000000001
[39.015040]  </TASK>
[39.015418] ---[ end trace 0000000000000000 ]---
[43.131559] ------------[ cut here ]------------
[43.132234] kernel BUG at fs/btrfs/extent-tree.c:2717!
[43.133031] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[43.133702] CPU: 1 PID: 1839 Comm: btrfs Tainted: G        W         5.17.0-rc4 #54
[43.134863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[43.136426] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
[43.139913] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
[43.140629] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
[43.141604] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
[43.142645] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
[43.143669] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
[43.144657] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
[43.145686] FS:  00007f7657dd68c0(0000) GS:ffff96d6df640000(0000) knlGS:0000000000000000
[43.146808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43.147584] CR2: 00007f7fe81bf5b0 CR3: 00000001093ee004 CR4: 0000000000370ee0
[43.148589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[43.149581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[43.150559] Call Trace:
[43.150904]  <TASK>
[43.151253]  btrfs_finish_extent_commit+0x88/0x290 [btrfs]
[43.152127]  btrfs_commit_transaction+0x74f/0xaa0 [btrfs]
[43.152932]  ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
[43.153786]  btrfs_ioctl+0x1edc/0x2da0 [btrfs]
[43.154475]  ? __check_object_size+0x150/0x170
[43.155170]  ? preempt_count_add+0x49/0xa0
[43.155753]  ? __x64_sys_ioctl+0x84/0xc0
[43.156437]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[43.157456]  __x64_sys_ioctl+0x84/0xc0
[43.157980]  do_syscall_64+0x3a/0x80
[43.158543]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[43.159231] RIP: 0033:0x7f7657f1e59b
[43.161819] RSP: 002b:00007ffda5cd1658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[43.162702] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7657f1e59b
[43.163526] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003
[43.164358] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[43.165208] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[43.166029] R13: 00005621b91c3232 R14: 00005621b91ba580 R15: 00007ffda5cd1800
[43.166863]  </TASK>
[43.167125] Modules linked in: btrfs blake2b_generic xor pata_acpi ata_piix libata raid6_pq scsi_mod libcrc32c virtio_net virtio_rng net_failover rng_core failover scsi_common
[43.169552] ---[ end trace 0000000000000000 ]---
[43.171226] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
[43.174767] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
[43.175600] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
[43.176468] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
[43.177357] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
[43.178271] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
[43.179178] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
[43.180071] FS:  00007f7657dd68c0(0000) GS:ffff96d6df800000(0000) knlGS:0000000000000000
[43.181073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43.181808] CR2: 00007fe09905f010 CR3: 00000001093ee004 CR4: 0000000000370ee0
[43.182706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[43.183591] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

We first hit the WARN_ON(rc->block_group->pinned > 0) in
btrfs_relocate_block_group() and then the BUG_ON(!cache) in
unpin_extent_range(). This tells us that we are exiting relocation and
removing the block group with bytes still pinned for that block group.
This is supposed to be impossible: the last thing relocate_block_group()
does is commit the transaction to get rid of pinned extents.

Commit d0c2f4f ("btrfs: make concurrent fsyncs wait less when
waiting for a transaction commit") introduced an optimization so that
commits from fsync don't have to wait for the previous commit to unpin
extents. This was only intended to affect fsync, but it inadvertently
made it possible for any commit to skip waiting for the previous commit
to unpin. This is because if a call to btrfs_commit_transaction() finds
that another thread is already committing the transaction, it waits for
the other thread to complete the commit and then returns. If that other
thread was in fsync, then it completes the commit without completing the
previous commit. This makes the following sequence of events possible:

Thread 1____________________|Thread 2 (fsync)_____________________|Thread 3 (balance)___________________
btrfs_commit_transaction(N) |                                     |
  btrfs_run_delayed_refs    |                                     |
    pin extents             |                                     |
  ...                       |                                     |
  state = UNBLOCKED         |btrfs_sync_file                      |
                            |  btrfs_start_transaction(N + 1)     |relocate_block_group
                            |                                     |  btrfs_join_transaction(N + 1)
                            |  btrfs_commit_transaction(N + 1)    |
  ...                       |  trans->state = COMMIT_START        |
                            |                                     |  btrfs_commit_transaction(N + 1)
                            |                                     |    wait_for_commit(N + 1, COMPLETED)
                            |  wait_for_commit(N, SUPER_COMMITTED)|
  state = SUPER_COMMITTED   |  ...                                |
  btrfs_finish_extent_commit|                                     |
    unpin_extent_range()    |  trans->state = COMPLETED           |
                            |                                     |    return
                            |                                     |
    ...                     |                                     |Thread 1 isn't done, so pinned > 0
                            |                                     |and we WARN
                            |                                     |
                            |                                     |btrfs_remove_block_group
    unpin_extent_range()    |                                     |
      Thread 3 removed the  |                                     |
      block group, so we BUG|                                     |

There are other sequences involving SUPER_COMMITTED transactions that
can cause a similar outcome.

We could fix this by making relocation explicitly wait for unpinning,
but there may be other cases that need it. Josef mentioned ENOSPC
flushing and the free space cache inode as other potential victims.
Rather than playing whack-a-mole, this fix is conservative and makes all
commits not in fsync wait for all previous transactions, which is what
the optimization intended.

Fixes: d0c2f4f ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
donald pushed a commit that referenced this issue Mar 9, 2022
…nters

[ Upstream commit 77360f9 ]

Pingfan reported that the following causes a fault:

  echo "filename ~ \"cpu\"" > events/syscalls/sys_enter_openat/filter
  echo 1 > events/syscalls/sys_enter_at/enable

The reason is that trace event filter treats the user space pointer
defined by "filename" as a normal pointer to compare against the "cpu"
string. The following bug happened:

 kvm-03-guest16 login: [72198.026181] BUG: unable to handle page fault for address: 00007fffaae8ef60
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0001) - permissions violation
 PGD 80000001008b7067 P4D 80000001008b7067 PUD 2393f1067 PMD 2393ec067 PTE 8000000108f47867
 Oops: 0001 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-32.el9.x86_64 #1
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:strlen+0x0/0x20
 Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11
       48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8
       48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
 RSP: 0018:ffffb5b900013e48 EFLAGS: 00010246
 RAX: 0000000000000018 RBX: ffff8fc1c49ede00 RCX: 0000000000000000
 RDX: 0000000000000020 RSI: ffff8fc1c02d601c RDI: 00007fffaae8ef60
 RBP: 00007fffaae8ef60 R08: 0005034f4ddb8ea4 R09: 0000000000000000
 R10: ffff8fc1c02d601c R11: 0000000000000000 R12: ffff8fc1c8a6e380
 R13: 0000000000000000 R14: ffff8fc1c02d6010 R15: ffff8fc1c00453c0
 FS:  00007fa86123db40(0000) GS:ffff8fc2ffd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fffaae8ef60 CR3: 0000000102880001 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  filter_pred_pchar+0x18/0x40
  filter_match_preds+0x31/0x70
  ftrace_syscall_enter+0x27a/0x2c0
  syscall_trace_enter.constprop.0+0x1aa/0x1d0
  do_syscall_64+0x16/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7fa861d88664

The above happened because the kernel tried to access user space directly
and triggered a "supervisor read access in kernel mode" fault. Worse yet,
the memory could not even be loaded yet, and a SEGFAULT could happen as
well. This could be true for kernel space accessing as well.

To be even more robust, test both kernel and user space strings. If the
string fails to read, then simply have the filter fail.

Note, TASK_SIZE is used to determine if the pointer is user or kernel space
and the appropriate strncpy_from_kernel/user_nofault() function is used to
copy the memory. For some architectures, the compare to TASK_SIZE may always
pick user space or kernel space. If it gets it wrong, the only thing is that
the filter will fail to match. In the future, this needs to be fixed to have
the event denote which should be used. But failing a filter is much better
than panicing the machine, and that can be solved later.

Link: https://lore.kernel.org/all/20220107044951.22080-1-kernelfans@gmail.com/
Link: https://lkml.kernel.org/r/20220110115532.536088fd@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Reported-by: Pingfan Liu <kernelfans@gmail.com>
Tested-by: Pingfan Liu <kernelfans@gmail.com>
Fixes: 87a342f ("tracing/filters: Support filtering for char * strings")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Mar 9, 2022
commit dcf4ff7 upstream.

xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
delete queues. Since d7dac08
("net-sysfs: update the queue counts in the unregistration path"),
unregister_netdev() indirectly sets real_num_tx_queues to 0. Those two
facts together means, that xennet_destroy_queues() called from
xennet_remove() cannot do its job, because it's called after
unregister_netdev(). This results in kfree-ing queues that are still
linked in napi, which ultimately crashes:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 1 PID: 52 Comm: xenwatch Tainted: G        W         5.16.10-1.32.fc32.qubes.x86_64+ #226
    RIP: 0010:free_netdev+0xa3/0x1a0
    Code: ff 48 89 df e8 2e e9 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 ed c1 66 ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
    RSP: 0000:ffffc90000bcfd00 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff88800edad000 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: ffffc90000bcfc30 RDI: 00000000ffffffff
    RBP: fffffffffffffea0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800edad050
    R13: ffff8880065f8f88 R14: 0000000000000000 R15: ffff8880066c6680
    FS:  0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000000e998c006 CR4: 00000000003706e0
    Call Trace:
     <TASK>
     xennet_remove+0x13d/0x300 [xen_netfront]
     xenbus_dev_remove+0x6d/0xf0
     __device_release_driver+0x17a/0x240
     device_release_driver+0x24/0x30
     bus_remove_device+0xd8/0x140
     device_del+0x18b/0x410
     ? _raw_spin_unlock+0x16/0x30
     ? klist_iter_exit+0x14/0x20
     ? xenbus_dev_request_and_reply+0x80/0x80
     device_unregister+0x13/0x60
     xenbus_dev_changed+0x18e/0x1f0
     xenwatch_thread+0xc0/0x1a0
     ? do_wait_intr_irq+0xa0/0xa0
     kthread+0x16b/0x190
     ? set_kthread_struct+0x40/0x40
     ret_from_fork+0x22/0x30
     </TASK>

Fix this by calling xennet_destroy_queues() from xennet_uninit(),
when real_num_tx_queues is still available. This ensures that queues are
destroyed when real_num_tx_queues is set to 0, regardless of how
unregister_netdev() was called.

Originally reported at
https://github.com/QubesOS/qubes-issues/issues/7257

Fixes: d7dac08 ("net-sysfs: update the queue counts in the unregistration path")
Cc: stable@vger.kernel.org
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Mar 9, 2022
commit bac129d upstream.

This driver, like several others, uses a chained IRQ for each GPIO bank,
and forwards .irq_set_wake to the GPIO bank's upstream IRQ. As a result,
a call to irq_set_irq_wake() needs to lock both the upstream and
downstream irq_desc's. Lockdep considers this to be a possible deadlock
when the irq_desc's share lockdep classes, which they do by default:

 ============================================
 WARNING: possible recursive locking detected
 5.17.0-rc3-00394-gc849047c2473 #1 Not tainted
 --------------------------------------------
 init/307 is trying to acquire lock:
 c2dfe27c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0

 but task is already holding lock:
 c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&irq_desc_lock_class);
   lock(&irq_desc_lock_class);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 4 locks held by init/307:
  #0: c1f29f18 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot+0x90/0x23c
  #1: c20f7760 (&dev->mutex){....}-{3:3}, at: device_shutdown+0xf4/0x224
  #2: c2e804d8 (&dev->mutex){....}-{3:3}, at: device_shutdown+0x104/0x224
  #3: c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0

 stack backtrace:
 CPU: 0 PID: 307 Comm: init Not tainted 5.17.0-rc3-00394-gc849047c2473 #1
 Hardware name: Allwinner sun8i Family
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x68/0x90
  dump_stack_lvl from __lock_acquire+0x1680/0x31a0
  __lock_acquire from lock_acquire+0x148/0x3dc
  lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
  _raw_spin_lock_irqsave from __irq_get_desc_lock+0x58/0xa0
  __irq_get_desc_lock from irq_set_irq_wake+0x2c/0x19c
  irq_set_irq_wake from irq_set_irq_wake+0x13c/0x19c
    [tail call from sunxi_pinctrl_irq_set_wake]
  irq_set_irq_wake from gpio_keys_suspend+0x80/0x1a4
  gpio_keys_suspend from gpio_keys_shutdown+0x10/0x2c
  gpio_keys_shutdown from device_shutdown+0x180/0x224
  device_shutdown from __do_sys_reboot+0x134/0x23c
  __do_sys_reboot from ret_fast_syscall+0x0/0x1c

However, this can never deadlock because the upstream and downstream
IRQs are never the same (nor do they even involve the same irqchip).

Silence this erroneous lockdep splat by applying what appears to be the
usual fix of moving the GPIO IRQs to separate lockdep classes.

Fixes: a59c99d ("pinctrl: sunxi: Forward calls to irq_set_irq_wake")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20220216040037.22730-1-samuel@sholland.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Mar 11, 2022
Commit 54659ca ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783 ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[  +0.074085] WARNING: possible circular locking dependency detected
[  +0.074077] 5.16.0-1-amd64 #1 Tainted: G         C  E
[  +0.064710] ------------------------------------------------------
[  +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[  +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[  +0.114921]
              but task is already holding lock:
[  +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[  +0.116976]
              which lock already depends on the new lock.

[  +0.098037]
              the existing dependency chain (in reverse order) is:
[  +0.089704]
              -> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[  +0.077232]        _raw_spin_lock_bh+0x34/0x40
[  +0.053261]        xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[  +0.082572]        rtw_xmit+0x58b/0x940 [r8723bs]
[  +0.056528]        _rtw_xmit_entry+0xba/0x350 [r8723bs]
[  +0.062755]        dev_hard_start_xmit+0xf1/0x320
[  +0.056381]        sch_direct_xmit+0x9e/0x360
[  +0.052212]        __dev_queue_xmit+0xce4/0x1080
[  +0.055334]        ip6_finish_output2+0x18f/0x6e0
[  +0.056378]        ndisc_send_skb+0x2c8/0x870
[  +0.052209]        ndisc_send_ns+0xd3/0x210
[  +0.050130]        addrconf_dad_work+0x3df/0x5a0
[  +0.055338]        process_one_work+0x274/0x5a0
[  +0.054296]        worker_thread+0x52/0x3b0
[  +0.050124]        kthread+0x16c/0x1a0
[  +0.044925]        ret_from_fork+0x1f/0x30
[  +0.049092]
              -> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[  +0.074101]        __lock_acquire+0x10f5/0x1d80
[  +0.054298]        lock_acquire+0xd7/0x300
[  +0.049088]        _raw_spin_lock_bh+0x34/0x40
[  +0.053248]        rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[  +0.066949]        rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[  +0.066946]        rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[  +0.078386]        wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[  +0.065903]        rtw_recv_entry+0xe36/0x1160 [r8723bs]
[  +0.063809]        rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[  +0.071093]        tasklet_action_common.constprop.0+0xe5/0x110
[  +0.070966]        __do_softirq+0x16f/0x50a
[  +0.050134]        __irq_exit_rcu+0xeb/0x140
[  +0.051172]        irq_exit_rcu+0xa/0x20
[  +0.047006]        common_interrupt+0xb8/0xd0
[  +0.052214]        asm_common_interrupt+0x1e/0x40
[  +0.056381]        finish_task_switch.isra.0+0x100/0x3a0
[  +0.063670]        __schedule+0x3ad/0xd20
[  +0.048047]        schedule+0x4e/0xc0
[  +0.043880]        smpboot_thread_fn+0xc4/0x220
[  +0.054298]        kthread+0x16c/0x1a0
[  +0.044922]        ret_from_fork+0x1f/0x30
[  +0.049088]
              other info that might help us debug this:

[  +0.095950]  Possible unsafe locking scenario:

[  +0.070952]        CPU0                    CPU1
[  +0.054282]        ----                    ----
[  +0.054285]   lock(&psta->sleep_q.lock);
[  +0.047004]                                lock(&pxmitpriv->lock);
[  +0.074082]                                lock(&psta->sleep_q.lock);
[  +0.077209]   lock(&pxmitpriv->lock);
[  +0.043873]
               *** DEADLOCK ***

[  +0.070950] 1 lock held by ksoftirqd/3/29:
[  +0.049082]  #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Mar 17, 2022
commit 8f43470 upstream.

Commit 54659ca ("staging: rtl8723bs: remove possible deadlock when
disconnect (v2)") split the locking of pxmitpriv->lock vs sleep_q/lock
into 2 locks in attempt to fix a lockdep reported issue with the locking
order of the sta_hash_lock vs pxmitpriv->lock.

But in the end this turned out to not fully solve the sta_hash_lock issue
so commit a7ac783 ("staging: rtl8723bs: remove a second possible
deadlock") was added to fix this in another way.

The original fix was kept as it was still seen as a good thing to have,
but now it turns out that it creates a deadlock in access-point mode:

[Feb20 23:47] ======================================================
[  +0.074085] WARNING: possible circular locking dependency detected
[  +0.074077] 5.16.0-1-amd64 #1 Tainted: G         C  E
[  +0.064710] ------------------------------------------------------
[  +0.074075] ksoftirqd/3/29 is trying to acquire lock:
[  +0.060542] ffffb8b30062ab00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[  +0.114921]
              but task is already holding lock:
[  +0.069908] ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]
[  +0.116976]
              which lock already depends on the new lock.

[  +0.098037]
              the existing dependency chain (in reverse order) is:
[  +0.089704]
              -> #1 (&psta->sleep_q.lock){+.-.}-{2:2}:
[  +0.077232]        _raw_spin_lock_bh+0x34/0x40
[  +0.053261]        xmitframe_enqueue_for_sleeping_sta+0xc1/0x2f0 [r8723bs]
[  +0.082572]        rtw_xmit+0x58b/0x940 [r8723bs]
[  +0.056528]        _rtw_xmit_entry+0xba/0x350 [r8723bs]
[  +0.062755]        dev_hard_start_xmit+0xf1/0x320
[  +0.056381]        sch_direct_xmit+0x9e/0x360
[  +0.052212]        __dev_queue_xmit+0xce4/0x1080
[  +0.055334]        ip6_finish_output2+0x18f/0x6e0
[  +0.056378]        ndisc_send_skb+0x2c8/0x870
[  +0.052209]        ndisc_send_ns+0xd3/0x210
[  +0.050130]        addrconf_dad_work+0x3df/0x5a0
[  +0.055338]        process_one_work+0x274/0x5a0
[  +0.054296]        worker_thread+0x52/0x3b0
[  +0.050124]        kthread+0x16c/0x1a0
[  +0.044925]        ret_from_fork+0x1f/0x30
[  +0.049092]
              -> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[  +0.074101]        __lock_acquire+0x10f5/0x1d80
[  +0.054298]        lock_acquire+0xd7/0x300
[  +0.049088]        _raw_spin_lock_bh+0x34/0x40
[  +0.053248]        rtw_xmit_classifier+0x8a/0x140 [r8723bs]
[  +0.066949]        rtw_xmitframe_enqueue+0xa/0x20 [r8723bs]
[  +0.066946]        rtl8723bs_hal_xmitframe_enqueue+0x14/0x50 [r8723bs]
[  +0.078386]        wakeup_sta_to_xmit+0xa6/0x300 [r8723bs]
[  +0.065903]        rtw_recv_entry+0xe36/0x1160 [r8723bs]
[  +0.063809]        rtl8723bs_recv_tasklet+0x349/0x6c0 [r8723bs]
[  +0.071093]        tasklet_action_common.constprop.0+0xe5/0x110
[  +0.070966]        __do_softirq+0x16f/0x50a
[  +0.050134]        __irq_exit_rcu+0xeb/0x140
[  +0.051172]        irq_exit_rcu+0xa/0x20
[  +0.047006]        common_interrupt+0xb8/0xd0
[  +0.052214]        asm_common_interrupt+0x1e/0x40
[  +0.056381]        finish_task_switch.isra.0+0x100/0x3a0
[  +0.063670]        __schedule+0x3ad/0xd20
[  +0.048047]        schedule+0x4e/0xc0
[  +0.043880]        smpboot_thread_fn+0xc4/0x220
[  +0.054298]        kthread+0x16c/0x1a0
[  +0.044922]        ret_from_fork+0x1f/0x30
[  +0.049088]
              other info that might help us debug this:

[  +0.095950]  Possible unsafe locking scenario:

[  +0.070952]        CPU0                    CPU1
[  +0.054282]        ----                    ----
[  +0.054285]   lock(&psta->sleep_q.lock);
[  +0.047004]                                lock(&pxmitpriv->lock);
[  +0.074082]                                lock(&psta->sleep_q.lock);
[  +0.077209]   lock(&pxmitpriv->lock);
[  +0.043873]
               *** DEADLOCK ***

[  +0.070950] 1 lock held by ksoftirqd/3/29:
[  +0.049082]  #0: ffffb8b3007ab704 (&psta->sleep_q.lock){+.-.}-{2:2}, at: wakeup_sta_to_xmit+0x3b/0x300 [r8723bs]

Analysis shows that in hindsight the splitting of the lock was not
a good idea, so revert this to fix the access-point mode deadlock.

Note this is a straight-forward revert done with git revert, the commented
out "/* spin_lock_bh(&psta_bmc->sleep_q.lock); */" lines were part of the
code before the reverted changes.

Fixes: 54659ca ("staging: rtl8723bs: remove possible deadlock when disconnect (v2)")
Cc: stable <stable@vger.kernel.org>
Cc: Fabio Aiuto <fabioaiuto83@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215542
Link: https://lore.kernel.org/r/20220302101637.26542-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Mar 20, 2022
A page fault was encountered in mpt3sas on a LUN reset error path:

[  145.763216] mpt3sas_cm1: Task abort tm failed: handle(0x0002),timeout(30) tr_method(0x0) smid(3) msix_index(0)
[  145.778932] scsi 1:0:0:0: task abort: FAILED scmd(0x0000000024ba29a2)
[  145.817307] scsi 1:0:0:0: attempting device reset! scmd(0x0000000024ba29a2)
[  145.827253] scsi 1:0:0:0: [sg1] tag#2 CDB: Receive Diagnostic 1c 01 01 ff fc 00
[  145.837617] scsi target1:0:0: handle(0x0002), sas_address(0x500605b0000272b9), phy(0)
[  145.848598] scsi target1:0:0: enclosure logical id(0x500605b0000272b8), slot(0)
[  149.858378] mpt3sas_cm1: Poll ReplyDescriptor queues for completion of smid(0), task_type(0x05), handle(0x0002)
[  149.875202] BUG: unable to handle page fault for address: 00000007fffc445d
[  149.885617] #PF: supervisor read access in kernel mode
[  149.894346] #PF: error_code(0x0000) - not-present page
[  149.903123] PGD 0 P4D 0
[  149.909387] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  149.917417] CPU: 24 PID: 3512 Comm: scsi_eh_1 Kdump: loaded Tainted: G S         O      5.10.89-altav-1 #1
[  149.934327] Hardware name: DDN           200NVX2             /200NVX2-MB          , BIOS ATHG2.2.02.01 09/10/2021
[  149.951871] RIP: 0010:_base_process_reply_queue+0x4b/0x900 [mpt3sas]
[  149.961889] Code: 0f 84 22 02 00 00 8d 48 01 49 89 fd 48 8d 57 38 f0 0f b1 4f 38 0f 85 d8 01 00 00 49 8b 45 10 45 31 e4 41 8b 55 0c 48 8d 1c d0 <0f> b6 03 83 e0 0f 3c 0f 0f 85 a2 00 00 00 e9 e6 01 00 00 0f b7 ee
[  149.991952] RSP: 0018:ffffc9000f1ebcb8 EFLAGS: 00010246
[  150.000937] RAX: 0000000000000055 RBX: 00000007fffc445d RCX: 000000002548f071
[  150.011841] RDX: 00000000ffff8881 RSI: 0000000000000001 RDI: ffff888125ed50d8
[  150.022670] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000ffff7fff
[  150.033445] R10: ffffc9000f1ebb68 R11: ffffc9000f1ebb60 R12: 0000000000000000
[  150.044204] R13: ffff888125ed50d8 R14: 0000000000000080 R15: 34cdc00034cdea80
[  150.054963] FS:  0000000000000000(0000) GS:ffff88dfaf200000(0000) knlGS:0000000000000000
[  150.066715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  150.076078] CR2: 00000007fffc445d CR3: 000000012448a006 CR4: 0000000000770ee0
[  150.086887] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  150.097670] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  150.108323] PKRU: 55555554
[  150.114690] Call Trace:
[  150.120497]  ? printk+0x48/0x4a
[  150.127049]  mpt3sas_scsih_issue_tm.cold.114+0x2e/0x2b3 [mpt3sas]
[  150.136453]  mpt3sas_scsih_issue_locked_tm+0x86/0xb0 [mpt3sas]
[  150.145759]  scsih_dev_reset+0xea/0x300 [mpt3sas]
[  150.153891]  scsi_eh_ready_devs+0x541/0x9e0 [scsi_mod]
[  150.162206]  ? __scsi_host_match+0x20/0x20 [scsi_mod]
[  150.170406]  ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[  150.178925]  ? blk_mq_tagset_busy_iter+0x45/0x60
[  150.186638]  ? scsi_try_target_reset+0x90/0x90 [scsi_mod]
[  150.195087]  scsi_error_handler+0x3a5/0x4a0 [scsi_mod]
[  150.203206]  ? __schedule+0x1e9/0x610
[  150.209783]  ? scsi_eh_get_sense+0x210/0x210 [scsi_mod]
[  150.217924]  kthread+0x12e/0x150
[  150.224041]  ? kthread_worker_fn+0x130/0x130
[  150.231206]  ret_from_fork+0x1f/0x30

This is caused by mpt3sas_base_sync_reply_irqs() using an invalid reply_q
pointer outside of the list_for_each_entry() loop. At the end of the full
list traversal the pointer is invalid.

Move the _base_process_reply_queue() call inside of the loop.

Link: https://lore.kernel.org/r/d625deae-a958-0ace-2ba3-0888dd0a415b@ddn.com
Fixes: 711a923 ("scsi: mpt3sas: Postprocessing of target and LUN reset")
Cc: stable@vger.kernel.org
Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
donald pushed a commit that referenced this issue Mar 21, 2022
The commit in Fixes started adding INT3 after RETs as a mitigation
against straight-line speculation.

The fastop SETcc implementation in kvm's insn emulator uses macro magic
to generate all possible SETcc functions and to jump to them when
emulating the respective instruction.

However, it hardcodes the size and alignment of those functions to 4: a
three-byte SETcc insn and a single-byte RET. BUT, with SLS, there's an
INT3 that gets slapped after the RET, which brings the whole scheme out
of alignment:

  15:   0f 90 c0                seto   %al
  18:   c3                      ret
  19:   cc                      int3
  1a:   0f 1f 00                nopl   (%rax)
  1d:   0f 91 c0                setno  %al
  20:   c3                      ret
  21:   cc                      int3
  22:   0f 1f 00                nopl   (%rax)
  25:   0f 92 c0                setb   %al
  28:   c3                      ret
  29:   cc                      int3

and this explodes like this:

  int3: 0000 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 2435 Comm: qemu-system-x86 Not tainted 5.17.0-rc8-sls #1
  Hardware name: Dell Inc. Precision WorkStation T3400  /0TP412, BIOS A14 04/30/2012
  RIP: 0010:setc+0x5/0x8 [kvm]
  Code: 00 00 0f 1f 00 0f b6 05 43 24 06 00 c3 cc 0f 1f 80 00 00 00 00 0f 90 c0 c3 cc 0f \
	  1f 00 0f 91 c0 c3 cc 0f 1f 00 0f 92 c0 c3 cc <0f> 1f 00 0f 93 c0 c3 cc 0f 1f 00 \
	  0f 94 c0 c3 cc 0f 1f 00 0f 95 c0
  Call Trace:
   <TASK>
   ? x86_emulate_insn [kvm]
   ? x86_emulate_instruction [kvm]
   ? vmx_handle_exit [kvm_intel]
   ? kvm_arch_vcpu_ioctl_run [kvm]
   ? kvm_vcpu_ioctl [kvm]
   ? __x64_sys_ioctl
   ? do_syscall_64
   ? entry_SYSCALL_64_after_hwframe
   </TASK>

Raise the alignment value when SLS is enabled and use a macro for that
instead of hard-coding naked numbers.

Fixes: e463a09 ("x86: Add straight-line-speculation mitigation")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Link: https://lore.kernel.org/r/YjGzJwjrvxg5YZ0Z@audible.transient.net
[Add a comment and a bit of safety checking, since this is going to be changed
 again for IBT support. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
donald pushed a commit that referenced this issue Mar 22, 2022
Due to a historical oversight, we emit a redundant static branch for
each atomic/atomic64 operation when CONFIG_ARM64_LSE_ATOMICS is
selected. We can safely remove this, making the kernel Image reasonably
smaller.

When CONFIG_ARM64_LSE_ATOMICS is selected, every LSE atomic operation
has two preceding static branches with the same target, e.g.

	b	f7c <kernel_init_freeable+0xa4>
	b	f7c <kernel_init_freeable+0xa4>
	mov	w0, #0x1                   	// #1
	ldadd	w0, w0, [x19]

This is because the __lse_ll_sc_body() wrapper uses
system_uses_lse_atomics(), which checks both `arm64_const_caps_ready`
and `cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]`, each of which emits a
static branch. This has been the case since commit:

  addfc38 ("arm64: atomics: avoid out-of-line ll/sc atomics")

However, there was never a need to check `arm64_const_caps_ready`, which
was itself introduced in commit:

  63a1e1c ("arm64/cpufeature: don't use mutex in bringup path")

... so that cpus_have_const_cap() could fall back to checking the
`cpu_hwcaps` bitmap prior to the static keys for individual caps
becoming enabled. As system_uses_lse_atomics() doesn't check
`cpu_hwcaps`, and doesn't need to as we can safely use the LL/SC atomics
prior to enabling the `ARM64_HAS_LSE_ATOMICS` static key, it doesn't
need to check `arm64_const_caps_ready`.

This patch removes the `arm64_const_caps_ready` check from
system_uses_lse_atomics(). As the arch_atomic_* routines are meant to be
safely usable in noinstr code, I've also marked
system_uses_lse_atomics() as __always_inline.

This results in one fewer static branch per atomic operation, with the
prior example becoming:

	b	f78 <kernel_init_freeable+0xa0>
	mov	w0, #0x1                   	// #1
	ldadd	w0, w0, [x19]

Each static branch consists of the branch itself and an associated
__jump_table entry. Removing these has a reasonable impact on the Image
size, with a GCC 11.1.0 defconfig v5.17-rc2 Image being reduced by
128KiB:

| [mark@lakrids:~/src/linux]% ls -al Image*
| -rw-r--r-- 1 mark mark 34619904 Feb  3 18:24 Image.baseline
| -rw-r--r-- 1 mark mark 34488832 Feb  3 18:33 Image.onebranch

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220204104439.270567-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
donald pushed a commit that referenced this issue Mar 22, 2022
On systems without "Noto Sans CJK" fonts, CJK chapters in
translations.pdf are full of "TOFU" boxes, with a long build time and
a large log file containing lots of missing-font warnings.

Avoid such waste of time and resources by skipping CJK chapters when
CJK fonts are not available.

To skip whole chapters, change the definition of
\kerneldocBegin{SC|TC|KR|JP} commands so that they can have an argument
to be ignored.
This works as far as the argument (#1) is not used in the command.
In place of skipped contents, put a note on skipped contents at the
beginning of the PDF.

Change the call sites in index.rst of CJK translations accordingly.

When CJK fonts are available, existing command definitions with
no argument just work.  LaTeX engine will see additional pairs of
"{" and "}", which add a level of grouping without having any effect
on typesetting.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/3359ca41-b81d-b2c7-e437-7618efbe241d@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
donald pushed a commit that referenced this issue Mar 22, 2022
Put NVMe/TCP sockets in their own class to avoid some lockdep warnings.
Sockets created by nvme-tcp are not exposed to user-space, and will not
trigger certain code paths that the general socket API exposes.

Lockdep complains about a circular dependency between the socket and
filesystem locks, because setsockopt can trigger a page fault with a
socket lock held, but nvme-tcp sends requests on the socket while file
system locks are held.

  ======================================================
  WARNING: possible circular locking dependency detected
  5.15.0-rc3 #1 Not tainted
  ------------------------------------------------------
  fio/1496 is trying to acquire lock:
  (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendpage+0x23/0x80

  but task is already holding lock:
  (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]

  which lock already depends on the new lock.

  other info that might help us debug this:

  chain exists of:
   sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&xfs_dir_ilock_class/5);
                                lock(sb_internal);
                                lock(&xfs_dir_ilock_class/5);
   lock(sk_lock-AF_INET);

  *** DEADLOCK ***

  6 locks held by fio/1496:
   #0: (sb_writers#13){.+.+}-{0:0}, at: path_openat+0x9fc/0xa20
   #1: (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0x296/0xa20
   #2: (sb_internal){.+.+}-{0:0}, at: xfs_trans_alloc_icreate+0x41/0xd0 [xfs]
   #3: (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]
   #4: (hctx->srcu){....}-{0:0}, at: hctx_lock+0x51/0xd0
   #5: (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0x33e/0x380 [nvme_tcp]

This annotation lets lockdep analyze nvme-tcp controlled sockets
independently of what the user-space sockets API does.

Link: https://lore.kernel.org/linux-nvme/CAHj4cs9MDYLJ+q+2_GXUK9HxFizv2pxUryUR0toX974M040z7g@mail.gmail.com/

Signed-off-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
donald pushed a commit that referenced this issue Mar 23, 2022
This patch fixes xfstests/generic/475 failure.

[  293.680694] F2FS-fs (dm-1): May loss orphan inode, run fsck to fix.
[  293.685358] Buffer I/O error on dev dm-1, logical block 8388592, async page read
[  293.691527] Buffer I/O error on dev dm-1, logical block 8388592, async page read
[  293.691764] sh (7615): drop_caches: 3
[  293.691819] sh (7616): drop_caches: 3
[  293.694017] Buffer I/O error on dev dm-1, logical block 1, async page read
[  293.695659] sh (7618): drop_caches: 3
[  293.696979] sh (7617): drop_caches: 3
[  293.700290] sh (7623): drop_caches: 3
[  293.708621] sh (7626): drop_caches: 3
[  293.711386] sh (7628): drop_caches: 3
[  293.711825] sh (7627): drop_caches: 3
[  293.716738] sh (7630): drop_caches: 3
[  293.719613] sh (7632): drop_caches: 3
[  293.720971] sh (7633): drop_caches: 3
[  293.727741] sh (7634): drop_caches: 3
[  293.730783] sh (7636): drop_caches: 3
[  293.732681] sh (7635): drop_caches: 3
[  293.732988] sh (7637): drop_caches: 3
[  293.738836] sh (7639): drop_caches: 3
[  293.740568] sh (7641): drop_caches: 3
[  293.743053] sh (7640): drop_caches: 3
[  293.821889] ------------[ cut here ]------------
[  293.824654] kernel BUG at fs/f2fs/node.c:3334!
[  293.826226] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  293.828713] CPU: 0 PID: 7653 Comm: umount Tainted: G           OE     5.17.0-rc1-custom #1
[  293.830946] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[  293.832526] RIP: 0010:f2fs_destroy_node_manager+0x33f/0x350 [f2fs]
[  293.833905] Code: e8 d6 3d f9 f9 48 8b 45 d0 65 48 2b 04 25 28 00 00 00 75 1a 48 81 c4 28 03 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b
[  293.837783] RSP: 0018:ffffb04ec31e7a20 EFLAGS: 00010202
[  293.839062] RAX: 0000000000000001 RBX: ffff9df947db2eb8 RCX: 0000000080aa0072
[  293.840666] RDX: 0000000000000000 RSI: ffffe86c0432a140 RDI: ffffffffc0b72a21
[  293.842261] RBP: ffffb04ec31e7d70 R08: ffff9df94ca85780 R09: 0000000080aa0072
[  293.843909] R10: ffff9df94ca85700 R11: ffff9df94e1ccf58 R12: ffff9df947db2e00
[  293.845594] R13: ffff9df947db2ed0 R14: ffff9df947db2eb8 R15: ffff9df947db2eb8
[  293.847855] FS:  00007f5a97379800(0000) GS:ffff9dfa77c00000(0000) knlGS:0000000000000000
[  293.850647] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  293.852940] CR2: 00007f5a97528730 CR3: 000000010bc76005 CR4: 0000000000370ef0
[  293.854680] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  293.856423] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  293.858380] Call Trace:
[  293.859302]  <TASK>
[  293.860311]  ? ttwu_do_wakeup+0x1c/0x170
[  293.861800]  ? ttwu_do_activate+0x6d/0xb0
[  293.863057]  ? _raw_spin_unlock_irqrestore+0x29/0x40
[  293.864411]  ? try_to_wake_up+0x9d/0x5e0
[  293.865618]  ? debug_smp_processor_id+0x17/0x20
[  293.866934]  ? debug_smp_processor_id+0x17/0x20
[  293.868223]  ? free_unref_page+0xbf/0x120
[  293.869470]  ? __free_slab+0xcb/0x1c0
[  293.870614]  ? preempt_count_add+0x7a/0xc0
[  293.871811]  ? __slab_free+0xa0/0x2d0
[  293.872918]  ? __wake_up_common_lock+0x8a/0xc0
[  293.874186]  ? __slab_free+0xa0/0x2d0
[  293.875305]  ? free_inode_nonrcu+0x20/0x20
[  293.876466]  ? free_inode_nonrcu+0x20/0x20
[  293.877650]  ? debug_smp_processor_id+0x17/0x20
[  293.878949]  ? call_rcu+0x11a/0x240
[  293.880060]  ? f2fs_destroy_stats+0x59/0x60 [f2fs]
[  293.881437]  ? kfree+0x1fe/0x230
[  293.882674]  f2fs_put_super+0x160/0x390 [f2fs]
[  293.883978]  generic_shutdown_super+0x7a/0x120
[  293.885274]  kill_block_super+0x27/0x50
[  293.886496]  kill_f2fs_super+0x7f/0x100 [f2fs]
[  293.887806]  deactivate_locked_super+0x35/0xa0
[  293.889271]  deactivate_super+0x40/0x50
[  293.890513]  cleanup_mnt+0x139/0x190
[  293.891689]  __cleanup_mnt+0x12/0x20
[  293.892850]  task_work_run+0x64/0xa0
[  293.894035]  exit_to_user_mode_prepare+0x1b7/0x1c0
[  293.895409]  syscall_exit_to_user_mode+0x27/0x50
[  293.896872]  do_syscall_64+0x48/0xc0
[  293.898090]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  293.899517] RIP: 0033:0x7f5a975cd25b

Fixes: 7735730 ("f2fs: fix to propagate error from __get_meta_page()")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
donald pushed a commit that referenced this issue Mar 23, 2022
cnt should be passed to sb_has_quota_active() instead of type to check
active quota properly.

Moreover, when the type is -1, the compiler with enough inline knowledge
can discard sb_has_quota_active() check altogether, causing a NULL pointer
dereference at the following inode_lock(dqopt->files[cnt]):

[    2.796010] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[    2.796024] Mem abort info:
[    2.796025]   ESR = 0x96000005
[    2.796028]   EC = 0x25: DABT (current EL), IL = 32 bits
[    2.796029]   SET = 0, FnV = 0
[    2.796031]   EA = 0, S1PTW = 0
[    2.796032] Data abort info:
[    2.796034]   ISV = 0, ISS = 0x00000005
[    2.796035]   CM = 0, WnR = 0
[    2.796046] user pgtable: 4k pages, 39-bit VAs, pgdp=00000003370d1000
[    2.796048] [00000000000000a0] pgd=0000000000000000, pud=0000000000000000
[    2.796051] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    2.796056] CPU: 7 PID: 640 Comm: f2fs_ckpt-259:7 Tainted: G S                5.4.179-arter97-r8-64666-g2f16e087f9d8 #1
[    2.796057] Hardware name: Qualcomm Technologies, Inc. Lahaina MTP lemonadep (DT)
[    2.796059] pstate: 80c00005 (Nzcv daif +PAN +UAO)
[    2.796065] pc : down_write+0x28/0x70
[    2.796070] lr : f2fs_quota_sync+0x100/0x294
[    2.796071] sp : ffffffa3f48ffc30
[    2.796073] x29: ffffffa3f48ffc30 x28: 0000000000000000
[    2.796075] x27: ffffffa3f6d718b8 x26: ffffffa415fe9d80
[    2.796077] x25: ffffffa3f7290048 x24: 0000000000000001
[    2.796078] x23: 0000000000000000 x22: ffffffa3f7290000
[    2.796080] x21: ffffffa3f72904a0 x20: ffffffa3f7290110
[    2.796081] x19: ffffffa3f77a9800 x18: ffffffc020aae038
[    2.796083] x17: ffffffa40e38e040 x16: ffffffa40e38e6d0
[    2.796085] x15: ffffffa40e38e6cc x14: ffffffa40e38e6d0
[    2.796086] x13: 00000000000004f6 x12: 00162c44ff493000
[    2.796088] x11: 0000000000000400 x10: ffffffa40e38c948
[    2.796090] x9 : 0000000000000000 x8 : 00000000000000a0
[    2.796091] x7 : 0000000000000000 x6 : 0000d1060f00002a
[    2.796093] x5 : ffffffa3f48ff718 x4 : 000000000000000d
[    2.796094] x3 : 00000000060c0000 x2 : 0000000000000001
[    2.796096] x1 : 0000000000000000 x0 : 00000000000000a0
[    2.796098] Call trace:
[    2.796100]  down_write+0x28/0x70
[    2.796102]  f2fs_quota_sync+0x100/0x294
[    2.796104]  block_operations+0x120/0x204
[    2.796106]  f2fs_write_checkpoint+0x11c/0x520
[    2.796107]  __checkpoint_and_complete_reqs+0x7c/0xd34
[    2.796109]  issue_checkpoint_thread+0x6c/0xb8
[    2.796112]  kthread+0x138/0x414
[    2.796114]  ret_from_fork+0x10/0x18
[    2.796117] Code: aa0803e0 aa1f03e1 52800022 aa0103e9 (c8e97d02)
[    2.796120] ---[ end trace 96e942e8eb6a0b53 ]---
[    2.800116] Kernel panic - not syncing: Fatal exception
[    2.800120] SMP: stopping secondary CPUs

Fixes: 9de71ed ("f2fs: quota: fix potential deadlock")
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
donald pushed a commit that referenced this issue Aug 16, 2025
commit 151c0aa upstream.

1. In func configfs_composite_bind() -> composite_os_desc_req_prepare():
if kmalloc fails, the pointer cdev->os_desc_req will be freed but not
set to NULL. Then it will return a failure to the upper-level function.
2. in func configfs_composite_bind() -> composite_dev_cleanup():
it will checks whether cdev->os_desc_req is NULL. If it is not NULL, it
will attempt to use it.This will lead to a use-after-free issue.

BUG: KASAN: use-after-free in composite_dev_cleanup+0xf4/0x2c0
Read of size 8 at addr 0000004827837a00 by task init/1

CPU: 10 PID: 1 Comm: init Tainted: G           O      5.10.97-oh #1
 kasan_report+0x188/0x1cc
 __asan_load8+0xb4/0xbc
 composite_dev_cleanup+0xf4/0x2c0
 configfs_composite_bind+0x210/0x7ac
 udc_bind_to_driver+0xb4/0x1ec
 usb_gadget_probe_driver+0xec/0x21c
 gadget_dev_desc_UDC_store+0x264/0x27c

Fixes: 37a3a53 ("usb: gadget: OS Feature Descriptors support")
Cc: stable <stable@kernel.org>
Signed-off-by: Tao Xue <xuetao09@huawei.com>
Link: https://lore.kernel.org/r/20250721093908.14967-1-xuetao09@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Aug 16, 2025
…upported 6 GHz band

[ Upstream commit 7e04f01 ]

With a quite rare chance, RX report might be problematic to make SW think
a packet is received on 6 GHz band even if the chip does not support 6 GHz
band actually. Since SW won't initialize stuffs for unsupported bands, NULL
dereference will happen then in the sequence, rtw89_vif_rx_stats_iter() ->
rtw89_core_cancel_6ghz_probe_tx(). So, add a check to avoid it.

The following is a crash log for this case.

 BUG: kernel NULL pointer dereference, address: 0000000000000032
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 1 PID: 1907 Comm: irq/131-rtw89_p Tainted: G     U             6.6.56-05896-g89f5fb0eb30b #1 (HASH:1400 4)
 Hardware name: Google Telith/Telith, BIOS Google_Telith.15217.747.0 11/12/2024
 RIP: 0010:rtw89_vif_rx_stats_iter+0xd2/0x310 [rtw89_core]
 Code: 4c 89 7d c8 48 89 55 c0 49 8d 44 24 02 48 89 45 b8 45 31 ff eb 11
 41 c6 45 3a 01 41 b7 01 4d 8b 6d 00 4d 39 f5 74 42 8b 43 10 <41> 33 45
 32 0f b7 4b 14 66 41 33 4d 36 0f b7 c9 09 c1 74 d8 4d 85
 RSP: 0018:ffff9f3080138ca0 EFLAGS: 00010246
 RAX: 00000000b8bf5770 RBX: ffff91b5e8c639c0 RCX: 0000000000000011
 RDX: ffff91b582de1be8 RSI: 0000000000000000 RDI: ffff91b5e8c639e6
 RBP: ffff9f3080138d00 R08: 0000000000000000 R09: 0000000000000000
 R10: ffff91b59de70000 R11: ffffffffc069be50 R12: ffff91b5e8c639e4
 R13: 0000000000000000 R14: ffff91b5828020b8 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff91b8efa40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000032 CR3: 00000002bf838000 CR4: 0000000000750ee0
 PKRU: 55555554
 Call Trace:
  <IRQ>
  ? __die_body+0x68/0xb0
  ? page_fault_oops+0x379/0x3e0
  ? exc_page_fault+0x4f/0xa0
  ? asm_exc_page_fault+0x22/0x30
  ? __pfx_rtw89_vif_rx_stats_iter+0x10/0x10 [rtw89_core (HASH:1400 5)]
  ? rtw89_vif_rx_stats_iter+0xd2/0x310 [rtw89_core (HASH:1400 5)]
  __iterate_interfaces+0x59/0x110 [mac80211 (HASH:1400 6)]
  ? __pfx_rtw89_vif_rx_stats_iter+0x10/0x10 [rtw89_core (HASH:1400 5)]
  ? __pfx_rtw89_vif_rx_stats_iter+0x10/0x10 [rtw89_core (HASH:1400 5)]
  ieee80211_iterate_active_interfaces_atomic+0x36/0x50 [mac80211 (HASH:1400 6)]
  rtw89_core_rx_to_mac80211+0xfd/0x1b0 [rtw89_core (HASH:1400 5)]
  rtw89_core_rx+0x43a/0x980 [rtw89_core (HASH:1400 5)]

Fixes: c6aa9a9 ("wifi: rtw89: add RNR support for 6 GHz scan")
Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250618124649.11436-5-pkshih@realtek.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 16, 2025
[ Upstream commit 16d8fd7 ]

In rtl8187_stop() move the call of usb_kill_anchored_urbs() before clearing
b_tx_status.queue. This change prevents callbacks from using already freed
skb due to anchor was not killed before freeing such skb.

 BUG: kernel NULL pointer dereference, address: 0000000000000080
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 7 UID: 0 PID: 0 Comm: swapper/7 Not tainted 6.15.0 #8 PREEMPT(voluntary)
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
 RIP: 0010:ieee80211_tx_status_irqsafe+0x21/0xc0 [mac80211]
 Call Trace:
  <IRQ>
  rtl8187_tx_cb+0x116/0x150 [rtl8187]
  __usb_hcd_giveback_urb+0x9d/0x120
  usb_giveback_urb_bh+0xbb/0x140
  process_one_work+0x19b/0x3c0
  bh_worker+0x1a7/0x210
  tasklet_action+0x10/0x30
  handle_softirqs+0xf0/0x340
  __irq_exit_rcu+0xcd/0xf0
  common_interrupt+0x85/0xa0
  </IRQ>

Tested on RTL8187BvE device.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: c1db52b ("rtl8187: Use usb anchor facilities to manage urbs")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Link: https://patch.msgid.link/20250617135634.21760-1-d.dulov@aladdin.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 16, 2025
[ Upstream commit a5b46aa ]

In a number of cases we see kernel panics on resume due
to ath11k kernel page fault, which happens under the
following circumstances:

1) First ath11k_hal_dump_srng_stats() call

 Last interrupt received for each group:
 ath11k_pci 0000:01:00.0: group_id 0 22511ms before
 ath11k_pci 0000:01:00.0: group_id 1 14440788ms before
 [..]
 ath11k_pci 0000:01:00.0: failed to receive control response completion, polling..
 ath11k_pci 0000:01:00.0: Service connect timeout
 ath11k_pci 0000:01:00.0: failed to connect to HTT: -110
 ath11k_pci 0000:01:00.0: failed to start core: -110
 ath11k_pci 0000:01:00.0: firmware crashed: MHI_CB_EE_RDDM
 ath11k_pci 0000:01:00.0: already resetting count 2
 ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
 ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
 ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
 [..]

2) At this point reconfiguration fails (we have 2 resets) and
  ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit()
  which destroys srng lists.  However, it does not reset per-list
  ->initialized flag.

3) Second ath11k_hal_dump_srng_stats() call sees stale ->initialized
  flag and attempts to dump srng stats:

 Last interrupt received for each group:
 ath11k_pci 0000:01:00.0: group_id 0 66785ms before
 ath11k_pci 0000:01:00.0: group_id 1 14485062ms before
 ath11k_pci 0000:01:00.0: group_id 2 14485062ms before
 ath11k_pci 0000:01:00.0: group_id 3 14485062ms before
 ath11k_pci 0000:01:00.0: group_id 4 14780845ms before
 ath11k_pci 0000:01:00.0: group_id 5 14780845ms before
 ath11k_pci 0000:01:00.0: group_id 6 14485062ms before
 ath11k_pci 0000:01:00.0: group_id 7 66814ms before
 ath11k_pci 0000:01:00.0: group_id 8 68997ms before
 ath11k_pci 0000:01:00.0: group_id 9 67588ms before
 ath11k_pci 0000:01:00.0: group_id 10 69511ms before
 BUG: unable to handle page fault for address: ffffa007404eb010
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 100000067 P4D 100000067 PUD 10022d067 PMD 100b01067 PTE 0
 Oops: 0000 [#1] PREEMPT SMP NOPTI
 RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
 Call Trace:
 <TASK>
 ? __die_body+0xae/0xb0
 ? page_fault_oops+0x381/0x3e0
 ? exc_page_fault+0x69/0xa0
 ? asm_exc_page_fault+0x22/0x30
 ? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:6cea 4)]
 ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:6cea 4)]
 worker_thread+0x389/0x930
 kthread+0x149/0x170

Clear per-list ->initialized flag in ath11k_hal_srng_deinit().

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Baochen Qiang <quic_bqiang@quicinc.com>
Fixes: 5118935 ("ath11k: dump SRNG stats during FW assert")
Link: https://patch.msgid.link/20250612084551.702803-1-senozhatsky@chromium.org
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 16, 2025
[ Upstream commit 17ce3e5 ]

syzbot reported that the netfilter bpf prog can be called without
migration disabled in xmit path.

Then the assertion in __bpf_prog_run() fails, triggering the splat
below. [0]

Let's use bpf_prog_run_pin_on_cpu() in nf_hook_run_bpf().

[0]:
BUG: assuming non migratable context at ./include/linux/filter.h:703
in_atomic(): 0, irqs_disabled(): 0, migration_disabled() 0 pid: 5829, name: sshd-session
3 locks held by sshd-session/5829:
 #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1667 [inline]
 #0: ffff88807b4e4218 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x20/0x50 net/ipv4/tcp.c:1395
 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline]
 #1: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: __ip_queue_xmit+0x69/0x26c0 net/ipv4/ip_output.c:470
 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:841 [inline]
 #2: ffffffff8e5c4e00 (rcu_read_lock){....}-{1:3}, at: nf_hook+0xb2/0x680 include/linux/netfilter.h:241
CPU: 0 UID: 0 PID: 5829 Comm: sshd-session Not tainted 6.16.0-rc6-syzkaller-00002-g155a3c003e55 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x16c/0x1f0 lib/dump_stack.c:120
 __cant_migrate kernel/sched/core.c:8860 [inline]
 __cant_migrate+0x1c7/0x250 kernel/sched/core.c:8834
 __bpf_prog_run include/linux/filter.h:703 [inline]
 bpf_prog_run include/linux/filter.h:725 [inline]
 nf_hook_run_bpf+0x83/0x1e0 net/netfilter/nf_bpf_link.c:20
 nf_hook_entry_hookfn include/linux/netfilter.h:157 [inline]
 nf_hook_slow+0xbb/0x200 net/netfilter/core.c:623
 nf_hook+0x370/0x680 include/linux/netfilter.h:272
 NF_HOOK_COND include/linux/netfilter.h:305 [inline]
 ip_output+0x1bc/0x2a0 net/ipv4/ip_output.c:433
 dst_output include/net/dst.h:459 [inline]
 ip_local_out net/ipv4/ip_output.c:129 [inline]
 __ip_queue_xmit+0x1d7d/0x26c0 net/ipv4/ip_output.c:527
 __tcp_transmit_skb+0x2686/0x3e90 net/ipv4/tcp_output.c:1479
 tcp_transmit_skb net/ipv4/tcp_output.c:1497 [inline]
 tcp_write_xmit+0x1274/0x84e0 net/ipv4/tcp_output.c:2838
 __tcp_push_pending_frames+0xaf/0x390 net/ipv4/tcp_output.c:3021
 tcp_push+0x225/0x700 net/ipv4/tcp.c:759
 tcp_sendmsg_locked+0x1870/0x42b0 net/ipv4/tcp.c:1359
 tcp_sendmsg+0x2e/0x50 net/ipv4/tcp.c:1396
 inet_sendmsg+0xb9/0x140 net/ipv4/af_inet.c:851
 sock_sendmsg_nosec net/socket.c:712 [inline]
 __sock_sendmsg net/socket.c:727 [inline]
 sock_write_iter+0x4aa/0x5b0 net/socket.c:1131
 new_sync_write fs/read_write.c:593 [inline]
 vfs_write+0x6c7/0x1150 fs/read_write.c:686
 ksys_write+0x1f8/0x250 fs/read_write.c:738
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe7d365d407
Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
RSP:

Fixes: fd9c663 ("bpf: minimal support for programs hooked into netfilter framework")
Reported-by: syzbot+40f772d37250b6d10efc@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/6879466d.a00a0220.3af5df.0022.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: syzbot+40f772d37250b6d10efc@syzkaller.appspotmail.com
Acked-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250722224041.112292-1-kuniyu@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 16, 2025
[ Upstream commit 6d19c44 ]

Hardware returns a unique identifier for a decrypted packet's xfrm
state, this state is looked up in an xarray. However, the state might
have been freed by the time of this lookup.

Currently, if the state is not found, only a counter is incremented.
The secpath (sp) extension on the skb is not removed, resulting in
sp->len becoming 0.

Subsequently, functions like __xfrm_policy_check() attempt to access
fields such as xfrm_input_state(skb)->xso.type (which dereferences
sp->xvec[sp->len - 1]) without first validating sp->len. This leads to
a crash when dereferencing an invalid state pointer.

This patch prevents the crash by explicitly removing the secpath
extension from the skb if the xfrm state is not found after hardware
decryption. This ensures downstream functions do not operate on a
zero-length secpath.

 BUG: unable to handle page fault for address: ffffffff000002c8
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 282e067 P4D 282e067 PUD 0
 Oops: Oops: 0000 [#1] SMP
 CPU: 12 UID: 0 PID: 0 Comm: swapper/12 Not tainted 6.15.0-rc7_for_upstream_min_debug_2025_05_27_22_44 #1 NONE
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:__xfrm_policy_check+0x61a/0xa30
 Code: b6 77 7f 83 e6 02 74 14 4d 8b af d8 00 00 00 41 0f b6 45 05 c1 e0 03 48 98 49 01 c5 41 8b 45 00 83 e8 01 48 98 49 8b 44 c5 10 <0f> b6 80 c8 02 00 00 83 e0 0c 3c 04 0f 84 0c 02 00 00 31 ff 80 fa
 RSP: 0018:ffff88885fb04918 EFLAGS: 00010297
 RAX: ffffffff00000000 RBX: 0000000000000002 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000000
 RBP: ffffffff8311af80 R08: 0000000000000020 R09: 00000000c2eda353
 R10: ffff88812be2bbc8 R11: 000000001faab533 R12: ffff88885fb049c8
 R13: ffff88812be2bbc8 R14: 0000000000000000 R15: ffff88811896ae00
 FS:  0000000000000000(0000) GS:ffff8888dca82000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffff000002c8 CR3: 0000000243050002 CR4: 0000000000372eb0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  <IRQ>
  ? try_to_wake_up+0x108/0x4c0
  ? udp4_lib_lookup2+0xbe/0x150
  ? udp_lib_lport_inuse+0x100/0x100
  ? __udp4_lib_lookup+0x2b0/0x410
  __xfrm_policy_check2.constprop.0+0x11e/0x130
  udp_queue_rcv_one_skb+0x1d/0x530
  udp_unicast_rcv_skb+0x76/0x90
  __udp4_lib_rcv+0xa64/0xe90
  ip_protocol_deliver_rcu+0x20/0x130
  ip_local_deliver_finish+0x75/0xa0
  ip_local_deliver+0xc1/0xd0
  ? ip_protocol_deliver_rcu+0x130/0x130
  ip_sublist_rcv+0x1f9/0x240
  ? ip_rcv_finish_core+0x430/0x430
  ip_list_rcv+0xfc/0x130
  __netif_receive_skb_list_core+0x181/0x1e0
  netif_receive_skb_list_internal+0x200/0x360
  ? mlx5e_build_rx_skb+0x1bc/0xda0 [mlx5_core]
  gro_receive_skb+0xfd/0x210
  mlx5e_handle_rx_cqe_mpwrq+0x141/0x280 [mlx5_core]
  mlx5e_poll_rx_cq+0xcc/0x8e0 [mlx5_core]
  ? mlx5e_handle_rx_dim+0x91/0xd0 [mlx5_core]
  mlx5e_napi_poll+0x114/0xab0 [mlx5_core]
  __napi_poll+0x25/0x170
  net_rx_action+0x32d/0x3a0
  ? mlx5_eq_comp_int+0x8d/0x280 [mlx5_core]
  ? notifier_call_chain+0x33/0xa0
  handle_softirqs+0xda/0x250
  irq_exit_rcu+0x6d/0xc0
  common_interrupt+0x81/0xa0
  </IRQ>

Fixes: b2ac754 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1753256672-337784-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 16, 2025
[ Upstream commit 181698a ]

When CONFIG_CRYPTO_DEV_CCP_DEBUGFS is enabled, rebinding
the ccp device causes the following crash:

$ echo '0000:0a:00.2' > /sys/bus/pci/drivers/ccp/unbind
$ echo '0000:0a:00.2' > /sys/bus/pci/drivers/ccp/bind

[  204.976930] BUG: kernel NULL pointer dereference, address: 0000000000000098
[  204.978026] #PF: supervisor write access in kernel mode
[  204.979126] #PF: error_code(0x0002) - not-present page
[  204.980226] PGD 0 P4D 0
[  204.981317] Oops: Oops: 0002 [#1] SMP NOPTI
...
[  204.997852] Call Trace:
[  204.999074]  <TASK>
[  205.000297]  start_creating+0x9f/0x1c0
[  205.001533]  debugfs_create_dir+0x1f/0x170
[  205.002769]  ? srso_return_thunk+0x5/0x5f
[  205.004000]  ccp5_debugfs_setup+0x87/0x170 [ccp]
[  205.005241]  ccp5_init+0x8b2/0x960 [ccp]
[  205.006469]  ccp_dev_init+0xd4/0x150 [ccp]
[  205.007709]  sp_init+0x5f/0x80 [ccp]
[  205.008942]  sp_pci_probe+0x283/0x2e0 [ccp]
[  205.010165]  ? srso_return_thunk+0x5/0x5f
[  205.011376]  local_pci_probe+0x4f/0xb0
[  205.012584]  pci_device_probe+0xdb/0x230
[  205.013810]  really_probe+0xed/0x380
[  205.015024]  __driver_probe_device+0x7e/0x160
[  205.016240]  device_driver_attach+0x2f/0x60
[  205.017457]  bind_store+0x7c/0xb0
[  205.018663]  drv_attr_store+0x28/0x40
[  205.019868]  sysfs_kf_write+0x5f/0x70
[  205.021065]  kernfs_fop_write_iter+0x145/0x1d0
[  205.022267]  vfs_write+0x308/0x440
[  205.023453]  ksys_write+0x6d/0xe0
[  205.024616]  __x64_sys_write+0x1e/0x30
[  205.025778]  x64_sys_call+0x16ba/0x2150
[  205.026942]  do_syscall_64+0x56/0x1e0
[  205.028108]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  205.029276] RIP: 0033:0x7fbc36f10104
[  205.030420] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 e1 08 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5

This patch sets ccp_debugfs_dir to NULL after destroying it in
ccp5_debugfs_destroy, allowing the directory dentry to be
recreated when rebinding the ccp device.

Tested on AMD Ryzen 7 1700X.

Fixes: 3cdbe34 ("crypto: ccp - Add debugfs entries for CCP information")
Signed-off-by: Mengbiao Xiong <xisme1998@gmail.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 16, 2025
[ Upstream commit d46e51f ]

When sending a packet with virtio_net_hdr to tun device, if the gso_type
in virtio_net_hdr is SKB_GSO_UDP and the gso_size is less than udphdr
size, below crash may happen.

  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:4572!
  Oops: invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 0 UID: 0 PID: 62 Comm: mytest Not tainted 6.16.0-rc7 #203 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  RIP: 0010:skb_pull_rcsum+0x8e/0xa0
  Code: 00 00 5b c3 cc cc cc cc 8b 93 88 00 00 00 f7 da e8 37 44 38 00 f7 d8 89 83 88 00 00 00 48 8b 83 c8 00 00 00 5b c3 cc cc cc cc <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 000
  RSP: 0018:ffffc900001fba38 EFLAGS: 00000297
  RAX: 0000000000000004 RBX: ffff8880040c1000 RCX: ffffc900001fb948
  RDX: ffff888003e6d700 RSI: 0000000000000008 RDI: ffff88800411a062
  RBP: ffff8880040c1000 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff888003606c00 R11: 0000000000000001 R12: 0000000000000000
  R13: ffff888004060900 R14: ffff888004050000 R15: ffff888004060900
  FS:  000000002406d3c0(0000) GS:ffff888084a19000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000040 CR3: 0000000004007000 CR4: 00000000000006f0
  Call Trace:
   <TASK>
   udp_queue_rcv_one_skb+0x176/0x4b0 net/ipv4/udp.c:2445
   udp_queue_rcv_skb+0x155/0x1f0 net/ipv4/udp.c:2475
   udp_unicast_rcv_skb+0x71/0x90 net/ipv4/udp.c:2626
   __udp4_lib_rcv+0x433/0xb00 net/ipv4/udp.c:2690
   ip_protocol_deliver_rcu+0xa6/0x160 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x72/0x90 net/ipv4/ip_input.c:233
   ip_sublist_rcv_finish+0x5f/0x70 net/ipv4/ip_input.c:579
   ip_sublist_rcv+0x122/0x1b0 net/ipv4/ip_input.c:636
   ip_list_rcv+0xf7/0x130 net/ipv4/ip_input.c:670
   __netif_receive_skb_list_core+0x21d/0x240 net/core/dev.c:6067
   netif_receive_skb_list_internal+0x186/0x2b0 net/core/dev.c:6210
   napi_complete_done+0x78/0x180 net/core/dev.c:6580
   tun_get_user+0xa63/0x1120 drivers/net/tun.c:1909
   tun_chr_write_iter+0x65/0xb0 drivers/net/tun.c:1984
   vfs_write+0x300/0x420 fs/read_write.c:593
   ksys_write+0x60/0xd0 fs/read_write.c:686
   do_syscall_64+0x50/0x1c0 arch/x86/entry/syscall_64.c:63
   </TASK>

To trigger gso segment in udp_queue_rcv_skb(), we should also set option
UDP_ENCAP_ESPINUDP to enable udp_sk(sk)->encap_rcv. When the encap_rcv
hook return 1 in udp_queue_rcv_one_skb(), udp_csum_pull_header() will try
to pull udphdr, but the skb size has been segmented to gso size, which
leads to this crash.

Previous commit cf329aa ("udp: cope with UDP GRO packet misdirection")
introduces segmentation in UDP receive path only for GRO, which was never
intended to be used for UFO, so drop UFO packets in udp_rcv_segment().

Link: https://lore.kernel.org/netdev/20250724083005.3918375-1-wangliang74@huawei.com/
Link: https://lore.kernel.org/netdev/20250729123907.3318425-1-wangliang74@huawei.com/
Fixes: cf329aa ("udp: cope with UDP GRO packet misdirection")
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250730101458.3470788-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 16, 2025
[ Upstream commit 5a40f8a ]

benet crashes as soon as SRIOV VFs are created:

 kernel BUG at mm/vmalloc.c:3457!
 Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
 CPU: 4 UID: 0 PID: 7408 Comm: test.sh Kdump: loaded Not tainted 6.16.0+ #1 PREEMPT(voluntary)
 [...]
 RIP: 0010:vunmap+0x5f/0x70
 [...]
 Call Trace:
  <TASK>
  __iommu_dma_free+0xe8/0x1c0
  be_cmd_set_mac_list+0x3fe/0x640 [be2net]
  be_cmd_set_mac+0xaf/0x110 [be2net]
  be_vf_eth_addr_config+0x19f/0x330 [be2net]
  be_vf_setup+0x4f7/0x990 [be2net]
  be_pci_sriov_configure+0x3a1/0x470 [be2net]
  sriov_numvfs_store+0x20b/0x380
  kernfs_fop_write_iter+0x354/0x530
  vfs_write+0x9b9/0xf60
  ksys_write+0xf3/0x1d0
  do_syscall_64+0x8c/0x3d0

be_cmd_set_mac_list() calls dma_free_coherent() under a spin_lock_bh.
Fix it by freeing only after the lock has been released.

Fixes: 1a82d19 ("be2net: fix sleeping while atomic bugs in be_ndo_bridge_getlink")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250801101338.72502-1-mschmidt@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 16, 2025
[ Upstream commit ae63338 ]

I accidentally added a bug in pptp_xmit() that syzbot caught for us.

Only call ip_rt_put() if a route has been allocated.

BUG: unable to handle page fault for address: ffffffffffffffdb
PGD df3b067 P4D df3b067 PUD df3d067 PMD 0
Oops: Oops: 0002 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 6346 Comm: syz.0.336 Not tainted 6.16.0-next-20250804-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:arch_atomic_add_return arch/x86/include/asm/atomic.h:85 [inline]
RIP: 0010:raw_atomic_sub_return_release include/linux/atomic/atomic-arch-fallback.h:846 [inline]
RIP: 0010:atomic_sub_return_release include/linux/atomic/atomic-instrumented.h:327 [inline]
RIP: 0010:__rcuref_put include/linux/rcuref.h:109 [inline]
RIP: 0010:rcuref_put+0x172/0x210 include/linux/rcuref.h:173
Call Trace:
 <TASK>
 dst_release+0x24/0x1b0 net/core/dst.c:167
 ip_rt_put include/net/route.h:285 [inline]
 pptp_xmit+0x14b/0x1a90 drivers/net/ppp/pptp.c:267
 __ppp_channel_push+0xf2/0x1c0 drivers/net/ppp/ppp_generic.c:2166
 ppp_channel_push+0x123/0x660 drivers/net/ppp/ppp_generic.c:2198
 ppp_write+0x2b0/0x400 drivers/net/ppp/ppp_generic.c:544
 vfs_write+0x27b/0xb30 fs/read_write.c:684
 ksys_write+0x145/0x250 fs/read_write.c:738
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: de9c486 ("pptp: ensure minimal skb length in pptp_xmit()")
Reported-by: syzbot+27d7cfbc93457e472e00@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/689095a5.050a0220.1fc43d.0009.GAE@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250807142146.2877060-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 16, 2025
commit 151c0aa upstream.

1. In func configfs_composite_bind() -> composite_os_desc_req_prepare():
if kmalloc fails, the pointer cdev->os_desc_req will be freed but not
set to NULL. Then it will return a failure to the upper-level function.
2. in func configfs_composite_bind() -> composite_dev_cleanup():
it will checks whether cdev->os_desc_req is NULL. If it is not NULL, it
will attempt to use it.This will lead to a use-after-free issue.

BUG: KASAN: use-after-free in composite_dev_cleanup+0xf4/0x2c0
Read of size 8 at addr 0000004827837a00 by task init/1

CPU: 10 PID: 1 Comm: init Tainted: G           O      5.10.97-oh #1
 kasan_report+0x188/0x1cc
 __asan_load8+0xb4/0xbc
 composite_dev_cleanup+0xf4/0x2c0
 configfs_composite_bind+0x210/0x7ac
 udc_bind_to_driver+0xb4/0x1ec
 usb_gadget_probe_driver+0xec/0x21c
 gadget_dev_desc_UDC_store+0x264/0x27c

Fixes: 37a3a53 ("usb: gadget: OS Feature Descriptors support")
Cc: stable <stable@kernel.org>
Signed-off-by: Tao Xue <xuetao09@huawei.com>
Link: https://lore.kernel.org/r/20250721093908.14967-1-xuetao09@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Aug 19, 2025
btrfs_subpage_set_writeback() calls folio_start_writeback() the first time
a folio is written back, and it also clears the PAGECACHE_TAG_TOWRITE tag
even if there are still dirty blocks in the folio. This can break ordering
guarantees, such as those required by btrfs_wait_ordered_extents().

That ordering breakage leads to a real failure. For example, running
generic/464 on a zoned setup will hit the following ASSERT. This happens
because the broken ordering fails to flush existing dirty pages before the
file size is truncated.

  assertion failed: !list_empty(&ordered->list) :: 0, in fs/btrfs/zoned.c:1899
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/zoned.c:1899!
  Oops: invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 2 UID: 0 PID: 1906169 Comm: kworker/u130:2 Kdump: loaded Not tainted 6.16.0-rc6-BTRFS-ZNS+ #554 PREEMPT(voluntary)
  Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021
  Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
  RIP: 0010:btrfs_finish_ordered_zoned.cold+0x50/0x52 [btrfs]
  RSP: 0018:ffffc9002efdbd60 EFLAGS: 00010246
  RAX: 000000000000004c RBX: ffff88811923c4e0 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffffff827e38b1 RDI: 00000000ffffffff
  RBP: ffff88810005d000 R08: 00000000ffffdfff R09: ffffffff831051c8
  R10: ffffffff83055220 R11: 0000000000000000 R12: ffff8881c2458c00
  R13: ffff88811923c540 R14: ffff88811923c5e8 R15: ffff8881c1bd9680
  FS:  0000000000000000(0000) GS:ffff88a04acd0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f907c7a918c CR3: 0000000004024000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   ? srso_return_thunk+0x5/0x5f
   btrfs_finish_ordered_io+0x4a/0x60 [btrfs]
   btrfs_work_helper+0xf9/0x490 [btrfs]
   process_one_work+0x204/0x590
   ? srso_return_thunk+0x5/0x5f
   worker_thread+0x1d6/0x3d0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0x118/0x230
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x205/0x260
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

Consider process A calling writepages() with WB_SYNC_NONE. In zoned mode or
for compressed writes, it locks several folios for delalloc and starts
writing them out. Let's call the last locked folio folio X. Suppose the
write range only partially covers folio X, leaving some pages dirty.
Process A calls btrfs_subpage_set_writeback() when building a bio. This
function call clears the TOWRITE tag of folio X, whose size = 8K and
the block size = 4K. It is following state.

   0     4K    8K
   |/////|/////|  (flag: DIRTY, tag: DIRTY)
   <-----> Process A will write this range.

Now suppose process B concurrently calls writepages() with WB_SYNC_ALL. It
calls tag_pages_for_writeback() to tag dirty folios with
PAGECACHE_TAG_TOWRITE. Since folio X is still dirty, it gets tagged. Then,
B collects tagged folios using filemap_get_folios_tag() and must wait for
folio X to be written before returning from writepages().

   0     4K    8K
   |/////|/////|  (flag: DIRTY, tag: DIRTY|TOWRITE)

However, between tagging and collecting, process A may call
btrfs_subpage_set_writeback() and clear folio X's TOWRITE tag.
   0     4K    8K
   |     |/////|  (flag: DIRTY|WRITEBACK, tag: DIRTY)

As a result, process B won't see folio X in its batch, and returns without
waiting for it. This breaks the WB_SYNC_ALL ordering requirement.

Fix this by using btrfs_subpage_set_writeback_keepwrite(), which retains
the TOWRITE tag. We now manually clear the tag only after the folio becomes
clean, via the xas operation.

Fixes: 3470da3 ("btrfs: subpage: introduce helpers for writeback status")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
donald pushed a commit that referenced this issue Aug 20, 2025
An use-after-free issue occurred when __mark_inode_dirty() get the
bdi_writeback that was in the progress of switching.

CPU: 1 PID: 562 Comm: systemd-random- Not tainted 6.6.56-gb4403bd46a8e #1
......
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __mark_inode_dirty+0x124/0x418
lr : __mark_inode_dirty+0x118/0x418
sp : ffffffc08c9dbbc0
........
Call trace:
 __mark_inode_dirty+0x124/0x418
 generic_update_time+0x4c/0x60
 file_modified+0xcc/0xd0
 ext4_buffered_write_iter+0x58/0x124
 ext4_file_write_iter+0x54/0x704
 vfs_write+0x1c0/0x308
 ksys_write+0x74/0x10c
 __arm64_sys_write+0x1c/0x28
 invoke_syscall+0x48/0x114
 el0_svc_common.constprop.0+0xc0/0xe0
 do_el0_svc+0x1c/0x28
 el0_svc+0x40/0xe4
 el0t_64_sync_handler+0x120/0x12c
 el0t_64_sync+0x194/0x198

Root cause is:

systemd-random-seed                         kworker
----------------------------------------------------------------------
___mark_inode_dirty                     inode_switch_wbs_work_fn

  spin_lock(&inode->i_lock);
  inode_attach_wb
  locked_inode_to_wb_and_lock_list
     get inode->i_wb
     spin_unlock(&inode->i_lock);
     spin_lock(&wb->list_lock)
  spin_lock(&inode->i_lock)
  inode_io_list_move_locked
  spin_unlock(&wb->list_lock)
  spin_unlock(&inode->i_lock)
                                    spin_lock(&old_wb->list_lock)
                                      inode_do_switch_wbs
                                        spin_lock(&inode->i_lock)
                                        inode->i_wb = new_wb
                                        spin_unlock(&inode->i_lock)
                                    spin_unlock(&old_wb->list_lock)
                                    wb_put_many(old_wb, nr_switched)
                                      cgwb_release
                                      old wb released
  wb_wakeup_delayed() accesses wb,
  then trigger the use-after-free
  issue

Fix this race condition by holding inode spinlock until
wb_wakeup_delayed() finished.

Signed-off-by: Jiufei Xue <jiufei.xue@samsung.com>
Link: https://lore.kernel.org/20250728100715.3863241-1-jiufei.xue@samsung.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
donald pushed a commit that referenced this issue Aug 21, 2025
[ Upstream commit 736a051 ]

The hfs_find_init() method can trigger the crash
if tree pointer is NULL:

[   45.746290][ T9787] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000008: 0000 [#1] SMP KAI
[   45.747287][ T9787] KASAN: null-ptr-deref in range [0x0000000000000040-0x0000000000000047]
[   45.748716][ T9787] CPU: 2 UID: 0 PID: 9787 Comm: repro Not tainted 6.16.0-rc3 #10 PREEMPT(full)
[   45.750250][ T9787] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   45.751983][ T9787] RIP: 0010:hfs_find_init+0x86/0x230
[   45.752834][ T9787] Code: c1 ea 03 80 3c 02 00 0f 85 9a 01 00 00 4c 8d 6b 40 48 c7 45 18 00 00 00 00 48 b8 00 00 00 00 00 fc
[   45.755574][ T9787] RSP: 0018:ffffc90015157668 EFLAGS: 00010202
[   45.756432][ T9787] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff819a4d09
[   45.757457][ T9787] RDX: 0000000000000008 RSI: ffffffff819acd3a RDI: ffffc900151576e8
[   45.758282][ T9787] RBP: ffffc900151576d0 R08: 0000000000000005 R09: 0000000000000000
[   45.758943][ T9787] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000004
[   45.759619][ T9787] R13: 0000000000000040 R14: ffff88802c50814a R15: 0000000000000000
[   45.760293][ T9787] FS:  00007ffb72734540(0000) GS:ffff8880cec64000(0000) knlGS:0000000000000000
[   45.761050][ T9787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.761606][ T9787] CR2: 00007f9bd8225000 CR3: 000000010979a000 CR4: 00000000000006f0
[   45.762286][ T9787] Call Trace:
[   45.762570][ T9787]  <TASK>
[   45.762824][ T9787]  hfs_ext_read_extent+0x190/0x9d0
[   45.763269][ T9787]  ? submit_bio_noacct_nocheck+0x2dd/0xce0
[   45.763766][ T9787]  ? __pfx_hfs_ext_read_extent+0x10/0x10
[   45.764250][ T9787]  hfs_get_block+0x55f/0x830
[   45.764646][ T9787]  block_read_full_folio+0x36d/0x850
[   45.765105][ T9787]  ? __pfx_hfs_get_block+0x10/0x10
[   45.765541][ T9787]  ? const_folio_flags+0x5b/0x100
[   45.765972][ T9787]  ? __pfx_hfs_read_folio+0x10/0x10
[   45.766415][ T9787]  filemap_read_folio+0xbe/0x290
[   45.766840][ T9787]  ? __pfx_filemap_read_folio+0x10/0x10
[   45.767325][ T9787]  ? __filemap_get_folio+0x32b/0xbf0
[   45.767780][ T9787]  do_read_cache_folio+0x263/0x5c0
[   45.768223][ T9787]  ? __pfx_hfs_read_folio+0x10/0x10
[   45.768666][ T9787]  read_cache_page+0x5b/0x160
[   45.769070][ T9787]  hfs_btree_open+0x491/0x1740
[   45.769481][ T9787]  hfs_mdb_get+0x15e2/0x1fb0
[   45.769877][ T9787]  ? __pfx_hfs_mdb_get+0x10/0x10
[   45.770316][ T9787]  ? find_held_lock+0x2b/0x80
[   45.770731][ T9787]  ? lockdep_init_map_type+0x5c/0x280
[   45.771200][ T9787]  ? lockdep_init_map_type+0x5c/0x280
[   45.771674][ T9787]  hfs_fill_super+0x38e/0x720
[   45.772092][ T9787]  ? __pfx_hfs_fill_super+0x10/0x10
[   45.772549][ T9787]  ? snprintf+0xbe/0x100
[   45.772931][ T9787]  ? __pfx_snprintf+0x10/0x10
[   45.773350][ T9787]  ? do_raw_spin_lock+0x129/0x2b0
[   45.773796][ T9787]  ? find_held_lock+0x2b/0x80
[   45.774215][ T9787]  ? set_blocksize+0x40a/0x510
[   45.774636][ T9787]  ? sb_set_blocksize+0x176/0x1d0
[   45.775087][ T9787]  ? setup_bdev_super+0x369/0x730
[   45.775533][ T9787]  get_tree_bdev_flags+0x384/0x620
[   45.775985][ T9787]  ? __pfx_hfs_fill_super+0x10/0x10
[   45.776453][ T9787]  ? __pfx_get_tree_bdev_flags+0x10/0x10
[   45.776950][ T9787]  ? bpf_lsm_capable+0x9/0x10
[   45.777365][ T9787]  ? security_capable+0x80/0x260
[   45.777803][ T9787]  vfs_get_tree+0x8e/0x340
[   45.778203][ T9787]  path_mount+0x13de/0x2010
[   45.778604][ T9787]  ? kmem_cache_free+0x2b0/0x4c0
[   45.779052][ T9787]  ? __pfx_path_mount+0x10/0x10
[   45.779480][ T9787]  ? getname_flags.part.0+0x1c5/0x550
[   45.779954][ T9787]  ? putname+0x154/0x1a0
[   45.780335][ T9787]  __x64_sys_mount+0x27b/0x300
[   45.780758][ T9787]  ? __pfx___x64_sys_mount+0x10/0x10
[   45.781232][ T9787]  do_syscall_64+0xc9/0x480
[   45.781631][ T9787]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   45.782149][ T9787] RIP: 0033:0x7ffb7265b6ca
[   45.782539][ T9787] Code: 48 8b 0d c9 17 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48
[   45.784212][ T9787] RSP: 002b:00007ffc0c10cfb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[   45.784935][ T9787] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffb7265b6ca
[   45.785626][ T9787] RDX: 0000200000000240 RSI: 0000200000000280 RDI: 00007ffc0c10d100
[   45.786316][ T9787] RBP: 00007ffc0c10d190 R08: 00007ffc0c10d000 R09: 0000000000000000
[   45.787011][ T9787] R10: 0000000000000048 R11: 0000000000000206 R12: 0000560246733250
[   45.787697][ T9787] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   45.788393][ T9787]  </TASK>
[   45.788665][ T9787] Modules linked in:
[   45.789058][ T9787] ---[ end trace 0000000000000000 ]---
[   45.789554][ T9787] RIP: 0010:hfs_find_init+0x86/0x230
[   45.790028][ T9787] Code: c1 ea 03 80 3c 02 00 0f 85 9a 01 00 00 4c 8d 6b 40 48 c7 45 18 00 00 00 00 48 b8 00 00 00 00 00 fc
[   45.792364][ T9787] RSP: 0018:ffffc90015157668 EFLAGS: 00010202
[   45.793155][ T9787] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff819a4d09
[   45.794123][ T9787] RDX: 0000000000000008 RSI: ffffffff819acd3a RDI: ffffc900151576e8
[   45.795105][ T9787] RBP: ffffc900151576d0 R08: 0000000000000005 R09: 0000000000000000
[   45.796135][ T9787] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000004
[   45.797114][ T9787] R13: 0000000000000040 R14: ffff88802c50814a R15: 0000000000000000
[   45.798024][ T9787] FS:  00007ffb72734540(0000) GS:ffff8880cec64000(0000) knlGS:0000000000000000
[   45.799019][ T9787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.799822][ T9787] CR2: 00007f9bd8225000 CR3: 000000010979a000 CR4: 00000000000006f0
[   45.800747][ T9787] Kernel panic - not syncing: Fatal exception

The hfs_fill_super() calls hfs_mdb_get() method that tries
to construct Extents Tree and Catalog Tree:

HFS_SB(sb)->ext_tree = hfs_btree_open(sb, HFS_EXT_CNID, hfs_ext_keycmp);
if (!HFS_SB(sb)->ext_tree) {
	pr_err("unable to open extent tree\n");
	goto out;
}
HFS_SB(sb)->cat_tree = hfs_btree_open(sb, HFS_CAT_CNID, hfs_cat_keycmp);
if (!HFS_SB(sb)->cat_tree) {
	pr_err("unable to open catalog tree\n");
	goto out;
}

However, hfs_btree_open() calls read_mapping_page() that
calls hfs_get_block(). And this method calls hfs_ext_read_extent():

static int hfs_ext_read_extent(struct inode *inode, u16 block)
{
	struct hfs_find_data fd;
	int res;

	if (block >= HFS_I(inode)->cached_start &&
	    block < HFS_I(inode)->cached_start + HFS_I(inode)->cached_blocks)
		return 0;

	res = hfs_find_init(HFS_SB(inode->i_sb)->ext_tree, &fd);
	if (!res) {
		res = __hfs_ext_cache_extent(&fd, inode, block);
		hfs_find_exit(&fd);
	}
	return res;
}

The problem here that hfs_find_init() is trying to use
HFS_SB(inode->i_sb)->ext_tree that is not initialized yet.
It will be initailized when hfs_btree_open() finishes
the execution.

The patch adds checking of tree pointer in hfs_find_init()
and it reworks the logic of hfs_btree_open() by reading
the b-tree's header directly from the volume. The read_mapping_page()
is exchanged on filemap_grab_folio() that grab the folio from
mapping. Then, sb_bread() extracts the b-tree's header
content and copy it into the folio.

Reported-by: Wenzhi Wang <wenzhi.wang@uwaterloo.ca>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250710213657.108285-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 21, 2025
[ Upstream commit 9445878 ]

The hfsplus_readdir() method is capable to crash by calling
hfsplus_uni2asc():

[  667.121659][ T9805] ==================================================================
[  667.122651][ T9805] BUG: KASAN: slab-out-of-bounds in hfsplus_uni2asc+0x902/0xa10
[  667.123627][ T9805] Read of size 2 at addr ffff88802592f40c by task repro/9805
[  667.124578][ T9805]
[  667.124876][ T9805] CPU: 3 UID: 0 PID: 9805 Comm: repro Not tainted 6.16.0-rc3 #1 PREEMPT(full)
[  667.124886][ T9805] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  667.124890][ T9805] Call Trace:
[  667.124893][ T9805]  <TASK>
[  667.124896][ T9805]  dump_stack_lvl+0x10e/0x1f0
[  667.124911][ T9805]  print_report+0xd0/0x660
[  667.124920][ T9805]  ? __virt_addr_valid+0x81/0x610
[  667.124928][ T9805]  ? __phys_addr+0xe8/0x180
[  667.124934][ T9805]  ? hfsplus_uni2asc+0x902/0xa10
[  667.124942][ T9805]  kasan_report+0xc6/0x100
[  667.124950][ T9805]  ? hfsplus_uni2asc+0x902/0xa10
[  667.124959][ T9805]  hfsplus_uni2asc+0x902/0xa10
[  667.124966][ T9805]  ? hfsplus_bnode_read+0x14b/0x360
[  667.124974][ T9805]  hfsplus_readdir+0x845/0xfc0
[  667.124984][ T9805]  ? __pfx_hfsplus_readdir+0x10/0x10
[  667.124994][ T9805]  ? stack_trace_save+0x8e/0xc0
[  667.125008][ T9805]  ? iterate_dir+0x18b/0xb20
[  667.125015][ T9805]  ? trace_lock_acquire+0x85/0xd0
[  667.125022][ T9805]  ? lock_acquire+0x30/0x80
[  667.125029][ T9805]  ? iterate_dir+0x18b/0xb20
[  667.125037][ T9805]  ? down_read_killable+0x1ed/0x4c0
[  667.125044][ T9805]  ? putname+0x154/0x1a0
[  667.125051][ T9805]  ? __pfx_down_read_killable+0x10/0x10
[  667.125058][ T9805]  ? apparmor_file_permission+0x239/0x3e0
[  667.125069][ T9805]  iterate_dir+0x296/0xb20
[  667.125076][ T9805]  __x64_sys_getdents64+0x13c/0x2c0
[  667.125084][ T9805]  ? __pfx___x64_sys_getdents64+0x10/0x10
[  667.125091][ T9805]  ? __x64_sys_openat+0x141/0x200
[  667.125126][ T9805]  ? __pfx_filldir64+0x10/0x10
[  667.125134][ T9805]  ? do_user_addr_fault+0x7fe/0x12f0
[  667.125143][ T9805]  do_syscall_64+0xc9/0x480
[  667.125151][ T9805]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  667.125158][ T9805] RIP: 0033:0x7fa8753b2fc9
[  667.125164][ T9805] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[  667.125172][ T9805] RSP: 002b:00007ffe96f8e0f8 EFLAGS: 00000217 ORIG_RAX: 00000000000000d9
[  667.125181][ T9805] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa8753b2fc9
[  667.125185][ T9805] RDX: 0000000000000400 RSI: 00002000000063c0 RDI: 0000000000000004
[  667.125190][ T9805] RBP: 00007ffe96f8e110 R08: 00007ffe96f8e110 R09: 00007ffe96f8e110
[  667.125195][ T9805] R10: 0000000000000000 R11: 0000000000000217 R12: 0000556b1e3b4260
[  667.125199][ T9805] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  667.125207][ T9805]  </TASK>
[  667.125210][ T9805]
[  667.145632][ T9805] Allocated by task 9805:
[  667.145991][ T9805]  kasan_save_stack+0x20/0x40
[  667.146352][ T9805]  kasan_save_track+0x14/0x30
[  667.146717][ T9805]  __kasan_kmalloc+0xaa/0xb0
[  667.147065][ T9805]  __kmalloc_noprof+0x205/0x550
[  667.147448][ T9805]  hfsplus_find_init+0x95/0x1f0
[  667.147813][ T9805]  hfsplus_readdir+0x220/0xfc0
[  667.148174][ T9805]  iterate_dir+0x296/0xb20
[  667.148549][ T9805]  __x64_sys_getdents64+0x13c/0x2c0
[  667.148937][ T9805]  do_syscall_64+0xc9/0x480
[  667.149291][ T9805]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  667.149809][ T9805]
[  667.150030][ T9805] The buggy address belongs to the object at ffff88802592f000
[  667.150030][ T9805]  which belongs to the cache kmalloc-2k of size 2048
[  667.151282][ T9805] The buggy address is located 0 bytes to the right of
[  667.151282][ T9805]  allocated 1036-byte region [ffff88802592f000, ffff88802592f40c)
[  667.152580][ T9805]
[  667.152798][ T9805] The buggy address belongs to the physical page:
[  667.153373][ T9805] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x25928
[  667.154157][ T9805] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  667.154916][ T9805] anon flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
[  667.155631][ T9805] page_type: f5(slab)
[  667.155997][ T9805] raw: 00fff00000000040 ffff88801b442f00 0000000000000000 dead000000000001
[  667.156770][ T9805] raw: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000
[  667.157536][ T9805] head: 00fff00000000040 ffff88801b442f00 0000000000000000 dead000000000001
[  667.158317][ T9805] head: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000
[  667.159088][ T9805] head: 00fff00000000003 ffffea0000964a01 00000000ffffffff 00000000ffffffff
[  667.159865][ T9805] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
[  667.160643][ T9805] page dumped because: kasan: bad access detected
[  667.161216][ T9805] page_owner tracks the page as allocated
[  667.161732][ T9805] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN9
[  667.163566][ T9805]  post_alloc_hook+0x1c0/0x230
[  667.164003][ T9805]  get_page_from_freelist+0xdeb/0x3b30
[  667.164503][ T9805]  __alloc_frozen_pages_noprof+0x25c/0x2460
[  667.165040][ T9805]  alloc_pages_mpol+0x1fb/0x550
[  667.165489][ T9805]  new_slab+0x23b/0x340
[  667.165872][ T9805]  ___slab_alloc+0xd81/0x1960
[  667.166313][ T9805]  __slab_alloc.isra.0+0x56/0xb0
[  667.166767][ T9805]  __kmalloc_cache_noprof+0x255/0x3e0
[  667.167255][ T9805]  psi_cgroup_alloc+0x52/0x2d0
[  667.167693][ T9805]  cgroup_mkdir+0x694/0x1210
[  667.168118][ T9805]  kernfs_iop_mkdir+0x111/0x190
[  667.168568][ T9805]  vfs_mkdir+0x59b/0x8d0
[  667.168956][ T9805]  do_mkdirat+0x2ed/0x3d0
[  667.169353][ T9805]  __x64_sys_mkdir+0xef/0x140
[  667.169784][ T9805]  do_syscall_64+0xc9/0x480
[  667.170195][ T9805]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  667.170730][ T9805] page last free pid 1257 tgid 1257 stack trace:
[  667.171304][ T9805]  __free_frozen_pages+0x80c/0x1250
[  667.171770][ T9805]  vfree.part.0+0x12b/0xab0
[  667.172182][ T9805]  delayed_vfree_work+0x93/0xd0
[  667.172612][ T9805]  process_one_work+0x9b5/0x1b80
[  667.173067][ T9805]  worker_thread+0x630/0xe60
[  667.173486][ T9805]  kthread+0x3a8/0x770
[  667.173857][ T9805]  ret_from_fork+0x517/0x6e0
[  667.174278][ T9805]  ret_from_fork_asm+0x1a/0x30
[  667.174703][ T9805]
[  667.174917][ T9805] Memory state around the buggy address:
[  667.175411][ T9805]  ffff88802592f300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  667.176114][ T9805]  ffff88802592f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  667.176830][ T9805] >ffff88802592f400: 00 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  667.177547][ T9805]                       ^
[  667.177933][ T9805]  ffff88802592f480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  667.178640][ T9805]  ffff88802592f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  667.179350][ T9805] ==================================================================

The hfsplus_uni2asc() method operates by struct hfsplus_unistr:

struct hfsplus_unistr {
	__be16 length;
	hfsplus_unichr unicode[HFSPLUS_MAX_STRLEN];
} __packed;

where HFSPLUS_MAX_STRLEN is 255 bytes. The issue happens if length
of the structure instance has value bigger than 255 (for example,
65283). In such case, pointer on unicode buffer is going beyond of
the allocated memory.

The patch fixes the issue by checking the length value of
hfsplus_unistr instance and using 255 value in the case if length
value is bigger than HFSPLUS_MAX_STRLEN. Potential reason of such
situation could be a corruption of Catalog File b-tree's node.

Reported-by: Wenzhi Wang <wenzhi.wang@uwaterloo.ca>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Yangtao Li <frank.li@vivo.com>
Link: https://lore.kernel.org/r/20250710230830.110500-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 21, 2025
commit 87c6efc upstream.

Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify()
after recent changes from Lion [2]. The problem is: in ets_qdisc_change()
we purge unused DWRR queues; the value of 'q->nbands' is the new one, and
the cleanup should be done with the old one. The problem is here since my
first attempts to fix ets_qdisc_change(), but it surfaced again after the
recent qdisc len accounting fixes. Fix it purging idle DWRR queues before
assigning a new value of 'q->nbands', so that all purge operations find a
consistent configuration:

 - old 'q->nbands' because it's needed by ets_class_find()
 - old 'q->nstrict' because it's needed by ets_class_is_strict()

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary)
 Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021
 RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80
 Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab
 RSP: 0018:ffffba186009f400 EFLAGS: 00010202
 RAX: 00000000000000d6 RBX: 0000000000000000 RCX: 0000000000000004
 RDX: ffff9f0fa29b69c0 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffffffc12c2400 R08: 0000000000000008 R09: 0000000000000004
 R10: ffffffffffffffff R11: 0000000000000004 R12: 0000000000000000
 R13: ffff9f0f8cfe0000 R14: 0000000000100005 R15: 0000000000000000
 FS:  00007f2154f37480(0000) GS:ffff9f269c1c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001530be001 CR4: 00000000007726f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  ets_class_qlen_notify+0x65/0x90 [sch_ets]
  qdisc_tree_reduce_backlog+0x74/0x110
  ets_qdisc_change+0x630/0xa40 [sch_ets]
  __tc_modify_qdisc.constprop.0+0x216/0x7f0
  tc_modify_qdisc+0x7c/0x120
  rtnetlink_rcv_msg+0x145/0x3f0
  netlink_rcv_skb+0x53/0x100
  netlink_unicast+0x245/0x390
  netlink_sendmsg+0x21b/0x470
  ____sys_sendmsg+0x39d/0x3d0
  ___sys_sendmsg+0x9a/0xe0
  __sys_sendmsg+0x7a/0xd0
  do_syscall_64+0x7d/0x160
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f2155114084
 Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
 RSP: 002b:00007fff1fd7a988 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000560ec063e5e0 RCX: 00007f2155114084
 RDX: 0000000000000000 RSI: 00007fff1fd7a9f0 RDI: 0000000000000003
 RBP: 00007fff1fd7aa60 R08: 0000000000000010 R09: 000000000000003f
 R10: 0000560ee9b3a010 R11: 0000000000000202 R12: 00007fff1fd7aae0
 R13: 000000006891ccde R14: 0000560ec063e5e0 R15: 00007fff1fd7aad0
  </TASK>

 [1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/
 [2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/

Cc: stable@vger.kernel.org
Fixes: 103406b ("net/sched: Always pass notifications when child class becomes empty")
Fixes: c062f2a ("net/sched: sch_ets: don't remove idle classes from the round-robin list")
Fixes: dcc68b4 ("net: sch_ets: Add a new Qdisc")
Reported-by: Li Shuang <shuali@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-108026
Reviewed-by: Petr Machata <petrm@nvidia.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/7928ff6d17db47a2ae7cc205c44777b1f1950545.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Aug 21, 2025
commit 33caa20 upstream.

The existing code move the VF NIC to new namespace when NETDEV_REGISTER is
received on netvsc NIC. During deletion of the namespace,
default_device_exit_batch() >> default_device_exit_net() is called. When
netvsc NIC is moved back and registered to the default namespace, it
automatically brings VF NIC back to the default namespace. This will cause
the default_device_exit_net() >> for_each_netdev_safe loop unable to detect
the list end, and hit NULL ptr:

[  231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0
[  231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  231.450246] #PF: supervisor read access in kernel mode
[  231.450579] #PF: error_code(0x0000) - not-present page
[  231.450916] PGD 17b8a8067 P4D 0
[  231.451163] Oops: Oops: 0000 [#1] SMP NOPTI
[  231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY
[  231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
[  231.452692] Workqueue: netns cleanup_net
[  231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0
[  231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00
[  231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246
[  231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb
[  231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564
[  231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000
[  231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340
[  231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340
[  231.457161] FS:  0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000
[  231.457707] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0
[  231.458434] Call Trace:
[  231.458600]  <TASK>
[  231.458777]  ops_undo_list+0x100/0x220
[  231.459015]  cleanup_net+0x1b8/0x300
[  231.459285]  process_one_work+0x184/0x340

To fix it, move the ns change to a workqueue, and take rtnl_lock to avoid
changing the netdev list when default_device_exit_net() is using it.

Cc: stable@vger.kernel.org
Fixes: 4c26280 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1754511711-11188-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Aug 22, 2025
…dlock

When a user creates a dualpi2 qdisc it automatically sets a timer. This
timer will run constantly and update the qdisc's probability field.
The issue is that the timer acquires the qdisc root lock and runs in
hardirq. The qdisc root lock is also acquired in dev.c whenever a packet
arrives for this qdisc. Since the dualpi2 timer callback runs in hardirq,
it may interrupt the packet processing running in softirq. If that happens
and it runs on the same CPU, it will acquire the same lock and cause a
deadlock. The following splat shows up when running a kernel compiled with
lock debugging:

[  +0.000224] WARNING: inconsistent lock state
[  +0.000224] 6.16.0+ #10 Not tainted
[  +0.000169] --------------------------------
[  +0.000029] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[  +0.000000] ping/156 [HC0[0]:SC0[2]:HE1:SE0] takes:
[  +0.000000] ffff897841242110 (&sch->root_lock_key){?.-.}-{3:3}, at: __dev_queue_xmit+0x86d/0x1140
[  +0.000000] {IN-HARDIRQ-W} state was registered at:
[  +0.000000]   lock_acquire.part.0+0xb6/0x220
[  +0.000000]   _raw_spin_lock+0x31/0x80
[  +0.000000]   dualpi2_timer+0x6f/0x270
[  +0.000000]   __hrtimer_run_queues+0x1c5/0x360
[  +0.000000]   hrtimer_interrupt+0x115/0x260
[  +0.000000]   __sysvec_apic_timer_interrupt+0x6d/0x1a0
[  +0.000000]   sysvec_apic_timer_interrupt+0x6e/0x80
[  +0.000000]   asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  +0.000000]   pv_native_safe_halt+0xf/0x20
[  +0.000000]   default_idle+0x9/0x10
[  +0.000000]   default_idle_call+0x7e/0x1e0
[  +0.000000]   do_idle+0x1e8/0x250
[  +0.000000]   cpu_startup_entry+0x29/0x30
[  +0.000000]   rest_init+0x151/0x160
[  +0.000000]   start_kernel+0x6f3/0x700
[  +0.000000]   x86_64_start_reservations+0x24/0x30
[  +0.000000]   x86_64_start_kernel+0xc8/0xd0
[  +0.000000]   common_startup_64+0x13e/0x148
[  +0.000000] irq event stamp: 6884
[  +0.000000] hardirqs last  enabled at (6883): [<ffffffffa75700b3>] neigh_resolve_output+0x223/0x270
[  +0.000000] hardirqs last disabled at (6882): [<ffffffffa7570078>] neigh_resolve_output+0x1e8/0x270
[  +0.000000] softirqs last  enabled at (6880): [<ffffffffa757006b>] neigh_resolve_output+0x1db/0x270
[  +0.000000] softirqs last disabled at (6884): [<ffffffffa755b533>] __dev_queue_xmit+0x73/0x1140
[  +0.000000]
              other info that might help us debug this:
[  +0.000000]  Possible unsafe locking scenario:

[  +0.000000]        CPU0
[  +0.000000]        ----
[  +0.000000]   lock(&sch->root_lock_key);
[  +0.000000]   <Interrupt>
[  +0.000000]     lock(&sch->root_lock_key);
[  +0.000000]
               *** DEADLOCK ***

[  +0.000000] 4 locks held by ping/156:
[  +0.000000]  #0: ffff897842332e08 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x41e/0xf40
[  +0.000000]  #1: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_output+0x2c/0x190
[  +0.000000]  #2: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xad/0x950
[  +0.000000]  #3: ffffffffa816f840 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x73/0x1140

I am able to reproduce it consistently when running the following:

tc qdisc add dev lo handle 1: root dualpi2
ping -f 127.0.0.1

To fix it, make the timer run in softirq.

Fixes: 320d031 ("sched: Struct definition and parsing of dualpi2 qdisc")
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Link: https://patch.msgid.link/20250815135317.664993-1-victor@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
donald pushed a commit that referenced this issue Aug 22, 2025
BPF CI testing report a UAF issue:

  [   16.446633] BUG: kernel NULL pointer dereference, address: 000000000000003  0
  [   16.447134] #PF: supervisor read access in kernel mod  e
  [   16.447516] #PF: error_code(0x0000) - not-present pag  e
  [   16.447878] PGD 0 P4D   0
  [   16.448063] Oops: Oops: 0000 [#1] PREEMPT SMP NOPT  I
  [   16.448409] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:1 Tainted: G           OE      6.13.0-rc3-g89e8a75fda73-dirty #4  2
  [   16.449124] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODUL  E
  [   16.449502] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/201  4
  [   16.450201] Workqueue: smc_hs_wq smc_listen_wor  k
  [   16.450531] RIP: 0010:smc_listen_work+0xc02/0x159  0
  [   16.452158] RSP: 0018:ffffb5ab40053d98 EFLAGS: 0001024  6
  [   16.452526] RAX: 0000000000000001 RBX: 0000000000000002 RCX: 000000000000030  0
  [   16.452994] RDX: 0000000000000280 RSI: 00003513840053f0 RDI: 000000000000000  0
  [   16.453492] RBP: ffffa097808e3800 R08: ffffa09782dba1e0 R09: 000000000000000  5
  [   16.453987] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0978274640  0
  [   16.454497] R13: 0000000000000000 R14: 0000000000000000 R15: ffffa09782d4092  0
  [   16.454996] FS:  0000000000000000(0000) GS:ffffa097bbc00000(0000) knlGS:000000000000000  0
  [   16.455557] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003  3
  [   16.455961] CR2: 0000000000000030 CR3: 0000000102788004 CR4: 0000000000770ef  0
  [   16.456459] PKRU: 5555555  4
  [   16.456654] Call Trace  :
  [   16.456832]  <TASK  >
  [   16.456989]  ? __die+0x23/0x7  0
  [   16.457215]  ? page_fault_oops+0x180/0x4c  0
  [   16.457508]  ? __lock_acquire+0x3e6/0x249  0
  [   16.457801]  ? exc_page_fault+0x68/0x20  0
  [   16.458080]  ? asm_exc_page_fault+0x26/0x3  0
  [   16.458389]  ? smc_listen_work+0xc02/0x159  0
  [   16.458689]  ? smc_listen_work+0xc02/0x159  0
  [   16.458987]  ? lock_is_held_type+0x8f/0x10  0
  [   16.459284]  process_one_work+0x1ea/0x6d  0
  [   16.459570]  worker_thread+0x1c3/0x38  0
  [   16.459839]  ? __pfx_worker_thread+0x10/0x1  0
  [   16.460144]  kthread+0xe0/0x11  0
  [   16.460372]  ? __pfx_kthread+0x10/0x1  0
  [   16.460640]  ret_from_fork+0x31/0x5  0
  [   16.460896]  ? __pfx_kthread+0x10/0x1  0
  [   16.461166]  ret_from_fork_asm+0x1a/0x3  0
  [   16.461453]  </TASK  >
  [   16.461616] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod(OE)  ]
  [   16.462134] CR2: 000000000000003  0
  [   16.462380] ---[ end trace 0000000000000000 ]---
  [   16.462710] RIP: 0010:smc_listen_work+0xc02/0x1590

The direct cause of this issue is that after smc_listen_out_connected(),
newclcsock->sk may be NULL since it will releases the smcsk. Therefore,
if the application closes the socket immediately after accept,
newclcsock->sk can be NULL. A possible execution order could be as
follows:

smc_listen_work                                 | userspace
-----------------------------------------------------------------
lock_sock(sk)                                   |
smc_listen_out_connected()                      |
| \- smc_listen_out                             |
|    | \- release_sock                          |
     | |- sk->sk_data_ready()                   |
                                                | fd = accept();
                                                | close(fd);
                                                |  \- socket->sk = NULL;
/* newclcsock->sk is NULL now */
SMC_STAT_SERV_SUCC_INC(sock_net(newclcsock->sk))

Since smc_listen_out_connected() will not fail, simply swapping the order
of the code can easily fix this issue.

Fixes: 3b2dec2 ("net/smc: restructure client and server code in af_smc")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Link: https://patch.msgid.link/20250818054618.41615-1-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
donald pushed a commit that referenced this issue Aug 22, 2025
Receiving HSR frame with insufficient space to hold HSR tag in the skb
can result in a crash (kernel BUG):

[   45.390915] skbuff: skb_under_panic: text:ffffffff86f32cac len:26 put:14 head:ffff888042418000 data:ffff888042417ff4 tail:0xe end:0x180 dev:bridge_slave_1
[   45.392559] ------------[ cut here ]------------
[   45.392912] kernel BUG at net/core/skbuff.c:211!
[   45.393276] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[   45.393809] CPU: 1 UID: 0 PID: 2496 Comm: reproducer Not tainted 6.15.0 #12 PREEMPT(undef)
[   45.394433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   45.395273] RIP: 0010:skb_panic+0x15b/0x1d0

<snip registers, remove unreliable trace>

[   45.402911] Call Trace:
[   45.403105]  <IRQ>
[   45.404470]  skb_push+0xcd/0xf0
[   45.404726]  br_dev_queue_push_xmit+0x7c/0x6c0
[   45.406513]  br_forward_finish+0x128/0x260
[   45.408483]  __br_forward+0x42d/0x590
[   45.409464]  maybe_deliver+0x2eb/0x420
[   45.409763]  br_flood+0x174/0x4a0
[   45.410030]  br_handle_frame_finish+0xc7c/0x1bc0
[   45.411618]  br_handle_frame+0xac3/0x1230
[   45.413674]  __netif_receive_skb_core.constprop.0+0x808/0x3df0
[   45.422966]  __netif_receive_skb_one_core+0xb4/0x1f0
[   45.424478]  __netif_receive_skb+0x22/0x170
[   45.424806]  process_backlog+0x242/0x6d0
[   45.425116]  __napi_poll+0xbb/0x630
[   45.425394]  net_rx_action+0x4d1/0xcc0
[   45.427613]  handle_softirqs+0x1a4/0x580
[   45.427926]  do_softirq+0x74/0x90
[   45.428196]  </IRQ>

This issue was found by syzkaller.

The panic happens in br_dev_queue_push_xmit() once it receives a
corrupted skb with ETH header already pushed in linear data. When it
attempts the skb_push() call, there's not enough headroom and
skb_push() panics.

The corrupted skb is put on the queue by HSR layer, which makes a
sequence of unintended transformations when it receives a specific
corrupted HSR frame (with incomplete TAG).

Fix it by dropping and consuming frames that are not long enough to
contain both ethernet and hsr headers.

Alternative fix would be to check for enough headroom before skb_push()
in br_dev_queue_push_xmit().

In the reproducer, this is injected via AF_PACKET, but I don't easily
see why it couldn't be sent over the wire from adjacent network.

Further Details:

In the reproducer, the following network interface chain is set up:

┌────────────────┐   ┌────────────────┐
│ veth0_to_hsr   ├───┤  hsr_slave0    ┼───┐
└────────────────┘   └────────────────┘   │
                                          │ ┌──────┐
                                          ├─┤ hsr0 ├───┐
                                          │ └──────┘   │
┌────────────────┐   ┌────────────────┐   │            │┌────────┐
│ veth1_to_hsr   ┼───┤  hsr_slave1    ├───┘            └┤        │
└────────────────┘   └────────────────┘                ┌┼ bridge │
                                                       ││        │
                                                       │└────────┘
                                                       │
                                        ┌───────┐      │
                                        │  ...  ├──────┘
                                        └───────┘

To trigger the events leading up to crash, reproducer sends a corrupted
HSR frame with incomplete TAG, via AF_PACKET socket on 'veth0_to_hsr'.

The first HSR-layer function to process this frame is
hsr_handle_frame(). It and then checks if the
protocol is ETH_P_PRP or ETH_P_HSR. If it is, it calls
skb_set_network_header(skb, ETH_HLEN + HSR_HLEN), without checking that
the skb is long enough. For the crashing frame it is not, and hence the
skb->network_header and skb->mac_len fields are set incorrectly,
pointing after the end of the linear buffer.

I will call this a BUG#1 and it is what is addressed by this patch. In
the crashing scenario before the fix, the skb continues to go down the
hsr path as follows.

hsr_handle_frame() then calls this sequence
hsr_forward_skb()
  fill_frame_info()
    hsr->proto_ops->fill_frame_info()
      hsr_fill_frame_info()

hsr_fill_frame_info() contains a check that intends to check whether the
skb actually contains the HSR header. But the check relies on the
skb->mac_len field which was erroneously setup due to BUG#1, so the
check passes and the execution continues  back in the hsr_forward_skb():

hsr_forward_skb()
  hsr_forward_do()
    hsr->proto_ops->get_untagged_frame()
      hsr_get_untagged_frame()
        create_stripped_skb_hsr()

In create_stripped_skb_hsr(), a copy of the skb is created and is
further corrupted by operation that attempts to strip the HSR tag in a
call to __pskb_copy().

The skb enters create_stripped_skb_hsr() with ethernet header pushed in
linear buffer. The skb_pull(skb_in, HSR_HLEN) thus pulls 6 bytes of
ethernet header into the headroom, creating skb_in with a headroom of
size 8. The subsequent __pskb_copy() then creates an skb with headroom
of just 2 and skb->len of just 12, this is how it looks after the copy:

gdb) p skb->len
$10 = 12
(gdb) p skb->data
$11 = (unsigned char *) 0xffff888041e45382 "\252\252\252\252\252!\210\373",
(gdb) p skb->head
$12 = (unsigned char *) 0xffff888041e45380 ""

It seems create_stripped_skb_hsr() assumes that ETH header is pulled
in the headroom when it's entered, because it just pulls HSR header on
top. But that is not the case in our code-path and we end up with the
corrupted skb instead. I will call this BUG#2

*I got confused here because it seems that under no conditions can
create_stripped_skb_hsr() work well, the assumption it makes is not true
during the processing of hsr frames - since the skb_push() in
hsr_handle_frame to skb_pull in hsr_deliver_master(). I wonder whether I
missed something here.*

Next, the execution arrives in hsr_deliver_master(). It calls
skb_pull(ETH_HLEN), which just returns NULL - the SKB does not have
enough space for the pull (as it only has 12 bytes in total at this
point).

*The skb_pull() here further suggests that ethernet header is meant
to be pushed through the whole hsr processing and
create_stripped_skb_hsr() should pull it before doing the HSR header
pull.*

hsr_deliver_master() then puts the corrupted skb on the queue, it is
then picked up from there by bridge frame handling layer and finally
lands in br_dev_queue_push_xmit where it panics.

Cc: stable@kernel.org
Fixes: 48b491a ("net: hsr: fix mac_len checks")
Reported-by: syzbot+a81f2759d022496b40ab@syzkaller.appspotmail.com
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250819082842.94378-1-acsjakub@amazon.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
donald pushed a commit that referenced this issue Aug 22, 2025
[ Upstream commit 736a051 ]

The hfs_find_init() method can trigger the crash
if tree pointer is NULL:

[   45.746290][ T9787] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000008: 0000 [#1] SMP KAI
[   45.747287][ T9787] KASAN: null-ptr-deref in range [0x0000000000000040-0x0000000000000047]
[   45.748716][ T9787] CPU: 2 UID: 0 PID: 9787 Comm: repro Not tainted 6.16.0-rc3 #10 PREEMPT(full)
[   45.750250][ T9787] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   45.751983][ T9787] RIP: 0010:hfs_find_init+0x86/0x230
[   45.752834][ T9787] Code: c1 ea 03 80 3c 02 00 0f 85 9a 01 00 00 4c 8d 6b 40 48 c7 45 18 00 00 00 00 48 b8 00 00 00 00 00 fc
[   45.755574][ T9787] RSP: 0018:ffffc90015157668 EFLAGS: 00010202
[   45.756432][ T9787] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff819a4d09
[   45.757457][ T9787] RDX: 0000000000000008 RSI: ffffffff819acd3a RDI: ffffc900151576e8
[   45.758282][ T9787] RBP: ffffc900151576d0 R08: 0000000000000005 R09: 0000000000000000
[   45.758943][ T9787] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000004
[   45.759619][ T9787] R13: 0000000000000040 R14: ffff88802c50814a R15: 0000000000000000
[   45.760293][ T9787] FS:  00007ffb72734540(0000) GS:ffff8880cec64000(0000) knlGS:0000000000000000
[   45.761050][ T9787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.761606][ T9787] CR2: 00007f9bd8225000 CR3: 000000010979a000 CR4: 00000000000006f0
[   45.762286][ T9787] Call Trace:
[   45.762570][ T9787]  <TASK>
[   45.762824][ T9787]  hfs_ext_read_extent+0x190/0x9d0
[   45.763269][ T9787]  ? submit_bio_noacct_nocheck+0x2dd/0xce0
[   45.763766][ T9787]  ? __pfx_hfs_ext_read_extent+0x10/0x10
[   45.764250][ T9787]  hfs_get_block+0x55f/0x830
[   45.764646][ T9787]  block_read_full_folio+0x36d/0x850
[   45.765105][ T9787]  ? __pfx_hfs_get_block+0x10/0x10
[   45.765541][ T9787]  ? const_folio_flags+0x5b/0x100
[   45.765972][ T9787]  ? __pfx_hfs_read_folio+0x10/0x10
[   45.766415][ T9787]  filemap_read_folio+0xbe/0x290
[   45.766840][ T9787]  ? __pfx_filemap_read_folio+0x10/0x10
[   45.767325][ T9787]  ? __filemap_get_folio+0x32b/0xbf0
[   45.767780][ T9787]  do_read_cache_folio+0x263/0x5c0
[   45.768223][ T9787]  ? __pfx_hfs_read_folio+0x10/0x10
[   45.768666][ T9787]  read_cache_page+0x5b/0x160
[   45.769070][ T9787]  hfs_btree_open+0x491/0x1740
[   45.769481][ T9787]  hfs_mdb_get+0x15e2/0x1fb0
[   45.769877][ T9787]  ? __pfx_hfs_mdb_get+0x10/0x10
[   45.770316][ T9787]  ? find_held_lock+0x2b/0x80
[   45.770731][ T9787]  ? lockdep_init_map_type+0x5c/0x280
[   45.771200][ T9787]  ? lockdep_init_map_type+0x5c/0x280
[   45.771674][ T9787]  hfs_fill_super+0x38e/0x720
[   45.772092][ T9787]  ? __pfx_hfs_fill_super+0x10/0x10
[   45.772549][ T9787]  ? snprintf+0xbe/0x100
[   45.772931][ T9787]  ? __pfx_snprintf+0x10/0x10
[   45.773350][ T9787]  ? do_raw_spin_lock+0x129/0x2b0
[   45.773796][ T9787]  ? find_held_lock+0x2b/0x80
[   45.774215][ T9787]  ? set_blocksize+0x40a/0x510
[   45.774636][ T9787]  ? sb_set_blocksize+0x176/0x1d0
[   45.775087][ T9787]  ? setup_bdev_super+0x369/0x730
[   45.775533][ T9787]  get_tree_bdev_flags+0x384/0x620
[   45.775985][ T9787]  ? __pfx_hfs_fill_super+0x10/0x10
[   45.776453][ T9787]  ? __pfx_get_tree_bdev_flags+0x10/0x10
[   45.776950][ T9787]  ? bpf_lsm_capable+0x9/0x10
[   45.777365][ T9787]  ? security_capable+0x80/0x260
[   45.777803][ T9787]  vfs_get_tree+0x8e/0x340
[   45.778203][ T9787]  path_mount+0x13de/0x2010
[   45.778604][ T9787]  ? kmem_cache_free+0x2b0/0x4c0
[   45.779052][ T9787]  ? __pfx_path_mount+0x10/0x10
[   45.779480][ T9787]  ? getname_flags.part.0+0x1c5/0x550
[   45.779954][ T9787]  ? putname+0x154/0x1a0
[   45.780335][ T9787]  __x64_sys_mount+0x27b/0x300
[   45.780758][ T9787]  ? __pfx___x64_sys_mount+0x10/0x10
[   45.781232][ T9787]  do_syscall_64+0xc9/0x480
[   45.781631][ T9787]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   45.782149][ T9787] RIP: 0033:0x7ffb7265b6ca
[   45.782539][ T9787] Code: 48 8b 0d c9 17 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48
[   45.784212][ T9787] RSP: 002b:00007ffc0c10cfb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[   45.784935][ T9787] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffb7265b6ca
[   45.785626][ T9787] RDX: 0000200000000240 RSI: 0000200000000280 RDI: 00007ffc0c10d100
[   45.786316][ T9787] RBP: 00007ffc0c10d190 R08: 00007ffc0c10d000 R09: 0000000000000000
[   45.787011][ T9787] R10: 0000000000000048 R11: 0000000000000206 R12: 0000560246733250
[   45.787697][ T9787] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   45.788393][ T9787]  </TASK>
[   45.788665][ T9787] Modules linked in:
[   45.789058][ T9787] ---[ end trace 0000000000000000 ]---
[   45.789554][ T9787] RIP: 0010:hfs_find_init+0x86/0x230
[   45.790028][ T9787] Code: c1 ea 03 80 3c 02 00 0f 85 9a 01 00 00 4c 8d 6b 40 48 c7 45 18 00 00 00 00 48 b8 00 00 00 00 00 fc
[   45.792364][ T9787] RSP: 0018:ffffc90015157668 EFLAGS: 00010202
[   45.793155][ T9787] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff819a4d09
[   45.794123][ T9787] RDX: 0000000000000008 RSI: ffffffff819acd3a RDI: ffffc900151576e8
[   45.795105][ T9787] RBP: ffffc900151576d0 R08: 0000000000000005 R09: 0000000000000000
[   45.796135][ T9787] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000004
[   45.797114][ T9787] R13: 0000000000000040 R14: ffff88802c50814a R15: 0000000000000000
[   45.798024][ T9787] FS:  00007ffb72734540(0000) GS:ffff8880cec64000(0000) knlGS:0000000000000000
[   45.799019][ T9787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.799822][ T9787] CR2: 00007f9bd8225000 CR3: 000000010979a000 CR4: 00000000000006f0
[   45.800747][ T9787] Kernel panic - not syncing: Fatal exception

The hfs_fill_super() calls hfs_mdb_get() method that tries
to construct Extents Tree and Catalog Tree:

HFS_SB(sb)->ext_tree = hfs_btree_open(sb, HFS_EXT_CNID, hfs_ext_keycmp);
if (!HFS_SB(sb)->ext_tree) {
	pr_err("unable to open extent tree\n");
	goto out;
}
HFS_SB(sb)->cat_tree = hfs_btree_open(sb, HFS_CAT_CNID, hfs_cat_keycmp);
if (!HFS_SB(sb)->cat_tree) {
	pr_err("unable to open catalog tree\n");
	goto out;
}

However, hfs_btree_open() calls read_mapping_page() that
calls hfs_get_block(). And this method calls hfs_ext_read_extent():

static int hfs_ext_read_extent(struct inode *inode, u16 block)
{
	struct hfs_find_data fd;
	int res;

	if (block >= HFS_I(inode)->cached_start &&
	    block < HFS_I(inode)->cached_start + HFS_I(inode)->cached_blocks)
		return 0;

	res = hfs_find_init(HFS_SB(inode->i_sb)->ext_tree, &fd);
	if (!res) {
		res = __hfs_ext_cache_extent(&fd, inode, block);
		hfs_find_exit(&fd);
	}
	return res;
}

The problem here that hfs_find_init() is trying to use
HFS_SB(inode->i_sb)->ext_tree that is not initialized yet.
It will be initailized when hfs_btree_open() finishes
the execution.

The patch adds checking of tree pointer in hfs_find_init()
and it reworks the logic of hfs_btree_open() by reading
the b-tree's header directly from the volume. The read_mapping_page()
is exchanged on filemap_grab_folio() that grab the folio from
mapping. Then, sb_bread() extracts the b-tree's header
content and copy it into the folio.

Reported-by: Wenzhi Wang <wenzhi.wang@uwaterloo.ca>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250710213657.108285-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 22, 2025
[ Upstream commit 9445878 ]

The hfsplus_readdir() method is capable to crash by calling
hfsplus_uni2asc():

[  667.121659][ T9805] ==================================================================
[  667.122651][ T9805] BUG: KASAN: slab-out-of-bounds in hfsplus_uni2asc+0x902/0xa10
[  667.123627][ T9805] Read of size 2 at addr ffff88802592f40c by task repro/9805
[  667.124578][ T9805]
[  667.124876][ T9805] CPU: 3 UID: 0 PID: 9805 Comm: repro Not tainted 6.16.0-rc3 #1 PREEMPT(full)
[  667.124886][ T9805] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  667.124890][ T9805] Call Trace:
[  667.124893][ T9805]  <TASK>
[  667.124896][ T9805]  dump_stack_lvl+0x10e/0x1f0
[  667.124911][ T9805]  print_report+0xd0/0x660
[  667.124920][ T9805]  ? __virt_addr_valid+0x81/0x610
[  667.124928][ T9805]  ? __phys_addr+0xe8/0x180
[  667.124934][ T9805]  ? hfsplus_uni2asc+0x902/0xa10
[  667.124942][ T9805]  kasan_report+0xc6/0x100
[  667.124950][ T9805]  ? hfsplus_uni2asc+0x902/0xa10
[  667.124959][ T9805]  hfsplus_uni2asc+0x902/0xa10
[  667.124966][ T9805]  ? hfsplus_bnode_read+0x14b/0x360
[  667.124974][ T9805]  hfsplus_readdir+0x845/0xfc0
[  667.124984][ T9805]  ? __pfx_hfsplus_readdir+0x10/0x10
[  667.124994][ T9805]  ? stack_trace_save+0x8e/0xc0
[  667.125008][ T9805]  ? iterate_dir+0x18b/0xb20
[  667.125015][ T9805]  ? trace_lock_acquire+0x85/0xd0
[  667.125022][ T9805]  ? lock_acquire+0x30/0x80
[  667.125029][ T9805]  ? iterate_dir+0x18b/0xb20
[  667.125037][ T9805]  ? down_read_killable+0x1ed/0x4c0
[  667.125044][ T9805]  ? putname+0x154/0x1a0
[  667.125051][ T9805]  ? __pfx_down_read_killable+0x10/0x10
[  667.125058][ T9805]  ? apparmor_file_permission+0x239/0x3e0
[  667.125069][ T9805]  iterate_dir+0x296/0xb20
[  667.125076][ T9805]  __x64_sys_getdents64+0x13c/0x2c0
[  667.125084][ T9805]  ? __pfx___x64_sys_getdents64+0x10/0x10
[  667.125091][ T9805]  ? __x64_sys_openat+0x141/0x200
[  667.125126][ T9805]  ? __pfx_filldir64+0x10/0x10
[  667.125134][ T9805]  ? do_user_addr_fault+0x7fe/0x12f0
[  667.125143][ T9805]  do_syscall_64+0xc9/0x480
[  667.125151][ T9805]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  667.125158][ T9805] RIP: 0033:0x7fa8753b2fc9
[  667.125164][ T9805] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[  667.125172][ T9805] RSP: 002b:00007ffe96f8e0f8 EFLAGS: 00000217 ORIG_RAX: 00000000000000d9
[  667.125181][ T9805] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa8753b2fc9
[  667.125185][ T9805] RDX: 0000000000000400 RSI: 00002000000063c0 RDI: 0000000000000004
[  667.125190][ T9805] RBP: 00007ffe96f8e110 R08: 00007ffe96f8e110 R09: 00007ffe96f8e110
[  667.125195][ T9805] R10: 0000000000000000 R11: 0000000000000217 R12: 0000556b1e3b4260
[  667.125199][ T9805] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  667.125207][ T9805]  </TASK>
[  667.125210][ T9805]
[  667.145632][ T9805] Allocated by task 9805:
[  667.145991][ T9805]  kasan_save_stack+0x20/0x40
[  667.146352][ T9805]  kasan_save_track+0x14/0x30
[  667.146717][ T9805]  __kasan_kmalloc+0xaa/0xb0
[  667.147065][ T9805]  __kmalloc_noprof+0x205/0x550
[  667.147448][ T9805]  hfsplus_find_init+0x95/0x1f0
[  667.147813][ T9805]  hfsplus_readdir+0x220/0xfc0
[  667.148174][ T9805]  iterate_dir+0x296/0xb20
[  667.148549][ T9805]  __x64_sys_getdents64+0x13c/0x2c0
[  667.148937][ T9805]  do_syscall_64+0xc9/0x480
[  667.149291][ T9805]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  667.149809][ T9805]
[  667.150030][ T9805] The buggy address belongs to the object at ffff88802592f000
[  667.150030][ T9805]  which belongs to the cache kmalloc-2k of size 2048
[  667.151282][ T9805] The buggy address is located 0 bytes to the right of
[  667.151282][ T9805]  allocated 1036-byte region [ffff88802592f000, ffff88802592f40c)
[  667.152580][ T9805]
[  667.152798][ T9805] The buggy address belongs to the physical page:
[  667.153373][ T9805] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x25928
[  667.154157][ T9805] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  667.154916][ T9805] anon flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
[  667.155631][ T9805] page_type: f5(slab)
[  667.155997][ T9805] raw: 00fff00000000040 ffff88801b442f00 0000000000000000 dead000000000001
[  667.156770][ T9805] raw: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000
[  667.157536][ T9805] head: 00fff00000000040 ffff88801b442f00 0000000000000000 dead000000000001
[  667.158317][ T9805] head: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000
[  667.159088][ T9805] head: 00fff00000000003 ffffea0000964a01 00000000ffffffff 00000000ffffffff
[  667.159865][ T9805] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
[  667.160643][ T9805] page dumped because: kasan: bad access detected
[  667.161216][ T9805] page_owner tracks the page as allocated
[  667.161732][ T9805] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN9
[  667.163566][ T9805]  post_alloc_hook+0x1c0/0x230
[  667.164003][ T9805]  get_page_from_freelist+0xdeb/0x3b30
[  667.164503][ T9805]  __alloc_frozen_pages_noprof+0x25c/0x2460
[  667.165040][ T9805]  alloc_pages_mpol+0x1fb/0x550
[  667.165489][ T9805]  new_slab+0x23b/0x340
[  667.165872][ T9805]  ___slab_alloc+0xd81/0x1960
[  667.166313][ T9805]  __slab_alloc.isra.0+0x56/0xb0
[  667.166767][ T9805]  __kmalloc_cache_noprof+0x255/0x3e0
[  667.167255][ T9805]  psi_cgroup_alloc+0x52/0x2d0
[  667.167693][ T9805]  cgroup_mkdir+0x694/0x1210
[  667.168118][ T9805]  kernfs_iop_mkdir+0x111/0x190
[  667.168568][ T9805]  vfs_mkdir+0x59b/0x8d0
[  667.168956][ T9805]  do_mkdirat+0x2ed/0x3d0
[  667.169353][ T9805]  __x64_sys_mkdir+0xef/0x140
[  667.169784][ T9805]  do_syscall_64+0xc9/0x480
[  667.170195][ T9805]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  667.170730][ T9805] page last free pid 1257 tgid 1257 stack trace:
[  667.171304][ T9805]  __free_frozen_pages+0x80c/0x1250
[  667.171770][ T9805]  vfree.part.0+0x12b/0xab0
[  667.172182][ T9805]  delayed_vfree_work+0x93/0xd0
[  667.172612][ T9805]  process_one_work+0x9b5/0x1b80
[  667.173067][ T9805]  worker_thread+0x630/0xe60
[  667.173486][ T9805]  kthread+0x3a8/0x770
[  667.173857][ T9805]  ret_from_fork+0x517/0x6e0
[  667.174278][ T9805]  ret_from_fork_asm+0x1a/0x30
[  667.174703][ T9805]
[  667.174917][ T9805] Memory state around the buggy address:
[  667.175411][ T9805]  ffff88802592f300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  667.176114][ T9805]  ffff88802592f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  667.176830][ T9805] >ffff88802592f400: 00 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  667.177547][ T9805]                       ^
[  667.177933][ T9805]  ffff88802592f480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  667.178640][ T9805]  ffff88802592f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  667.179350][ T9805] ==================================================================

The hfsplus_uni2asc() method operates by struct hfsplus_unistr:

struct hfsplus_unistr {
	__be16 length;
	hfsplus_unichr unicode[HFSPLUS_MAX_STRLEN];
} __packed;

where HFSPLUS_MAX_STRLEN is 255 bytes. The issue happens if length
of the structure instance has value bigger than 255 (for example,
65283). In such case, pointer on unicode buffer is going beyond of
the allocated memory.

The patch fixes the issue by checking the length value of
hfsplus_unistr instance and using 255 value in the case if length
value is bigger than HFSPLUS_MAX_STRLEN. Potential reason of such
situation could be a corruption of Catalog File b-tree's node.

Reported-by: Wenzhi Wang <wenzhi.wang@uwaterloo.ca>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Yangtao Li <frank.li@vivo.com>
Link: https://lore.kernel.org/r/20250710230830.110500-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 22, 2025
commit 87c6efc upstream.

Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify()
after recent changes from Lion [2]. The problem is: in ets_qdisc_change()
we purge unused DWRR queues; the value of 'q->nbands' is the new one, and
the cleanup should be done with the old one. The problem is here since my
first attempts to fix ets_qdisc_change(), but it surfaced again after the
recent qdisc len accounting fixes. Fix it purging idle DWRR queues before
assigning a new value of 'q->nbands', so that all purge operations find a
consistent configuration:

 - old 'q->nbands' because it's needed by ets_class_find()
 - old 'q->nstrict' because it's needed by ets_class_is_strict()

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary)
 Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021
 RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80
 Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab
 RSP: 0018:ffffba186009f400 EFLAGS: 00010202
 RAX: 00000000000000d6 RBX: 0000000000000000 RCX: 0000000000000004
 RDX: ffff9f0fa29b69c0 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffffffc12c2400 R08: 0000000000000008 R09: 0000000000000004
 R10: ffffffffffffffff R11: 0000000000000004 R12: 0000000000000000
 R13: ffff9f0f8cfe0000 R14: 0000000000100005 R15: 0000000000000000
 FS:  00007f2154f37480(0000) GS:ffff9f269c1c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001530be001 CR4: 00000000007726f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  ets_class_qlen_notify+0x65/0x90 [sch_ets]
  qdisc_tree_reduce_backlog+0x74/0x110
  ets_qdisc_change+0x630/0xa40 [sch_ets]
  __tc_modify_qdisc.constprop.0+0x216/0x7f0
  tc_modify_qdisc+0x7c/0x120
  rtnetlink_rcv_msg+0x145/0x3f0
  netlink_rcv_skb+0x53/0x100
  netlink_unicast+0x245/0x390
  netlink_sendmsg+0x21b/0x470
  ____sys_sendmsg+0x39d/0x3d0
  ___sys_sendmsg+0x9a/0xe0
  __sys_sendmsg+0x7a/0xd0
  do_syscall_64+0x7d/0x160
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f2155114084
 Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
 RSP: 002b:00007fff1fd7a988 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000560ec063e5e0 RCX: 00007f2155114084
 RDX: 0000000000000000 RSI: 00007fff1fd7a9f0 RDI: 0000000000000003
 RBP: 00007fff1fd7aa60 R08: 0000000000000010 R09: 000000000000003f
 R10: 0000560ee9b3a010 R11: 0000000000000202 R12: 00007fff1fd7aae0
 R13: 000000006891ccde R14: 0000560ec063e5e0 R15: 00007fff1fd7aad0
  </TASK>

 [1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/
 [2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/

Cc: stable@vger.kernel.org
Fixes: 103406b ("net/sched: Always pass notifications when child class becomes empty")
Fixes: c062f2a ("net/sched: sch_ets: don't remove idle classes from the round-robin list")
Fixes: dcc68b4 ("net: sch_ets: Add a new Qdisc")
Reported-by: Li Shuang <shuali@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-108026
Reviewed-by: Petr Machata <petrm@nvidia.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/7928ff6d17db47a2ae7cc205c44777b1f1950545.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Aug 22, 2025
commit 33caa20 upstream.

The existing code move the VF NIC to new namespace when NETDEV_REGISTER is
received on netvsc NIC. During deletion of the namespace,
default_device_exit_batch() >> default_device_exit_net() is called. When
netvsc NIC is moved back and registered to the default namespace, it
automatically brings VF NIC back to the default namespace. This will cause
the default_device_exit_net() >> for_each_netdev_safe loop unable to detect
the list end, and hit NULL ptr:

[  231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0
[  231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  231.450246] #PF: supervisor read access in kernel mode
[  231.450579] #PF: error_code(0x0000) - not-present page
[  231.450916] PGD 17b8a8067 P4D 0
[  231.451163] Oops: Oops: 0000 [#1] SMP NOPTI
[  231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY
[  231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
[  231.452692] Workqueue: netns cleanup_net
[  231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0
[  231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00
[  231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246
[  231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb
[  231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564
[  231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000
[  231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340
[  231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340
[  231.457161] FS:  0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000
[  231.457707] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0
[  231.458434] Call Trace:
[  231.458600]  <TASK>
[  231.458777]  ops_undo_list+0x100/0x220
[  231.459015]  cleanup_net+0x1b8/0x300
[  231.459285]  process_one_work+0x184/0x340

To fix it, move the ns change to a workqueue, and take rtnl_lock to avoid
changing the netdev list when default_device_exit_net() is using it.

Cc: stable@vger.kernel.org
Fixes: 4c26280 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1754511711-11188-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Aug 23, 2025
Commit 3c7ac40 ("scsi: ufs: core: Delegate the interrupt service
routine to a threaded IRQ handler") introduced an IRQ lock inversion
issue. Fix this lock inversion by changing the spin_lock_irq() calls into
spin_lock_irqsave() calls in code that can be called either from
interrupt context or from thread context. This patch fixes the following
lockdep complaint:

WARNING: possible irq lock inversion dependency detected
6.12.30-android16-5-maybe-dirty-4k #1 Tainted: G        W  OE
--------------------------------------------------------
kworker/u28:0/12 just changed the state of lock:
ffffff881e29dd60 (&hba->clk_gating.lock){-...}-{2:2}, at: ufshcd_release_scsi_cmd+0x60/0x110
but this lock took another, HARDIRQ-unsafe lock in the past:
 (shost->host_lock){+.+.}-{2:2}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(shost->host_lock);
                               local_irq_disable();
                               lock(&hba->clk_gating.lock);
                               lock(shost->host_lock);
  <Interrupt>
    lock(&hba->clk_gating.lock);

 *** DEADLOCK ***

4 locks held by kworker/u28:0/12:
 #0: ffffff8800ac6158 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0x1bc/0x65c
 #1: ffffffc085c93d70 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x1e4/0x65c
 #2: ffffff881e29c0e0 (&shost->scan_mutex){+.+.}-{3:3}, at: __scsi_add_device+0x74/0x120
 #3: ffffff881960ea00 (&hwq->cq_lock){-...}-{2:2}, at: ufshcd_mcq_poll_cqe_lock+0x28/0x104

the shortest dependencies between 2nd lock and 1st lock:
 -> (shost->host_lock){+.+.}-{2:2} {
    HARDIRQ-ON-W at:
                      lock_acquire+0x134/0x2b4
                      _raw_spin_lock+0x48/0x64
                      ufshcd_sl_intr+0x4c/0xa08
                      ufshcd_threaded_intr+0x70/0x12c
                      irq_thread_fn+0x48/0xa8
                      irq_thread+0x130/0x1ec
                      kthread+0x110/0x134
                      ret_from_fork+0x10/0x20
    SOFTIRQ-ON-W at:
                      lock_acquire+0x134/0x2b4
                      _raw_spin_lock+0x48/0x64
                      ufshcd_sl_intr+0x4c/0xa08
                      ufshcd_threaded_intr+0x70/0x12c
                      irq_thread_fn+0x48/0xa8
                      irq_thread+0x130/0x1ec
                      kthread+0x110/0x134
                      ret_from_fork+0x10/0x20
    INITIAL USE at:
                     lock_acquire+0x134/0x2b4
                     _raw_spin_lock+0x48/0x64
                     ufshcd_sl_intr+0x4c/0xa08
                     ufshcd_threaded_intr+0x70/0x12c
                     irq_thread_fn+0x48/0xa8
                     irq_thread+0x130/0x1ec
                     kthread+0x110/0x134
                     ret_from_fork+0x10/0x20
  }
  ... key      at: [<ffffffc085ba1a98>] scsi_host_alloc.__key+0x0/0x10
  ... acquired at:
   _raw_spin_lock_irqsave+0x5c/0x80
   __ufshcd_release+0x78/0x118
   ufshcd_send_uic_cmd+0xe4/0x118
   ufshcd_dme_set_attr+0x88/0x1c8
   ufs_google_phy_initialization+0x68/0x418 [ufs]
   ufs_google_link_startup_notify+0x78/0x27c [ufs]
   ufshcd_link_startup+0x84/0x720
   ufshcd_init+0xf3c/0x1330
   ufshcd_pltfrm_init+0x728/0x7d8
   ufs_google_probe+0x30/0x84 [ufs]
   platform_probe+0xa0/0xe0
   really_probe+0x114/0x454
   __driver_probe_device+0xa4/0x160
   driver_probe_device+0x44/0x23c
   __driver_attach_async_helper+0x60/0xd4
   async_run_entry_fn+0x4c/0x17c
   process_one_work+0x26c/0x65c
   worker_thread+0x33c/0x498
   kthread+0x110/0x134
   ret_from_fork+0x10/0x20

-> (&hba->clk_gating.lock){-...}-{2:2} {
   IN-HARDIRQ-W at:
                    lock_acquire+0x134/0x2b4
                    _raw_spin_lock_irqsave+0x5c/0x80
                    ufshcd_release_scsi_cmd+0x60/0x110
                    ufshcd_compl_one_cqe+0x2c0/0x3f4
                    ufshcd_mcq_poll_cqe_lock+0xb0/0x104
                    ufs_google_mcq_intr+0x80/0xa0 [ufs]
                    __handle_irq_event_percpu+0x104/0x32c
                    handle_irq_event+0x40/0x9c
                    handle_fasteoi_irq+0x170/0x2e8
                    generic_handle_domain_irq+0x58/0x80
                    gic_handle_irq+0x48/0x104
                    call_on_irq_stack+0x3c/0x50
                    do_interrupt_handler+0x7c/0xd8
                    el1_interrupt+0x34/0x58
                    el1h_64_irq_handler+0x18/0x24
                    el1h_64_irq+0x68/0x6c
                    _raw_spin_unlock_irqrestore+0x3c/0x6c
                    debug_object_assert_init+0x16c/0x21c
                    __mod_timer+0x4c/0x48c
                    schedule_timeout+0xd4/0x16c
                    io_schedule_timeout+0x48/0x70
                    do_wait_for_common+0x100/0x194
                    wait_for_completion_io_timeout+0x48/0x6c
                    blk_execute_rq+0x124/0x17c
                    scsi_execute_cmd+0x18c/0x3f8
                    scsi_probe_and_add_lun+0x204/0xd74
                    __scsi_add_device+0xbc/0x120
                    ufshcd_async_scan+0x80/0x3c0
                    async_run_entry_fn+0x4c/0x17c
                    process_one_work+0x26c/0x65c
                    worker_thread+0x33c/0x498
                    kthread+0x110/0x134
                    ret_from_fork+0x10/0x20
   INITIAL USE at:
                   lock_acquire+0x134/0x2b4
                   _raw_spin_lock_irqsave+0x5c/0x80
                   ufshcd_hold+0x34/0x14c
                   ufshcd_send_uic_cmd+0x28/0x118
                   ufshcd_dme_set_attr+0x88/0x1c8
                   ufs_google_phy_initialization+0x68/0x418 [ufs]
                   ufs_google_link_startup_notify+0x78/0x27c [ufs]
                   ufshcd_link_startup+0x84/0x720
                   ufshcd_init+0xf3c/0x1330
                   ufshcd_pltfrm_init+0x728/0x7d8
                   ufs_google_probe+0x30/0x84 [ufs]
                   platform_probe+0xa0/0xe0
                   really_probe+0x114/0x454
                   __driver_probe_device+0xa4/0x160
                   driver_probe_device+0x44/0x23c
                   __driver_attach_async_helper+0x60/0xd4
                   async_run_entry_fn+0x4c/0x17c
                   process_one_work+0x26c/0x65c
                   worker_thread+0x33c/0x498
                   kthread+0x110/0x134
                   ret_from_fork+0x10/0x20
 }
 ... key      at: [<ffffffc085ba6fe8>] ufshcd_init.__key+0x0/0x10
 ... acquired at:
   mark_lock+0x1c4/0x224
   __lock_acquire+0x438/0x2e1c
   lock_acquire+0x134/0x2b4
   _raw_spin_lock_irqsave+0x5c/0x80
   ufshcd_release_scsi_cmd+0x60/0x110
   ufshcd_compl_one_cqe+0x2c0/0x3f4
   ufshcd_mcq_poll_cqe_lock+0xb0/0x104
   ufs_google_mcq_intr+0x80/0xa0 [ufs]
   __handle_irq_event_percpu+0x104/0x32c
   handle_irq_event+0x40/0x9c
   handle_fasteoi_irq+0x170/0x2e8
   generic_handle_domain_irq+0x58/0x80
   gic_handle_irq+0x48/0x104
   call_on_irq_stack+0x3c/0x50
   do_interrupt_handler+0x7c/0xd8
   el1_interrupt+0x34/0x58
   el1h_64_irq_handler+0x18/0x24
   el1h_64_irq+0x68/0x6c
   _raw_spin_unlock_irqrestore+0x3c/0x6c
   debug_object_assert_init+0x16c/0x21c
   __mod_timer+0x4c/0x48c
   schedule_timeout+0xd4/0x16c
   io_schedule_timeout+0x48/0x70
   do_wait_for_common+0x100/0x194
   wait_for_completion_io_timeout+0x48/0x6c
   blk_execute_rq+0x124/0x17c
   scsi_execute_cmd+0x18c/0x3f8
   scsi_probe_and_add_lun+0x204/0xd74
   __scsi_add_device+0xbc/0x120
   ufshcd_async_scan+0x80/0x3c0
   async_run_entry_fn+0x4c/0x17c
   process_one_work+0x26c/0x65c
   worker_thread+0x33c/0x498
   kthread+0x110/0x134
   ret_from_fork+0x10/0x20

stack backtrace:
CPU: 6 UID: 0 PID: 12 Comm: kworker/u28:0 Tainted: G        W  OE      6.12.30-android16-5-maybe-dirty-4k #1 ccd4020fe444bdf629efc3b86df6be920b8df7d0
Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Spacecraft board based on MALIBU (DT)
Workqueue: async async_run_entry_fn
Call trace:
 dump_backtrace+0xfc/0x17c
 show_stack+0x18/0x28
 dump_stack_lvl+0x40/0xa0
 dump_stack+0x18/0x24
 print_irq_inversion_bug+0x2fc/0x304
 mark_lock_irq+0x388/0x4fc
 mark_lock+0x1c4/0x224
 __lock_acquire+0x438/0x2e1c
 lock_acquire+0x134/0x2b4
 _raw_spin_lock_irqsave+0x5c/0x80
 ufshcd_release_scsi_cmd+0x60/0x110
 ufshcd_compl_one_cqe+0x2c0/0x3f4
 ufshcd_mcq_poll_cqe_lock+0xb0/0x104
 ufs_google_mcq_intr+0x80/0xa0 [ufs dd6f385554e109da094ab91d5f7be18625a2222a]
 __handle_irq_event_percpu+0x104/0x32c
 handle_irq_event+0x40/0x9c
 handle_fasteoi_irq+0x170/0x2e8
 generic_handle_domain_irq+0x58/0x80
 gic_handle_irq+0x48/0x104
 call_on_irq_stack+0x3c/0x50
 do_interrupt_handler+0x7c/0xd8
 el1_interrupt+0x34/0x58
 el1h_64_irq_handler+0x18/0x24
 el1h_64_irq+0x68/0x6c
 _raw_spin_unlock_irqrestore+0x3c/0x6c
 debug_object_assert_init+0x16c/0x21c
 __mod_timer+0x4c/0x48c
 schedule_timeout+0xd4/0x16c
 io_schedule_timeout+0x48/0x70
 do_wait_for_common+0x100/0x194
 wait_for_completion_io_timeout+0x48/0x6c
 blk_execute_rq+0x124/0x17c
 scsi_execute_cmd+0x18c/0x3f8
 scsi_probe_and_add_lun+0x204/0xd74
 __scsi_add_device+0xbc/0x120
 ufshcd_async_scan+0x80/0x3c0
 async_run_entry_fn+0x4c/0x17c
 process_one_work+0x26c/0x65c
 worker_thread+0x33c/0x498
 kthread+0x110/0x134
 ret_from_fork+0x10/0x20

Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: André Draszik <andre.draszik@linaro.org>
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Fixes: 3c7ac40 ("scsi: ufs: core: Delegate the interrupt service routine to a threaded IRQ handler")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250815155842.472867-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
donald pushed a commit that referenced this issue Aug 23, 2025
The mm/debug_vm_pagetable test allocates manually page table entries for
the tests it runs, using also its manually allocated mm_struct.  That in
itself is ok, but when it exits, at destroy_args() it fails to clear those
entries with the *_clear functions.

The problem is that leaves stale entries.  If another process allocates an
mm_struct with a pgd at the same address, it may end up running into the
stale entry.  This is happening in practice on a debug kernel with
CONFIG_DEBUG_VM_PGTABLE=y, for example this is the output with some extra
debugging I added (it prints a warning trace if pgtables_bytes goes
negative, in addition to the warning at check_mm() function):

[    2.539353] debug_vm_pgtable: [get_random_vaddr         ]: random_vaddr is 0x7ea247140000
[    2.539366] kmem_cache info
[    2.539374] kmem_cachep 0x000000002ce82385 - freelist 0x0000000000000000 - offset 0x508
[    2.539447] debug_vm_pgtable: [init_args                ]: args->mm is 0x000000002267cc9e
(...)
[    2.552800] WARNING: CPU: 5 PID: 116 at include/linux/mm.h:2841 free_pud_range+0x8bc/0x8d0
[    2.552816] Modules linked in:
[    2.552843] CPU: 5 UID: 0 PID: 116 Comm: modprobe Not tainted 6.12.0-105.debug_vm2.el10.ppc64le+debug #1 VOLUNTARY
[    2.552859] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW910.00 (VL910_062) hv:phyp pSeries
[    2.552872] NIP:  c0000000007eef3c LR: c0000000007eef30 CTR: c0000000003d8c90
[    2.552885] REGS: c0000000622e73b0 TRAP: 0700   Not tainted  (6.12.0-105.debug_vm2.el10.ppc64le+debug)
[    2.552899] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002822  XER: 0000000a
[    2.552954] CFAR: c0000000008f03f0 IRQMASK: 0
[    2.552954] GPR00: c0000000007eef30 c0000000622e7650 c000000002b1ac00 0000000000000001
[    2.552954] GPR04: 0000000000000008 0000000000000000 c0000000007eef30 ffffffffffffffff
[    2.552954] GPR08: 00000000ffff00f5 0000000000000001 0000000000000048 0000000000004000
[    2.552954] GPR12: 00000003fa440000 c000000017ffa300 c0000000051d9f80 ffffffffffffffdb
[    2.552954] GPR16: 0000000000000000 0000000000000008 000000000000000a 60000000000000e0
[    2.552954] GPR20: 4080000000000000 c0000000113af038 00007fffcf130000 0000700000000000
[    2.552954] GPR24: c000000062a6a000 0000000000000001 8000000062a68000 0000000000000001
[    2.552954] GPR28: 000000000000000a c000000062ebc600 0000000000002000 c000000062ebc760
[    2.553170] NIP [c0000000007eef3c] free_pud_range+0x8bc/0x8d0
[    2.553185] LR [c0000000007eef30] free_pud_range+0x8b0/0x8d0
[    2.553199] Call Trace:
[    2.553207] [c0000000622e7650] [c0000000007eef30] free_pud_range+0x8b0/0x8d0 (unreliable)
[    2.553229] [c0000000622e7750] [c0000000007f40b4] free_pgd_range+0x284/0x3b0
[    2.553248] [c0000000622e7800] [c0000000007f4630] free_pgtables+0x450/0x570
[    2.553274] [c0000000622e78e0] [c0000000008161c0] exit_mmap+0x250/0x650
[    2.553292] [c0000000622e7a30] [c0000000001b95b8] __mmput+0x98/0x290
[    2.558344] [c0000000622e7a80] [c0000000001d1018] exit_mm+0x118/0x1b0
[    2.558361] [c0000000622e7ac0] [c0000000001d141c] do_exit+0x2ec/0x870
[    2.558376] [c0000000622e7b60] [c0000000001d1ca8] do_group_exit+0x88/0x150
[    2.558391] [c0000000622e7bb0] [c0000000001d1db8] sys_exit_group+0x48/0x50
[    2.558407] [c0000000622e7be0] [c00000000003d810] system_call_exception+0x1e0/0x4c0
[    2.558423] [c0000000622e7e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
(...)
[    2.558892] ---[ end trace 0000000000000000 ]---
[    2.559022] BUG: Bad rss-counter state mm:000000002267cc9e type:MM_ANONPAGES val:1
[    2.559037] BUG: non-zero pgtables_bytes on freeing mm: -6144

Here the modprobe process ended up with an allocated mm_struct from the
mm_struct slab that was used before by the debug_vm_pgtable test.  That is
not a problem, since the mm_struct is initialized again etc., however, if
it ends up using the same pgd table, it bumps into the old stale entry
when clearing/freeing the page table entries, so it tries to free an entry
already gone (that one which was allocated by the debug_vm_pgtable test),
which also explains the negative pgtables_bytes since it's accounting for
not allocated entries in the current process.

As far as I looked pgd_{alloc,free} etc.  does not clear entries, and
clearing of the entries is explicitly done in the free_pgtables->
free_pgd_range->free_p4d_range->free_pud_range->free_pmd_range->
free_pte_range path.  However, the debug_vm_pgtable test does not call
free_pgtables, since it allocates mm_struct and entries manually for its
test and eg.  not goes through page faults.  So it also should clear
manually the entries before exit at destroy_args().

This problem was noticed on a reboot X number of times test being done on
a powerpc host, with a debug kernel with CONFIG_DEBUG_VM_PGTABLE enabled. 
Depends on the system, but on a 100 times reboot loop the problem could
manifest once or twice, if a process ends up getting the right mm->pgd
entry with the stale entries used by mm/debug_vm_pagetable.  After using
this patch, I couldn't reproduce/experience the problems anymore.  I was
able to reproduce the problem as well on latest upstream kernel (6.16).

I also modified destroy_args() to use mmput() instead of mmdrop(), there
is no reason to hold mm_users reference and not release the mm_struct
entirely, and in the output above with my debugging prints I already had
patched it to use mmput, it did not fix the problem, but helped in the
debugging as well.

Link: https://lkml.kernel.org/r/20250731214051.4115182-1-herton@redhat.com
Fixes: 3c9b84f ("mm/debug_vm_pgtable: introduce struct pgtable_debug_args")
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
donald pushed a commit that referenced this issue Aug 23, 2025
[ Upstream commit 736a051 ]

The hfs_find_init() method can trigger the crash
if tree pointer is NULL:

[   45.746290][ T9787] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000008: 0000 [#1] SMP KAI
[   45.747287][ T9787] KASAN: null-ptr-deref in range [0x0000000000000040-0x0000000000000047]
[   45.748716][ T9787] CPU: 2 UID: 0 PID: 9787 Comm: repro Not tainted 6.16.0-rc3 #10 PREEMPT(full)
[   45.750250][ T9787] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   45.751983][ T9787] RIP: 0010:hfs_find_init+0x86/0x230
[   45.752834][ T9787] Code: c1 ea 03 80 3c 02 00 0f 85 9a 01 00 00 4c 8d 6b 40 48 c7 45 18 00 00 00 00 48 b8 00 00 00 00 00 fc
[   45.755574][ T9787] RSP: 0018:ffffc90015157668 EFLAGS: 00010202
[   45.756432][ T9787] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff819a4d09
[   45.757457][ T9787] RDX: 0000000000000008 RSI: ffffffff819acd3a RDI: ffffc900151576e8
[   45.758282][ T9787] RBP: ffffc900151576d0 R08: 0000000000000005 R09: 0000000000000000
[   45.758943][ T9787] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000004
[   45.759619][ T9787] R13: 0000000000000040 R14: ffff88802c50814a R15: 0000000000000000
[   45.760293][ T9787] FS:  00007ffb72734540(0000) GS:ffff8880cec64000(0000) knlGS:0000000000000000
[   45.761050][ T9787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.761606][ T9787] CR2: 00007f9bd8225000 CR3: 000000010979a000 CR4: 00000000000006f0
[   45.762286][ T9787] Call Trace:
[   45.762570][ T9787]  <TASK>
[   45.762824][ T9787]  hfs_ext_read_extent+0x190/0x9d0
[   45.763269][ T9787]  ? submit_bio_noacct_nocheck+0x2dd/0xce0
[   45.763766][ T9787]  ? __pfx_hfs_ext_read_extent+0x10/0x10
[   45.764250][ T9787]  hfs_get_block+0x55f/0x830
[   45.764646][ T9787]  block_read_full_folio+0x36d/0x850
[   45.765105][ T9787]  ? __pfx_hfs_get_block+0x10/0x10
[   45.765541][ T9787]  ? const_folio_flags+0x5b/0x100
[   45.765972][ T9787]  ? __pfx_hfs_read_folio+0x10/0x10
[   45.766415][ T9787]  filemap_read_folio+0xbe/0x290
[   45.766840][ T9787]  ? __pfx_filemap_read_folio+0x10/0x10
[   45.767325][ T9787]  ? __filemap_get_folio+0x32b/0xbf0
[   45.767780][ T9787]  do_read_cache_folio+0x263/0x5c0
[   45.768223][ T9787]  ? __pfx_hfs_read_folio+0x10/0x10
[   45.768666][ T9787]  read_cache_page+0x5b/0x160
[   45.769070][ T9787]  hfs_btree_open+0x491/0x1740
[   45.769481][ T9787]  hfs_mdb_get+0x15e2/0x1fb0
[   45.769877][ T9787]  ? __pfx_hfs_mdb_get+0x10/0x10
[   45.770316][ T9787]  ? find_held_lock+0x2b/0x80
[   45.770731][ T9787]  ? lockdep_init_map_type+0x5c/0x280
[   45.771200][ T9787]  ? lockdep_init_map_type+0x5c/0x280
[   45.771674][ T9787]  hfs_fill_super+0x38e/0x720
[   45.772092][ T9787]  ? __pfx_hfs_fill_super+0x10/0x10
[   45.772549][ T9787]  ? snprintf+0xbe/0x100
[   45.772931][ T9787]  ? __pfx_snprintf+0x10/0x10
[   45.773350][ T9787]  ? do_raw_spin_lock+0x129/0x2b0
[   45.773796][ T9787]  ? find_held_lock+0x2b/0x80
[   45.774215][ T9787]  ? set_blocksize+0x40a/0x510
[   45.774636][ T9787]  ? sb_set_blocksize+0x176/0x1d0
[   45.775087][ T9787]  ? setup_bdev_super+0x369/0x730
[   45.775533][ T9787]  get_tree_bdev_flags+0x384/0x620
[   45.775985][ T9787]  ? __pfx_hfs_fill_super+0x10/0x10
[   45.776453][ T9787]  ? __pfx_get_tree_bdev_flags+0x10/0x10
[   45.776950][ T9787]  ? bpf_lsm_capable+0x9/0x10
[   45.777365][ T9787]  ? security_capable+0x80/0x260
[   45.777803][ T9787]  vfs_get_tree+0x8e/0x340
[   45.778203][ T9787]  path_mount+0x13de/0x2010
[   45.778604][ T9787]  ? kmem_cache_free+0x2b0/0x4c0
[   45.779052][ T9787]  ? __pfx_path_mount+0x10/0x10
[   45.779480][ T9787]  ? getname_flags.part.0+0x1c5/0x550
[   45.779954][ T9787]  ? putname+0x154/0x1a0
[   45.780335][ T9787]  __x64_sys_mount+0x27b/0x300
[   45.780758][ T9787]  ? __pfx___x64_sys_mount+0x10/0x10
[   45.781232][ T9787]  do_syscall_64+0xc9/0x480
[   45.781631][ T9787]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   45.782149][ T9787] RIP: 0033:0x7ffb7265b6ca
[   45.782539][ T9787] Code: 48 8b 0d c9 17 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48
[   45.784212][ T9787] RSP: 002b:00007ffc0c10cfb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[   45.784935][ T9787] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffb7265b6ca
[   45.785626][ T9787] RDX: 0000200000000240 RSI: 0000200000000280 RDI: 00007ffc0c10d100
[   45.786316][ T9787] RBP: 00007ffc0c10d190 R08: 00007ffc0c10d000 R09: 0000000000000000
[   45.787011][ T9787] R10: 0000000000000048 R11: 0000000000000206 R12: 0000560246733250
[   45.787697][ T9787] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   45.788393][ T9787]  </TASK>
[   45.788665][ T9787] Modules linked in:
[   45.789058][ T9787] ---[ end trace 0000000000000000 ]---
[   45.789554][ T9787] RIP: 0010:hfs_find_init+0x86/0x230
[   45.790028][ T9787] Code: c1 ea 03 80 3c 02 00 0f 85 9a 01 00 00 4c 8d 6b 40 48 c7 45 18 00 00 00 00 48 b8 00 00 00 00 00 fc
[   45.792364][ T9787] RSP: 0018:ffffc90015157668 EFLAGS: 00010202
[   45.793155][ T9787] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff819a4d09
[   45.794123][ T9787] RDX: 0000000000000008 RSI: ffffffff819acd3a RDI: ffffc900151576e8
[   45.795105][ T9787] RBP: ffffc900151576d0 R08: 0000000000000005 R09: 0000000000000000
[   45.796135][ T9787] R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000004
[   45.797114][ T9787] R13: 0000000000000040 R14: ffff88802c50814a R15: 0000000000000000
[   45.798024][ T9787] FS:  00007ffb72734540(0000) GS:ffff8880cec64000(0000) knlGS:0000000000000000
[   45.799019][ T9787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.799822][ T9787] CR2: 00007f9bd8225000 CR3: 000000010979a000 CR4: 00000000000006f0
[   45.800747][ T9787] Kernel panic - not syncing: Fatal exception

The hfs_fill_super() calls hfs_mdb_get() method that tries
to construct Extents Tree and Catalog Tree:

HFS_SB(sb)->ext_tree = hfs_btree_open(sb, HFS_EXT_CNID, hfs_ext_keycmp);
if (!HFS_SB(sb)->ext_tree) {
	pr_err("unable to open extent tree\n");
	goto out;
}
HFS_SB(sb)->cat_tree = hfs_btree_open(sb, HFS_CAT_CNID, hfs_cat_keycmp);
if (!HFS_SB(sb)->cat_tree) {
	pr_err("unable to open catalog tree\n");
	goto out;
}

However, hfs_btree_open() calls read_mapping_page() that
calls hfs_get_block(). And this method calls hfs_ext_read_extent():

static int hfs_ext_read_extent(struct inode *inode, u16 block)
{
	struct hfs_find_data fd;
	int res;

	if (block >= HFS_I(inode)->cached_start &&
	    block < HFS_I(inode)->cached_start + HFS_I(inode)->cached_blocks)
		return 0;

	res = hfs_find_init(HFS_SB(inode->i_sb)->ext_tree, &fd);
	if (!res) {
		res = __hfs_ext_cache_extent(&fd, inode, block);
		hfs_find_exit(&fd);
	}
	return res;
}

The problem here that hfs_find_init() is trying to use
HFS_SB(inode->i_sb)->ext_tree that is not initialized yet.
It will be initailized when hfs_btree_open() finishes
the execution.

The patch adds checking of tree pointer in hfs_find_init()
and it reworks the logic of hfs_btree_open() by reading
the b-tree's header directly from the volume. The read_mapping_page()
is exchanged on filemap_grab_folio() that grab the folio from
mapping. Then, sb_bread() extracts the b-tree's header
content and copy it into the folio.

Reported-by: Wenzhi Wang <wenzhi.wang@uwaterloo.ca>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20250710213657.108285-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 23, 2025
[ Upstream commit 9445878 ]

The hfsplus_readdir() method is capable to crash by calling
hfsplus_uni2asc():

[  667.121659][ T9805] ==================================================================
[  667.122651][ T9805] BUG: KASAN: slab-out-of-bounds in hfsplus_uni2asc+0x902/0xa10
[  667.123627][ T9805] Read of size 2 at addr ffff88802592f40c by task repro/9805
[  667.124578][ T9805]
[  667.124876][ T9805] CPU: 3 UID: 0 PID: 9805 Comm: repro Not tainted 6.16.0-rc3 #1 PREEMPT(full)
[  667.124886][ T9805] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  667.124890][ T9805] Call Trace:
[  667.124893][ T9805]  <TASK>
[  667.124896][ T9805]  dump_stack_lvl+0x10e/0x1f0
[  667.124911][ T9805]  print_report+0xd0/0x660
[  667.124920][ T9805]  ? __virt_addr_valid+0x81/0x610
[  667.124928][ T9805]  ? __phys_addr+0xe8/0x180
[  667.124934][ T9805]  ? hfsplus_uni2asc+0x902/0xa10
[  667.124942][ T9805]  kasan_report+0xc6/0x100
[  667.124950][ T9805]  ? hfsplus_uni2asc+0x902/0xa10
[  667.124959][ T9805]  hfsplus_uni2asc+0x902/0xa10
[  667.124966][ T9805]  ? hfsplus_bnode_read+0x14b/0x360
[  667.124974][ T9805]  hfsplus_readdir+0x845/0xfc0
[  667.124984][ T9805]  ? __pfx_hfsplus_readdir+0x10/0x10
[  667.124994][ T9805]  ? stack_trace_save+0x8e/0xc0
[  667.125008][ T9805]  ? iterate_dir+0x18b/0xb20
[  667.125015][ T9805]  ? trace_lock_acquire+0x85/0xd0
[  667.125022][ T9805]  ? lock_acquire+0x30/0x80
[  667.125029][ T9805]  ? iterate_dir+0x18b/0xb20
[  667.125037][ T9805]  ? down_read_killable+0x1ed/0x4c0
[  667.125044][ T9805]  ? putname+0x154/0x1a0
[  667.125051][ T9805]  ? __pfx_down_read_killable+0x10/0x10
[  667.125058][ T9805]  ? apparmor_file_permission+0x239/0x3e0
[  667.125069][ T9805]  iterate_dir+0x296/0xb20
[  667.125076][ T9805]  __x64_sys_getdents64+0x13c/0x2c0
[  667.125084][ T9805]  ? __pfx___x64_sys_getdents64+0x10/0x10
[  667.125091][ T9805]  ? __x64_sys_openat+0x141/0x200
[  667.125126][ T9805]  ? __pfx_filldir64+0x10/0x10
[  667.125134][ T9805]  ? do_user_addr_fault+0x7fe/0x12f0
[  667.125143][ T9805]  do_syscall_64+0xc9/0x480
[  667.125151][ T9805]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  667.125158][ T9805] RIP: 0033:0x7fa8753b2fc9
[  667.125164][ T9805] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 48
[  667.125172][ T9805] RSP: 002b:00007ffe96f8e0f8 EFLAGS: 00000217 ORIG_RAX: 00000000000000d9
[  667.125181][ T9805] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa8753b2fc9
[  667.125185][ T9805] RDX: 0000000000000400 RSI: 00002000000063c0 RDI: 0000000000000004
[  667.125190][ T9805] RBP: 00007ffe96f8e110 R08: 00007ffe96f8e110 R09: 00007ffe96f8e110
[  667.125195][ T9805] R10: 0000000000000000 R11: 0000000000000217 R12: 0000556b1e3b4260
[  667.125199][ T9805] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  667.125207][ T9805]  </TASK>
[  667.125210][ T9805]
[  667.145632][ T9805] Allocated by task 9805:
[  667.145991][ T9805]  kasan_save_stack+0x20/0x40
[  667.146352][ T9805]  kasan_save_track+0x14/0x30
[  667.146717][ T9805]  __kasan_kmalloc+0xaa/0xb0
[  667.147065][ T9805]  __kmalloc_noprof+0x205/0x550
[  667.147448][ T9805]  hfsplus_find_init+0x95/0x1f0
[  667.147813][ T9805]  hfsplus_readdir+0x220/0xfc0
[  667.148174][ T9805]  iterate_dir+0x296/0xb20
[  667.148549][ T9805]  __x64_sys_getdents64+0x13c/0x2c0
[  667.148937][ T9805]  do_syscall_64+0xc9/0x480
[  667.149291][ T9805]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  667.149809][ T9805]
[  667.150030][ T9805] The buggy address belongs to the object at ffff88802592f000
[  667.150030][ T9805]  which belongs to the cache kmalloc-2k of size 2048
[  667.151282][ T9805] The buggy address is located 0 bytes to the right of
[  667.151282][ T9805]  allocated 1036-byte region [ffff88802592f000, ffff88802592f40c)
[  667.152580][ T9805]
[  667.152798][ T9805] The buggy address belongs to the physical page:
[  667.153373][ T9805] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x25928
[  667.154157][ T9805] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  667.154916][ T9805] anon flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
[  667.155631][ T9805] page_type: f5(slab)
[  667.155997][ T9805] raw: 00fff00000000040 ffff88801b442f00 0000000000000000 dead000000000001
[  667.156770][ T9805] raw: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000
[  667.157536][ T9805] head: 00fff00000000040 ffff88801b442f00 0000000000000000 dead000000000001
[  667.158317][ T9805] head: 0000000000000000 0000000080080008 00000000f5000000 0000000000000000
[  667.159088][ T9805] head: 00fff00000000003 ffffea0000964a01 00000000ffffffff 00000000ffffffff
[  667.159865][ T9805] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
[  667.160643][ T9805] page dumped because: kasan: bad access detected
[  667.161216][ T9805] page_owner tracks the page as allocated
[  667.161732][ T9805] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN9
[  667.163566][ T9805]  post_alloc_hook+0x1c0/0x230
[  667.164003][ T9805]  get_page_from_freelist+0xdeb/0x3b30
[  667.164503][ T9805]  __alloc_frozen_pages_noprof+0x25c/0x2460
[  667.165040][ T9805]  alloc_pages_mpol+0x1fb/0x550
[  667.165489][ T9805]  new_slab+0x23b/0x340
[  667.165872][ T9805]  ___slab_alloc+0xd81/0x1960
[  667.166313][ T9805]  __slab_alloc.isra.0+0x56/0xb0
[  667.166767][ T9805]  __kmalloc_cache_noprof+0x255/0x3e0
[  667.167255][ T9805]  psi_cgroup_alloc+0x52/0x2d0
[  667.167693][ T9805]  cgroup_mkdir+0x694/0x1210
[  667.168118][ T9805]  kernfs_iop_mkdir+0x111/0x190
[  667.168568][ T9805]  vfs_mkdir+0x59b/0x8d0
[  667.168956][ T9805]  do_mkdirat+0x2ed/0x3d0
[  667.169353][ T9805]  __x64_sys_mkdir+0xef/0x140
[  667.169784][ T9805]  do_syscall_64+0xc9/0x480
[  667.170195][ T9805]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  667.170730][ T9805] page last free pid 1257 tgid 1257 stack trace:
[  667.171304][ T9805]  __free_frozen_pages+0x80c/0x1250
[  667.171770][ T9805]  vfree.part.0+0x12b/0xab0
[  667.172182][ T9805]  delayed_vfree_work+0x93/0xd0
[  667.172612][ T9805]  process_one_work+0x9b5/0x1b80
[  667.173067][ T9805]  worker_thread+0x630/0xe60
[  667.173486][ T9805]  kthread+0x3a8/0x770
[  667.173857][ T9805]  ret_from_fork+0x517/0x6e0
[  667.174278][ T9805]  ret_from_fork_asm+0x1a/0x30
[  667.174703][ T9805]
[  667.174917][ T9805] Memory state around the buggy address:
[  667.175411][ T9805]  ffff88802592f300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  667.176114][ T9805]  ffff88802592f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  667.176830][ T9805] >ffff88802592f400: 00 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  667.177547][ T9805]                       ^
[  667.177933][ T9805]  ffff88802592f480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  667.178640][ T9805]  ffff88802592f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  667.179350][ T9805] ==================================================================

The hfsplus_uni2asc() method operates by struct hfsplus_unistr:

struct hfsplus_unistr {
	__be16 length;
	hfsplus_unichr unicode[HFSPLUS_MAX_STRLEN];
} __packed;

where HFSPLUS_MAX_STRLEN is 255 bytes. The issue happens if length
of the structure instance has value bigger than 255 (for example,
65283). In such case, pointer on unicode buffer is going beyond of
the allocated memory.

The patch fixes the issue by checking the length value of
hfsplus_unistr instance and using 255 value in the case if length
value is bigger than HFSPLUS_MAX_STRLEN. Potential reason of such
situation could be a corruption of Catalog File b-tree's node.

Reported-by: Wenzhi Wang <wenzhi.wang@uwaterloo.ca>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Yangtao Li <frank.li@vivo.com>
Link: https://lore.kernel.org/r/20250710230830.110500-1-slava@dubeyko.com
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
donald pushed a commit that referenced this issue Aug 23, 2025
commit 87c6efc upstream.

Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify()
after recent changes from Lion [2]. The problem is: in ets_qdisc_change()
we purge unused DWRR queues; the value of 'q->nbands' is the new one, and
the cleanup should be done with the old one. The problem is here since my
first attempts to fix ets_qdisc_change(), but it surfaced again after the
recent qdisc len accounting fixes. Fix it purging idle DWRR queues before
assigning a new value of 'q->nbands', so that all purge operations find a
consistent configuration:

 - old 'q->nbands' because it's needed by ets_class_find()
 - old 'q->nstrict' because it's needed by ets_class_is_strict()

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary)
 Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021
 RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80
 Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab
 RSP: 0018:ffffba186009f400 EFLAGS: 00010202
 RAX: 00000000000000d6 RBX: 0000000000000000 RCX: 0000000000000004
 RDX: ffff9f0fa29b69c0 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffffffc12c2400 R08: 0000000000000008 R09: 0000000000000004
 R10: ffffffffffffffff R11: 0000000000000004 R12: 0000000000000000
 R13: ffff9f0f8cfe0000 R14: 0000000000100005 R15: 0000000000000000
 FS:  00007f2154f37480(0000) GS:ffff9f269c1c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001530be001 CR4: 00000000007726f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  ets_class_qlen_notify+0x65/0x90 [sch_ets]
  qdisc_tree_reduce_backlog+0x74/0x110
  ets_qdisc_change+0x630/0xa40 [sch_ets]
  __tc_modify_qdisc.constprop.0+0x216/0x7f0
  tc_modify_qdisc+0x7c/0x120
  rtnetlink_rcv_msg+0x145/0x3f0
  netlink_rcv_skb+0x53/0x100
  netlink_unicast+0x245/0x390
  netlink_sendmsg+0x21b/0x470
  ____sys_sendmsg+0x39d/0x3d0
  ___sys_sendmsg+0x9a/0xe0
  __sys_sendmsg+0x7a/0xd0
  do_syscall_64+0x7d/0x160
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f2155114084
 Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
 RSP: 002b:00007fff1fd7a988 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000560ec063e5e0 RCX: 00007f2155114084
 RDX: 0000000000000000 RSI: 00007fff1fd7a9f0 RDI: 0000000000000003
 RBP: 00007fff1fd7aa60 R08: 0000000000000010 R09: 000000000000003f
 R10: 0000560ee9b3a010 R11: 0000000000000202 R12: 00007fff1fd7aae0
 R13: 000000006891ccde R14: 0000560ec063e5e0 R15: 00007fff1fd7aad0
  </TASK>

 [1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/
 [2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/

Cc: stable@vger.kernel.org
Fixes: 103406b ("net/sched: Always pass notifications when child class becomes empty")
Fixes: c062f2a ("net/sched: sch_ets: don't remove idle classes from the round-robin list")
Fixes: dcc68b4 ("net: sch_ets: Add a new Qdisc")
Reported-by: Li Shuang <shuali@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-108026
Reviewed-by: Petr Machata <petrm@nvidia.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/7928ff6d17db47a2ae7cc205c44777b1f1950545.1755016081.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
donald pushed a commit that referenced this issue Aug 23, 2025
commit 33caa20 upstream.

The existing code move the VF NIC to new namespace when NETDEV_REGISTER is
received on netvsc NIC. During deletion of the namespace,
default_device_exit_batch() >> default_device_exit_net() is called. When
netvsc NIC is moved back and registered to the default namespace, it
automatically brings VF NIC back to the default namespace. This will cause
the default_device_exit_net() >> for_each_netdev_safe loop unable to detect
the list end, and hit NULL ptr:

[  231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0
[  231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  231.450246] #PF: supervisor read access in kernel mode
[  231.450579] #PF: error_code(0x0000) - not-present page
[  231.450916] PGD 17b8a8067 P4D 0
[  231.451163] Oops: Oops: 0000 [#1] SMP NOPTI
[  231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY
[  231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
[  231.452692] Workqueue: netns cleanup_net
[  231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0
[  231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00
[  231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246
[  231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb
[  231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564
[  231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000
[  231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340
[  231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340
[  231.457161] FS:  0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000
[  231.457707] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0
[  231.458434] Call Trace:
[  231.458600]  <TASK>
[  231.458777]  ops_undo_list+0x100/0x220
[  231.459015]  cleanup_net+0x1b8/0x300
[  231.459285]  process_one_work+0x184/0x340

To fix it, move the ns change to a workqueue, and take rtnl_lock to avoid
changing the netdev list when default_device_exit_net() is using it.

Cc: stable@vger.kernel.org
Fixes: 4c26280 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1754511711-11188-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant