linux: Defer initialisation of struct pages to kthreads a.k.a. select DEFERRED_STRUCT_PAGE_INIT #2015

Open
pmenzel opened this issue Dec 4, 2020 · 6 comments

@pmenzel
Collaborator

pmenzel commented Dec 4, 2020

This should initialize memory in parallel and decrease boot time.

config DEFERRED_STRUCT_PAGE_INIT
        bool "Defer initialisation of struct pages to kthreads"
        default n
        depends on NO_BOOTMEM
        depends on SPARSEMEM
        depends on !NEED_PER_CPU_KM
        depends on 64BIT
        help
          Ordinarily all struct pages are initialised during early boot in a
          single thread. On very large machines this can take a considerable
          amount of time. If this option is set, large machines will bring up
          a subset of memmap at boot and then initialise the rest in parallel
          by starting one-off "pgdatinitX" kernel thread for each node X. This
          has a potential performance impact on processes running early in the
          lifetime of the system until these kthreads finish the
          initialisation.
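
To give a feeling for why the help text speaks of "a considerable amount of time" on very large machines, here is a rough back-of-the-envelope sketch of how many struct pages have to be initialised. It assumes 4 KiB base pages and a 64-byte struct page, which is typical for x86-64 but is not stated in this issue; the RAM sizes are only illustrative.

```python
# Rough estimate of how many struct pages a machine has to initialise.
# Assumptions (not from the issue): 4 KiB base pages and a 64-byte
# struct page; both depend on architecture and kernel configuration.

PAGE_SIZE = 4096        # bytes per base page (assumed)
STRUCT_PAGE_SIZE = 64   # bytes per struct page (assumed)
GIB = 1024 ** 3


def memmap_estimate(ram_bytes):
    """Return (number of struct pages, memmap size in bytes) for ram_bytes of RAM."""
    pages = ram_bytes // PAGE_SIZE
    return pages, pages * STRUCT_PAGE_SIZE


if __name__ == "__main__":
    # Illustrative RAM sizes only.
    for label, ram in (("96 GiB", 96 * GIB), ("2 TiB", 2048 * GIB)):
        pages, memmap = memmap_estimate(ram)
        print(f"{label}: {pages} struct pages, about {memmap / GIB:.1f} GiB of memmap")
```

Under these assumptions the number of struct pages grows linearly with RAM, so a single-threaded initialisation pass scales with it as well.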
@donald
Collaborator

donald commented Dec 14, 2020

Hmmm. "On very large machines". We don't really care that much how long our "very large machines" take to boot. But this would add more timing randomness and unpredictability to the boot. Think about file system recovery after a crash when not all memory is available yet. The default is "n". How much time do we really save? I doubt it's worth the risk.

@pmenzel
Collaborator Author

pmenzel commented Dec 14, 2020

At least Debian enables it:

/boot/config-5.9.0-4-amd64:CONFIG_DEFERRED_STRUCT_PAGE_INIT=y

So, it’s well tested in my opinion.

Exactly how much time it saves on 1 TB or 2 TB machines needs to be tested.
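
A minimal sketch of how the setting quoted above could be checked programmatically for an arbitrary kernel config file. The helper name `config_value` is hypothetical, and where the config file lives (for example under `/boot/`) depends on the distribution, so the example only parses text that is passed in.

```python
import re


def config_value(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...), or None.

    Lines such as '# CONFIG_FOO is not set' do not match and yield None.
    """
    pattern = re.compile(r"^" + re.escape(option) + r"=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Sample text in the usual kernel config format; not taken from a real machine.
    sample = (
        "CONFIG_SPARSEMEM=y\n"
        "CONFIG_DEFERRED_STRUCT_PAGE_INIT=y\n"
        "# CONFIG_EXAMPLE_OPTION is not set\n"
    )
    print(config_value(sample, "CONFIG_DEFERRED_STRUCT_PAGE_INIT"))
```

To use it on a real system, read the config file of the kernel in question into a string first and pass that string as `config_text`.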

@donald
Collaborator

donald commented Dec 14, 2020

How many log recoveries of XFS filesystems on 100 TB RAID 6 software RAIDs were tested? I've got the feeling that things are often a bit unique here. Why do we hit NFS bugs, XFS bugs, HBA driver bugs?

And maybe Debian has a smarter synchronization during startup than we have.

Let's see what the timing results are.

@pmenzel
Collaborator Author

pmenzel commented Dec 14, 2020

> How many log recoveries of XFS filesystems on 100 TB RAID 6 software RAIDs were tested? I've got the feeling that things are often a bit unique here. Why do we hit NFS bugs, XFS bugs, HBA driver bugs?

Log recovery is already in user space. It’s my understanding that all of the memory is going to be available once you are able to do anything on the machine.

> And maybe Debian has a smarter synchronization during startup than we have.

Could be. I have no idea.

> Let's see what the timing results are.

Ok, I am going to build a test Linux kernel.

@donald
Collaborator

donald commented Dec 14, 2020

> Log recovery is already in user space. It’s my understanding that all of the memory is going to be available once you are able to do anything on the machine.

Oh, I just assumed that userspace (init from initramfs) is started before the pgdatinitX threads have finished and all memory is available. Is that not true, and does the kernel wait for all memory before init is started?

@pmenzel
Collaborator Author

pmenzel commented Dec 14, 2020

Maybe for machines with several terabytes of memory. On the 2 TB RAM server nomnomnom, currently the system with the greatest amount of RAM, the Linux kernel takes over 90 seconds to initialize.

@nomnomnom$ systemd-analyze  # log ring buffer already overflown (dmesg)
Startup finished in 1min 33.850s (kernel) + 17.413s (userspace) = 1min 51.264s 
multi-user.target reached after 17.403s in userspace

On the 96 GB RAM server claptrap it takes less time.

@claptrap$ systemd-analyze 
Startup finished in 11.561s (kernel) + 23.092s (userspace) = 34.654s 
multi-user.target reached after 23.082s in userspace

So maybe you are right that there could be a small overlap. But until it is tested, I cannot say for sure.
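
For comparing a test kernel against the numbers above, a small sketch that extracts the kernel and userspace times from the `Startup finished in ...` line of `systemd-analyze`. The function names are hypothetical, and the parser only handles the `Nmin N.NNNs` and `N.NNNs` forms shown in this thread; other duration formats raise a `ValueError`.

```python
import re

# A duration as it appears in the outputs above, e.g. "1min 33.850s" or "11.561s".
_DURATION = r"((?:\d+min )?\d+(?:\.\d+)?s)"
_STARTUP = re.compile(_DURATION + r" \(kernel\) \+ " + _DURATION + r" \(userspace\)")


def to_seconds(text):
    """Convert '1min 33.850s' or '11.561s' to a float number of seconds."""
    match = re.fullmatch(r"(?:(\d+)min )?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def kernel_and_userspace(line):
    """Return (kernel_seconds, userspace_seconds) from a 'Startup finished' line."""
    match = _STARTUP.search(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return to_seconds(match.group(1)), to_seconds(match.group(2))


if __name__ == "__main__":
    # The two lines quoted in this comment.
    lines = {
        "nomnomnom": "Startup finished in 1min 33.850s (kernel) + 17.413s (userspace) = 1min 51.264s",
        "claptrap": "Startup finished in 11.561s (kernel) + 23.092s (userspace) = 34.654s",
    }
    for host, line in lines.items():
        kernel, userspace = kernel_and_userspace(line)
        print(f"{host}: kernel {kernel:.3f}s, userspace {userspace:.3f}s")
```

Running the same parser on the output of a kernel built with DEFERRED_STRUCT_PAGE_INIT would show directly how much the kernel part changes.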
