
slowio-mem-control: increase default MaxMem and decrease exception #311

Merged
david merged 2 commits into master from increase-mem-furoncles-proxmox on Apr 3, 2023

Conversation

david
Contributor

@david david commented Mar 14, 2023

[Tue Mar 14 13:25:16 2023] tokio-runtime-w invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[Tue Mar 14 13:25:16 2023] CPU: 7 PID: 17037 Comm: tokio-runtime-w Kdump: loaded Not tainted 5.15.77.mx64.440 #1
[Tue Mar 14 13:25:16 2023] Hardware name: Dell Inc. PowerEdge T440/021KCD, BIOS 2.11.2 04/22/2021
[Tue Mar 14 13:25:16 2023] Call Trace:
[Tue Mar 14 13:25:16 2023]  <TASK>
[Tue Mar 14 13:25:16 2023]  dump_stack_lvl+0x34/0x48
[Tue Mar 14 13:25:16 2023]  dump_header+0x4a/0x1f1
[Tue Mar 14 13:25:16 2023]  oom_kill_process.cold+0xb/0x10
[Tue Mar 14 13:25:16 2023]  out_of_memory+0x250/0x500
[Tue Mar 14 13:25:16 2023]  mem_cgroup_out_of_memory+0x110/0x160
[Tue Mar 14 13:25:16 2023]  try_charge_memcg+0x70e/0x7d0
[Tue Mar 14 13:25:16 2023]  ? __alloc_pages+0x1a8/0x310
[Tue Mar 14 13:25:16 2023]  charge_memcg+0x40/0x90
[Tue Mar 14 13:25:16 2023]  __mem_cgroup_charge+0x29/0x80
[Tue Mar 14 13:25:16 2023]  __handle_mm_fault+0xb7f/0x1760
[Tue Mar 14 13:25:16 2023]  ? asm_sysvec_call_function_single+0x16/0x20
[Tue Mar 14 13:25:16 2023]  handle_mm_fault+0xca/0x290
[Tue Mar 14 13:25:16 2023]  do_user_addr_fault+0x1cb/0x670
[Tue Mar 14 13:25:16 2023]  exc_page_fault+0x65/0x120
[Tue Mar 14 13:25:16 2023]  asm_exc_page_fault+0x22/0x30
[Tue Mar 14 13:25:16 2023] RIP: 0033:0x7f96fa8e7b6f
[Tue Mar 14 13:25:16 2023] Code: 00 62 e1 fe 28 6f be 60 10 00 00 48 83 ee 80 62 e1 7d 28 e7 07 62 e1 7d 28 e7 4f 01 62 e1 7d 28 e7 57 02 62 e1 7d 28 e7 5f 03 <62> e1 7d 28 e7 a7 00 10 00 00 62 e1 7d 28 e7 af 20 10 00 00 62 e1
[Tue Mar 14 13:25:16 2023] RSP: 002b:00007f96b3dfd008 EFLAGS: 00010203
[Tue Mar 14 13:25:16 2023] RAX: 00007f9621976c50 RBX: 0000000000cbb610 RCX: 0000000000000019
[Tue Mar 14 13:25:16 2023] RDX: 00000000000015d8 RSI: 00007f96abd6d420 RDI: 00007f96223af000
[Tue Mar 14 13:25:16 2023] RBP: 00007f96ab334fe0 R08: ffffffffffffffd0 R09: 00007f9620000000
[Tue Mar 14 13:25:16 2023] R10: 0000000000000141 R11: 0000000001ec0000 R12: 00007f96a8000030
[Tue Mar 14 13:25:16 2023] R13: 0000000001976c10 R14: 00007f96abff05f0 R15: 0000000000000008
[Tue Mar 14 13:25:16 2023]  </TASK>
[Tue Mar 14 13:25:16 2023] memory: usage 2097084kB, limit 2097152kB, failcnt 2241246567
[Tue Mar 14 13:25:16 2023] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[Tue Mar 14 13:25:16 2023] Memory cgroup stats for /slowio.slice:
[Tue Mar 14 13:25:16 2023] anon 2113126400
                           file 258048
                           kernel_stack 344064
                           pagetables 4550656
                           percpu 56704
                           sock 0
                           shmem 0
                           file_mapped 49152
                           file_dirty 0
                           file_writeback 0
                           swapcached 0
                           anon_thp 23068672
                           file_thp 0
                           shmem_thp 0
                           inactive_anon 2113118208
                           active_anon 4096
                           inactive_file 139264
                           active_file 32768
                           unevictable 0
                           slab_reclaimable 28819360
                           slab_unreclaimable 263672
                           slab 29083032
                           workingset_refault_anon 0
                           workingset_refault_file 24765250
                           workingset_activate_anon 0
                           workingset_activate_file 1042738
                           workingset_restore_anon 0
                           workingset_restore_file 100254
                           workingset_nodereclaim 46641718
                           pgfault 139227665
                           pgmajfault 10636
                           pgrefill 1227406
                           pgscan 2826681947
                           pgsteal 2801429255
                           pgactivate 137577
                           pgdeactivate 1177928
                           pglazyfree 0
                           pglazyfreed 0
                           thp_fault_alloc 44842
                           thp_collapse_alloc 10968
[Tue Mar 14 13:25:16 2023] Tasks state (memory values in pages):
[Tue Mar 14 13:25:16 2023] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Tue Mar 14 13:25:16 2023] [  16984]     0 16984    11299     6472    98304        0             0 mxproxmox
[Tue Mar 14 13:25:16 2023] [  17001]     0 17001   818241   511589  4468736        0             0 proxmox-backup-
[Tue Mar 14 13:25:16 2023] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0-1,oom_memcg=/slowio.slice,task_memcg=/slowio.slice/proxmox-backup.service,task=proxmox-backup-,pid=17001,uid=0
[Tue Mar 14 13:25:16 2023] Memory cgroup out of memory: Killed process 17001 (proxmox-backup-) total-vm:3272964kB, anon-rss:2045364kB, file-rss:992kB, shmem-rss:0kB, UID:0 pgtables:4364kB oom_score_adj:0
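For reference, the `limit 2097152kB` in the memcg line of the log above is exactly 2 GiB, and usage was within 68 kB of it when the OOM killer fired. A minimal sketch for pulling those two numbers out of such a line, assuming the `memory: usage ...kB, limit ...kB` format shown in the log (the function name `memcg_headroom` is made up for illustration):

```shell
#!/bin/bash
# Hypothetical helper: extract usage and limit (in kB) from a memcg OOM
# log line and print the limit in MiB plus the remaining headroom in kB.
memcg_headroom() {
    local line=$1
    local re='usage ([0-9]+)kB, limit ([0-9]+)kB'
    if [[ $line =~ $re ]]; then
        local usage=${BASH_REMATCH[1]} limit=${BASH_REMATCH[2]}
        echo "limit_mib=$(( limit / 1024 )) headroom_kb=$(( limit - usage ))"
    else
        return 1   # line does not match the expected format
    fi
}

memcg_headroom "memory: usage 2097084kB, limit 2097152kB, failcnt 2241246567"
# prints: limit_mib=2048 headroom_kb=68
```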

@donald
Collaborator

donald commented Mar 15, 2023

IMO you can commit directly into the master branch. Can you add sirhammerlock as well?
Perhaps we should just add all servers?
And/or catch this from netlog instead of noticing it by chance.

@david
Contributor Author

david commented Mar 15, 2023

Or increase the default MemoryMax to 4G and then remove all the per-server entries?

And increase MemoryMax for slowio-mem-control to 6G or 8G?

@donald
Collaborator

donald commented Mar 15, 2023

increase the default MemoryMax to 4G

We have 17 workstations with 7.7 GB. So up to half of their memory would be used by backup.

Maybe find a formula based on the amount of memory? Perhaps max(2 GB, min(8 GB, memory/10))
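To see what that formula would give on different machines, here is a small sketch. It works in MiB to limit rounding from integer division and treats "GB" as GiB, which is an assumption. The name `clamp_limit_mib` and the sample memory sizes are illustrative only.

```shell
#!/bin/bash
# Sketch of max(2 GB, min(8 GB, memory/10)); all values in MiB.
# clamp_limit_mib is a hypothetical name, not part of the repository.
clamp_limit_mib() {
    local total_mib=$1
    local tenth=$(( total_mib / 10 ))                 # memory/10
    local capped=$(( tenth < 8192 ? tenth : 8192 ))   # min(8 GB, ...)
    echo $(( capped > 2048 ? capped : 2048 ))         # max(2 GB, ...)
}

# Illustrative machine sizes (assumed, not taken from the inventory)
for total_gib in 8 32 64 256; do
    echo "${total_gib} GiB RAM -> $(clamp_limit_mib $(( total_gib * 1024 ))) MiB"
done
```

With this formula every host below 20 GB ends up at the 2 GB floor and every host above 80 GB at the 8 GB cap.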

@david
Contributor Author

david commented Mar 15, 2023

Ok, I forgot the workstations

@david
Contributor Author

david commented Mar 15, 2023

Maybe we can do it like this?

if [ -e /node/tags/server ]; then
	HOST=server
else
	HOST=$(hostname -s)
fi
case "$HOST" in
server)
...

@david
Contributor Author

david commented Mar 16, 2023

There doesn't seem to be a proper min/max function in Bash.
I think doing the whole thing with a formula will be hard to read later.
It would look something like this:

# Read MemTotal (reported in kB) from /proc/meminfo
[[ $(grep MemTotal /proc/meminfo) =~ MemTotal:[[:space:]]*([0-9]+) ]]
# A tenth of the total memory in whole GB (integer division)
MEM=$(( (${BASH_REMATCH[1]}/1024/1024)/10 ))
min=$(( $MEM < 8 ? $MEM : 8 ))   # cap at 8
max=$(( 2 > $min ? 2 : $min ))   # floor of 2; this is the resulting limit

echo "min=$min   max=$max"

Maybe raise the default MaxMemory and make an exception for the workstations?

@donald
Collaborator

donald commented Mar 16, 2023

I don't find it that hard to read.
Would it be easier to parse with shell functions?

#! /bin/bash
# Print the smaller / larger of two integers
min() { echo $(( $1 < $2 ? $1 : $2 )); }
max() { echo $(( $1 > $2 ? $1 : $2 )); }

# MemTotal is reported in kB; take a tenth of it in whole GB
[[ $(grep MemTotal /proc/meminfo) =~ MemTotal:[[:space:]]*([0-9]+) ]]
MEM=$(( (${BASH_REMATCH[1]}/1024/1024)/10 ))

# Clamp to the range 2..8
limit=$(max 2 $(min 8 $MEM ))

Maybe raise the default MaxMemory and make an exception for the workstations?

Yes, that works too. Put 8 GB in the unit file and have the generator pin it down to 2 GB for workstations.
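The variant agreed on here could look roughly like the following. This is only a sketch under assumptions: the tag file name `workstation` and the function name `memory_max_override` are hypothetical (only `/node/tags/server` appears in this thread), and how the generator actually writes the override is not shown.

```shell
#!/bin/bash
# Hypothetical sketch: the 8G default lives in the unit file; the generator
# only emits a lower value for workstations.
# The tag directory is a parameter so the logic can be tried without /node.
memory_max_override() {
    local tags_dir=${1:-/node/tags}
    if [ -e "$tags_dir/workstation" ]; then   # assumed tag name
        echo "MemoryMax=2G"
    fi
    # No output means: keep the default from the unit file.
}

memory_max_override
```

Keeping the exception on the workstation side means new servers get the higher limit without having to be listed individually.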

@david david force-pushed the increase-mem-furoncles-proxmox branch from dee2097 to eafd2c2 Compare March 17, 2023 11:45
@david david force-pushed the increase-mem-furoncles-proxmox branch from eafd2c2 to 56e2477 Compare March 17, 2023 11:54
@david david changed the title from "slowio-mem-control: Add furoncles" to "slowio-mem-control: increase default MaxMem and decrease exception" Apr 3, 2023
@david david merged commit b73e912 into master Apr 3, 2023
2 participants