gpu-policy: add orpheus #235

Merged
merged 1 commit into from
Mar 17, 2022

Conversation

@thomas (Contributor) commented Mar 16, 2022

Two cards 'NVIDIA TITAN X (Pascal)', 12 GB each, from 2016/17.

@donald (Collaborator) commented Mar 16, 2022

LGTM, go ahead.
You can test it by calling it with "init" manually as root; it should return the number of GPUs.
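As a sanity check of what "init" should report, one way to count NVIDIA GPUs on a host is to count the numbered /dev/nvidia* device nodes. A minimal sketch, assuming device nodes are the detection mechanism (the helper name `count_gpus` is illustrative, not the policy script's actual code):

```python
import re
from pathlib import Path

def count_gpus(dev_dir="/dev"):
    """Count numbered NVIDIA GPU device nodes (nvidia0, nvidia1, ...)."""
    # Match only the per-GPU nodes; skip nvidiactl, nvidia-uvm, etc.
    return sum(1 for p in Path(dev_dir).iterdir()
               if re.fullmatch(r"nvidia\d+", p.name))

if __name__ == "__main__":
    print(count_gpus())
```

On orpheus this should print 2, matching the two TITAN X cards.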

@donald (Collaborator) commented Mar 16, 2022

You might want to add cuda_12gb (or cuda_10gb) to the tags of orpheus. That way a user could submit with "--gpu --prere cuda_10gb" to avoid the job being started on a card with 5 GB. See jabberwocky.

@thomas (Contributor, Author) commented Mar 16, 2022

Naming it cuda_10gb/cuda_12gb is a bit of a question: finer grain, or classes?

Right now a user who needs more than 5 GB must use cuda_10gb and thus ends up on the A100. IMHO that should suffice until we see what other problems arise when we run different GPU capabilities (Pascal/Ampere, or 2016/2021) under the same hood.

@donald (Collaborator) commented Mar 17, 2022

Okay. Anyway, I think we should plan a tag schema now, so that a user can formulate their job's requirements. If we do it 'as needed', it will probably become a mess not much better than --whitelist.
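At bottom, a tag schema like this is superset matching: a host is eligible for a job if the host's tag set contains every tag the job requires. A minimal sketch (orpheus's tags are from this thread; jabberwocky's tag set and the function name `eligible_hosts` are illustrative assumptions, not the scheduler's actual data or API):

```python
# Host tag sets. orpheus matches the hostconfig discussed here;
# jabberwocky's tags are a hypothetical example.
HOST_TAGS = {
    "orpheus":     {"amd", "mx64", "server", "cuda", "confidential", "cuda_10g"},
    "jabberwocky": {"amd", "mx64", "server", "cuda", "cuda_10g"},
}

def eligible_hosts(required_tags, host_tags=HOST_TAGS):
    """Return hosts whose tag set covers every required tag."""
    required = set(required_tags)
    return sorted(h for h, tags in host_tags.items() if required <= tags)

print(eligible_hosts(["cuda", "cuda_10g"]))
print(eligible_hosts(["cuda", "confidential"]))
```

The value of planning the schema up front is that `required <= tags` only works if tags mean the same thing on every host; ad-hoc naming breaks exactly this.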

@thomas (Contributor, Author) commented Mar 17, 2022

The hostconfig for orpheus is now:

    orpheus tag amd mx64 server cuda confidential cuda_10g

Anything else to consider?

@donald (Collaborator) commented Mar 17, 2022

No, looks good.

@thomas thomas merged commit fdd0025 into master Mar 17, 2022