gpu-policy: add orpheus #235

Merged
merged 1 commit into from
Mar 17, 2022

Conversation

@thomas (Contributor) commented Mar 16, 2022

Two cards 'NVIDIA TITAN X (Pascal)', 12 GB each, from 2016/17.

@donald (Collaborator) commented Mar 16, 2022

LGTM, go ahead.
You can test it by calling it with "init" manually as root; it should return the number of GPUs.
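As a sanity check of what "init" should report, one way to count NVIDIA GPUs on a host is to count the numbered /dev/nvidia* device nodes. A minimal sketch, assuming device nodes are the detection mechanism (the helper name `count_gpus` is illustrative, not the policy script's actual code):

```python
import re
from pathlib import Path

def count_gpus(dev_dir="/dev"):
    """Count numbered NVIDIA GPU device nodes (nvidia0, nvidia1, ...)."""
    # Match only the per-GPU nodes; skip nvidiactl, nvidia-uvm, etc.
    return sum(1 for p in Path(dev_dir).iterdir()
               if re.fullmatch(r"nvidia\d+", p.name))

if __name__ == "__main__":
    print(count_gpus())
```

On orpheus this should print 2, matching the two TITAN X cards.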

@donald (Collaborator) commented Mar 16, 2022

You might want to add cuda_12gb (or cuda_10gb) to the tags of orpheus. That way a user could submit with "--gpu --prere cuda_10gb" to avoid the job being started on a card with 5 GB. See jabberwocky.

@thomas (Contributor, Author) commented Mar 16, 2022

Naming it cuda_10gb/cuda_12gb is a bit of a question: finer grain, or classes?

Right now a user who needs more than 5 GB must use cuda_10gb and thus ends up on the A100. IMHO that should suffice until we see what other problems arise when we run different GPU capabilities (Pascal/Ampere, or 2016/2021) under the same hood.

@donald (Collaborator) commented Mar 17, 2022

Okay. Anyway, I think we should plan a tag schema now, so that a user can formulate their job's requirements. If we do it 'as needed', it will probably become a mess not much better than --whitelist.
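At bottom, a tag schema like this is superset matching: a host is eligible for a job if the host's tag set contains every tag the job requires. A minimal sketch (orpheus's tags are from this thread; jabberwocky's tag set and the function name `eligible_hosts` are illustrative assumptions, not the scheduler's actual data or API):

```python
# Host tag sets. orpheus matches the hostconfig discussed here;
# jabberwocky's tags are a hypothetical example.
HOST_TAGS = {
    "orpheus":     {"amd", "mx64", "server", "cuda", "confidential", "cuda_10g"},
    "jabberwocky": {"amd", "mx64", "server", "cuda", "cuda_10g"},
}

def eligible_hosts(required_tags, host_tags=HOST_TAGS):
    """Return hosts whose tag set covers every required tag."""
    required = set(required_tags)
    return sorted(h for h, tags in host_tags.items() if required <= tags)

print(eligible_hosts(["cuda", "cuda_10g"]))
print(eligible_hosts(["cuda", "confidential"]))
```

The value of planning the schema up front is that `required <= tags` only works if tags mean the same thing on every host; ad-hoc naming breaks exactly this.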

@thomas (Contributor, Author) commented Mar 17, 2022

The hostconfig for orpheus is now:

    orpheus tag amd mx64 server cuda confidential cuda_10g

Anything else to consider?

@donald (Collaborator) commented Mar 17, 2022

No, looks good.

@thomas thomas merged commit fdd0025 into master Mar 17, 2022