mxqsub: force new group/reuse of old group for new jobs.. #22
Comments
options should be:
|
Hello, another idea: The usual pattern is that the user submits several jobs via mxqsub from a script, a shell loop, or even from a session with command recall. The expectation would be that these jobs end up in a unique group and are not mixed up with other jobs.

Option 1: Do this automatically: only add to an existing group if the submitter has the same PID. If the user should be able to override this, --submitter-pid might do.

Option 2: Let the submitter decide by declaring 'now I want a new group' with the first submit, or even standalone (without submitting a job at the same time), which may be easier to use in programs:
mxqsub --group-name blabla --newgroup job-1
Perhaps --newgroup could have the semantics of "close all matching groups", where "closed" means that the groups will no longer accept new jobs. I don't think this should be a new group state, because it may make sense to "close" groups of all states. It should be a flag (so we have running+closed or waiting+closed etc.). |
btw: to make a PID unique over systems, reboots and pid recycling I came to some concatenation of |
offtopic: |
and at the moment you always have the possibility to start a new group by just giving it a new unique name.
tag=$(date +"%Y-%m-%d-%s")
mxqsub -N "group1-$tag" job-1
mxqsub -N "group1-$tag" job-2
mxqsub -N "group1-$tag" job-3
or
tag=$(cat /proc/sys/kernel/random/uuid) |
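The unique-name approach above could be wrapped up roughly as follows. This is only a sketch: `unique_tag` and `batch_commands` are made-up helper names, not part of mxq, and the commands are printed rather than executed so the snippet works without mxqsub installed. Only the `-N` option shown above is assumed.

```sh
# Return a practically unique tag: a kernel-generated UUID where the
# /proc file is readable, otherwise a timestamp plus the shell's PID.
unique_tag() {
    if [ -r /proc/sys/kernel/random/uuid ]; then
        cat /proc/sys/kernel/random/uuid
    else
        printf '%s-%s\n' "$(date +%Y-%m-%d-%s)" "$$"
    fi
}

# Print one mxqsub command line per job; all jobs of a batch share
# the same group name because the tag is generated only once.
batch_commands() {
    name=$1
    shift
    tag=$(unique_tag)
    for job in "$@"; do
        printf 'mxqsub -N %s-%s %s\n' "$name" "$tag" "$job"
    done
}
```

Calling `batch_commands group1 job-1 job-2 job-3` prints three command lines with an identical group name; piping the output to `sh` would submit them.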
this is part of mariux64#22
mariux@79fcd63 implements B (Option 2) |
mariux@a6491d8 implements D (other interpretation of Option 1 where you can force reactivation of a specific group) |
btw.. cancelled groups will never be reused. |
so A to E are available in some form now. |
implements #22
* mxqsub: Add option --group-id=ID to force reusing group with group_id=ID
* mxqsub: Add jobs to the group with greatest group_id when reusing groups
* mxqsub: Add option --new-group to force new group
* mxqsub: Fix typos in usage (--help)
I don't like the idea that the grouping depends on timing (Options A, B, C). It shouldn't make a difference whether older jobs are still running or not, or whether they were canceled or not. And no, my "Option 2" proposal is not at all "reuse running groups".
For my "Option 1" proposal, of course PID was referring to the process calling mxqsub, so, yes, the ppid of the mxqsub process. However, it should be an ident made from boot_id+stime+pid or something like that. --submitter-pid can still use a pid only, because it implies the running process on the current system with this pid. A typical use case would be --submitter-pid $$ |
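For illustration, an ident made from boot_id+stime+pid could be assembled in plain shell roughly like this. It is a sketch under assumptions: it is Linux-only (it reads /proc), `submitter_ident` is a hypothetical helper name, and nothing here is part of mxq. The --submitter-pid option is only a proposal in this thread.

```sh
# Print "<boot_id>.<starttime>.<pid>" for the given PID.
# boot_id changes on every boot, and starttime (field 22 of
# /proc/PID/stat) tells apart processes that reuse the same PID.
submitter_ident() {
    pid=$1
    read -r boot_id < /proc/sys/kernel/random/boot_id || return 1
    read -r stat < "/proc/$pid/stat" || return 1
    # The line looks like "pid (comm) state ppid ..."; comm may contain
    # spaces, so cut everything up to and including the last ") ".
    rest=${stat##*\) }
    # "rest" starts at field 3 (state), so overall field 22 is word 20.
    set -- $rest
    shift 19
    printf '%s.%s.%s\n' "$boot_id" "$1" "$pid"
}
```

With the existing -N option, a script could then keep all of its submissions in one group of its own via `mxqsub -N "$(submitter_ident $$)" job-1`, at the cost of a rather long group name.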
but still option2 can be solved by giving a unique group name. I can't see the need to have 2 groups running with the same name at the same time with exactly the same resources running the same program. If a user really has 2 groups running under the same name, the user needs to track the group_id to distinguish between them. So what is the real-life use case for adding this extra complexity?
option1: where is the real-life use case that justifies the complexity of tracking bootid/pid/* for the user? you can reuse a group at any time by specifying the group_id. |
use case for option 1: I have a perl script to do the accounting. It calls
sys(
'mxqsub',
'-t',30,
'-o',"tmp/$host.out",
'-e',"tmp/$host.log",
'./classify',($live ? '--live' : () ),'--onefile',$data_file,
for a number of files (a little below 100). Maybe I sit in a directory 2015-06 to re-run the accounting for a previous period after I've sent emails and got some corrections. At the same time, cron might start the same script from another directory (LIVE) to produce accounting based on current usage. This is a daily cron job. I'd love to have two groups running now without the need to do any additional complex things for it, like inventing session-specific group names. D. |
Logic works well with
diff --git a/classify b/classify
index 620884b..6cf01f8 100755
--- a/classify
+++ b/classify
@@ -978,16 +978,37 @@ sub sys {
}
+sub first_line_of_file {
+ our ($filename)=@_;
+ my $fh=new IO::File ($filename,'<') or die "$filename: $!\n";
+ my $line=$fh->getline();
+ chomp($line);
+ return $line;
+}
+
+our $BOOT_ID; # system incarnation - lazy init
+
+sub mxq_group_name {
+ my ($pid)=@_;
+ my $stat;
+ defined $BOOT_ID or $BOOT_ID=first_line_of_file('/proc/sys/kernel/random/boot_id');
+ $stat=first_line_of_file("/proc/$pid/stat");
+ my $stime=(split (" ",$stat))[21];
+ return $BOOT_ID.'.'.$stime.'.'.$pid;
+}
+
sub main_cluster {
setup_global();
read_cf_files();
empty_dir("tmp");
+ my $group_name=mxq_group_name($$);
for my $data_file(scan_data_files()) {
my ($host)=$data_file=~/([^\/]+)$/;
sys(
'mxqsub',
+ '-N',$group_name,
'-t',30,
'-o',"tmp/$host.out",
'-e',"tmp/$host.log",
we just get ugly group names now. |
ok.. got it.. but automatic grouping is based on who and what without caring about where and when. besides this will break existing code where people submit jobs from the cluster, where host,pid,bootid etc. differ for every submission. but for this exact use case: the group-name is not used at all so it will be 'default' atm. If you do not care about the group name (by not specifying it) just use it:
why not
but: I don't care for real ;)
|
i like |
p.s.: and still it's
|
agreed. will change that |
there are three types of groups:
mxqsub does not care about status and reuses groups no matter what.
there needs to be an option in mxqsub to tell the cluster to start a new group or to reuse an existing (finished) group.