mxqsub: force new group/reuse of old group for new jobs.. #22

Closed
mariux opened this issue Aug 25, 2015 · 17 comments
@mariux
Contributor

mariux commented Aug 25, 2015

there are three types of groups:

  • running/waiting
  • finished
  • active (finished within last x seconds or running) (in mxqdump)

mxqsub does not care about status and reuses groups no matter what.

there needs to be an option in mxqsub to tell the cluster to start a new group or to reuse an existing (finished) group.

  • by default, should jobs only be added to running/waiting groups?
  • by default, should jobs only be added to active groups?
@mariux mariux self-assigned this Aug 25, 2015
@mariux mariux added this to the 1.0 milestone Aug 25, 2015
@mariux
Contributor Author

mariux commented Aug 25, 2015

options should be:

  • A: reuse active groups (max minutes since group_date_end) (new default?)
  • B: reuse running groups only (new default?)
  • C: reuse last active group/latest finished group (current default behavior of mxqsub)
  • D: reuse group X
  • E: force new group (no matter what? better not!)
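For illustration, the options might map onto command-line flags like this. The flag names for A and B are invented for this sketch; `--group-id` and `--new-group` are the names that were eventually implemented in this issue:

```shell
# Hypothetical sketch only -- A/B flags are made up, not a real interface.
echo mxqsub --reuse-active-minutes=30 job.sh  # A: reuse groups finished <30 min ago
echo mxqsub --reuse-running job.sh            # B: reuse running/waiting groups only
echo mxqsub job.sh                            # C: current default, latest matching group
echo mxqsub --group-id=42 job.sh              # D: reuse the group with group_id=42
echo mxqsub --new-group job.sh                # E: force a new group
```

(The leading `echo` keeps the sketch runnable even where mxqsub is not installed; drop it to actually submit.)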

@donald
Contributor

donald commented Aug 25, 2015

Hello,

another idea: The usual pattern is that the user submits several jobs via mxqsub from a script, a shell loop, or even from a session with command recall. The expectation would be that these jobs end up in a unique group and are not mixed up with other jobs.

Option 1: Do this automatically: only add to an existing group if the submitter has the same PID. If the user should be able to override this, --submitter-pid might do.

Option 2: Let the submitter decide by declaring 'now I want a new group' with the first submit, or even standalone (without submitting a job at the same time), which may be easier to use in programs:

mxqsub --group-name blabla --newgroup job-1
mxqsub --group-name blabla job-2
mxqsub --group-name blabla job-3
or
mxqsub --group-name blabla --newgroup
for job in job-*;do mxqsub --group-name blabla $job;done

Perhaps --newgroup could have the semantics of "close all matching groups", where "closed" means that the groups will no longer accept new jobs. I don't think this should be a new group state, because it may make sense to "close" groups of all states. It should be a flag (so we have running+closed, waiting+closed, etc.).

@donald
Contributor

donald commented Aug 25, 2015

btw: to make a PID unique over systems, reboots and PID recycling, I came up with a concatenation of
/proc/sys/kernel/random/boot_id + process stime + pid :-)
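A minimal shell sketch of that concatenation, assuming a Linux /proc layout and using the process start time (field 22 of /proc/&lt;pid&gt;/stat, as clarified later in this thread); the field arithmetic is an illustration, not mxq code:

```shell
# Build an identifier that survives reboots and PID recycling:
# boot_id + process start time (clock ticks since boot) + PID.
pid=$$
boot_id=$(cat /proc/sys/kernel/random/boot_id)
# Strip everything up to ") " so spaces in the comm field can't shift fields;
# the start time is overall field 22, i.e. field 20 of the remainder.
starttime=$(sed 's/.*) //' /proc/$pid/stat | awk '{print $20}')
ident="$boot_id.$starttime.$pid"
echo "$ident"
```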

@mariux
Contributor Author

mariux commented Aug 26, 2015

offtopic: boot_id + PID/starttime + PID (not stime?)

@mariux
Contributor Author

mariux commented Aug 26, 2015

  • option 1 is too complex, since people already start new jobs from within the cluster to queue follow-ups. In addition, every mxqsub call will have a new PID, so PPID might do the job.
  • option 2 sounds like 'reuse running groups only (new default?)': you can run mxqsub --new-group in a loop and the jobs will still group together as long as the group is running/waiting. So there is no need for an initial submit to "close" groups. (Every group with group_date_end > 0 can be considered closed anyway.)

And at the moment you always have the possibility to start a new group by just giving it a new unique name:

tag=$(date +"%Y-%m-%d-%s")
mxqsub -N "group1-$tag" job-1
mxqsub -N "group1-$tag" job-2
mxqsub -N "group1-$tag" job-3

or

tag=$(cat /proc/sys/kernel/random/uuid)
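Completing that sketch: a UUID-based tag gives a collision-free group name per session (Linux-only; the leading `echo` keeps it illustrative, since mxqsub may not be installed where you read this):

```shell
tag=$(cat /proc/sys/kernel/random/uuid)
for job in job-1 job-2 job-3; do
    # Drop the leading 'echo' to actually submit.
    echo mxqsub -N "group1-$tag" "$job"
done
```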

mariux added a commit to mariux/mxq that referenced this issue Aug 26, 2015
@mariux
Contributor Author

mariux commented Aug 26, 2015

mariux@79fcd63 implements B (Option 2)

mariux added a commit to mariux/mxq that referenced this issue Aug 26, 2015
@mariux
Contributor Author

mariux commented Aug 26, 2015

mariux@a6491d8 implements D (other interpretation of Option 1 where you can force reactivation of a specific group)

@mariux
Contributor Author

mariux commented Aug 26, 2015

btw.. cancelled groups will never be reused.

@mariux
Contributor Author

mariux commented Aug 26, 2015

  • B, C, D implemented
  • E can be achieved by setting unique group names
  • A is a mix of C and D

so A to E are available in some form now.

mariux added a commit that referenced this issue Aug 26, 2015
implements #22

* mxqsub:
  mxqsub: Add option --group-id=ID to force reusing group with group_id=ID
  mxqsub: Add jobs to the group with greatest group_id when reusing groups
  mxqsub: Add option --new-group to force new group
  mxqsub: Fix typos in usage (--help)
@mariux mariux closed this as completed Aug 26, 2015
@donald
Contributor

donald commented Aug 26, 2015

I don't like the idea that the grouping depends on timing (Options A, B, C). It shouldn't make a difference whether older jobs are still running or not, or whether they were canceled or not. And no, my "Option 2" proposal is not at all "reuse running groups".

For my "Option 1" proposal, PID was of course referring to the process calling mxqsub, so - yes - the PPID of the mxqsub process. However, it should be an ident made from boot_id+stime+pid or something like that. --submitter-pid can still use a PID only, because it implies the running process on the current system with this PID. A typical use case would be --submitter-pid $$.

@mariux
Contributor Author

mariux commented Aug 26, 2015

But option 2 can still be solved by giving a unique group name.

I can't see the need to have two groups running with the same name at the same time, with exactly the same resources, running the same program. If a user really has two groups running under the same name, the user needs to track the group_id to distinguish between them. So what is the real-life use case for adding this extra complexity?

option 1: where is the real-life use case that justifies the complexity of tracking boot_id/pid/* for the user? You can reuse a group at any time by specifying the group_id.

@donald
Contributor

donald commented Aug 26, 2015

Use case for option 1: I have a Perl script that does the accounting. It calls

               sys(
                        'mxqsub',
                        '-t',30,
                        '-o',"tmp/$host.out",
                        '-e',"tmp/$host.log",
                        './classify',($live ? '--live' : () ),'--onefile',$data_file,
               );

for a number of files (a little below 100).

Maybe I sit in a directory 2015-06 to re-run the accounting for a previous period after I've sent emails and got some corrections.

At the same time, cron might start the same script from another directory (LIVE) to produce accounting based on current usage. This is a daily cron job.

I'd love to have two groups running now, without the need to do any additional complex things for it, like inventing session-specific group names.

D.

@donald
Contributor

donald commented Aug 26, 2015

Logic works well with

diff --git a/classify b/classify
index 620884b..6cf01f8 100755
--- a/classify
+++ b/classify
@@ -978,16 +978,37 @@ sub sys {

 }

+sub first_line_of_file {
+       our ($filename)=@_;
+       my $fh=new IO::File ($filename,'<') or die "$filename: $!\n";
+       my $line=$fh->getline();
+       chomp($line);
+       return $line;
+}
+
+our $BOOT_ID;  # system incarnation - lazy init
+
+sub mxq_group_name {
+       my ($pid)=@_;
+       my $stat;
+       defined $BOOT_ID or $BOOT_ID=first_line_of_file('/proc/sys/kernel/random/boot_id');
+       $stat=first_line_of_file("/proc/$pid/stat");
+       my $stime=(split (" ",$stat))[21];
+       return $BOOT_ID.'.'.$stime.'.'.$pid;
+}
+
 sub main_cluster {
        setup_global();
        read_cf_files();

        empty_dir("tmp");
+       my $group_name=mxq_group_name($$);

        for my $data_file(scan_data_files()) {
                my ($host)=$data_file=~/([^\/]+)$/;
                sys(
                        'mxqsub',
+                       '-N',$group_name,
                        '-t',30,
                        '-o',"tmp/$host.out",
                        '-e',"tmp/$host.log",

we just get ugly group names now.

@mariux
Contributor Author

mariux commented Aug 26, 2015

ok.. got it..

but automatic grouping is based on who and what, without caring about where and when.
why? because you can reuse the already accumulated stats of previous jobs running exactly the same thing - no matter when or where the jobs were submitted from - because when and where do not change runtime behaviour.

besides, this will break existing code where people submit jobs from the cluster, where host, PID, boot_id etc. differ for every submission.

but for this exact use case: the group name is not used at all, so it will be 'default' atm. If you do not care about the group name (by not specifying it), just use it:

  • --group-name "${PWD}" (directory based)
  • --group-name "live${live}" (live based)
  • --group-name "${PWD} ${live}"
  • --group-name "${bootid}${pid}${stime}"

why not include where by default?

  • it needs a new column in the group table
  • the user can already influence grouping via --group-name and --command-alias

but: I don't care for real ;)

  • add new column
  • patch mxqsub
    • default to boot_id/pid/starttime
    • add an option to override it on the command line
    • add an option to override/ignore it via an environment variable
  • patch mxqd
    • set that override variable in the execution environment, so existing code does not break
  • patch mxqdump and cgi
    • query the new column
    • show the new column (escape special characters, or disallow them in mxqsub)
  • commit ;)

@mariux
Contributor Author

mariux commented Aug 26, 2015

i like e03cb706-d7f2-4d08-92aa-95a70395b8e5.60457597.26323 as a group name.. since the new column would look exactly the same when displayed ;)

@mariux
Contributor Author

mariux commented Aug 26, 2015

p.s.:

and still, it's starttime: field 22 (index 21).

stime is field 15 (index 14), so naming the variable $stime may lead to confusion ;)
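For reference, a shell illustration of the two fields being confused (Linux /proc; stripping the comm field first so spaces in process names can't shift the indices):

```shell
stat_tail=$(sed 's/.*) //' /proc/$$/stat)  # drop "pid (comm) "
# Overall field N becomes field N-2 after the strip:
stime=$(echo "$stat_tail" | awk '{print $13}')      # field 15: kernel-mode CPU time
starttime=$(echo "$stat_tail" | awk '{print $20}')  # field 22: start time after boot
echo "stime=$stime starttime=$starttime"
```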

@donald
Contributor

donald commented Aug 26, 2015

agreed. will change that

@mariux64 mariux64 locked and limited conversation to collaborators Aug 26, 2015