MultiQueue Skiplist Scheduler v0.205
Con Kolivas committed Dec 31, 2020
1 parent 2c85ebc commit 35f6640
Showing 46 changed files with 10,332 additions and 50 deletions.
8 changes: 8 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
@@ -4652,6 +4652,14 @@
Memory area to be used by remote processor image,
managed by CMA.

rqshare= [X86] Select the MuQSS scheduler runqueue sharing type.
Format: <string>
smt -- Share SMT (hyperthread) sibling runqueues
mc -- Share MC (multicore) sibling runqueues
smp -- Share SMP runqueues
none -- Do not share any runqueues
Default value is mc

rw [KNL] Mount root device read-write on boot

S [KNL] Run init in single mode
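A quick aside on the rqshare= option documented above (editor's sketch, not part of this commit): the setting passed at boot can be confirmed from userspace by scanning /proc/cmdline. The parsing below is illustrative only.

/* Editor's sketch: report which rqshare= mode was passed at boot.
 * Reads /proc/cmdline; if the option is absent, MuQSS documents the
 * default as "mc".
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char cmdline[4096], mode[16];
	FILE *f = fopen("/proc/cmdline", "r");

	if (!f || !fgets(cmdline, sizeof(cmdline), f)) {
		perror("/proc/cmdline");
		return 1;
	}
	fclose(f);

	char *opt = strstr(cmdline, "rqshare=");
	if (opt && sscanf(opt + strlen("rqshare="), "%15[^ \n]", mode) == 1)
		printf("runqueue sharing requested: %s\n", mode);
	else
		printf("no rqshare= option given; default is mc\n");
	return 0;
}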
34 changes: 34 additions & 0 deletions Documentation/admin-guide/sysctl/kernel.rst
@@ -436,6 +436,16 @@ this allows system administrators to override the
``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.


iso_cpu: (MuQSS CPU scheduler only)
===================================

This sets the percentage of cpu time that unprivileged SCHED_ISO tasks
can run at effectively realtime priority, averaged over a rolling five
second period across the -whole- system, meaning all cpus.

Set to 70 (percent) by default.
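
For illustration (editor's addition, not part of the patch), the tunable can be read like any other kernel sysctl; the path below assumes the standard /proc/sys/kernel/ location.

/* Editor's sketch: print the current SCHED_ISO cpu percentage cap.
 * Assumes MuQSS exposes the tunable at /proc/sys/kernel/iso_cpu.
 */
#include <stdio.h>

int main(void)
{
	int pct;
	FILE *f = fopen("/proc/sys/kernel/iso_cpu", "r");

	if (!f) {
		perror("/proc/sys/kernel/iso_cpu");
		return 1;
	}
	if (fscanf(f, "%d", &pct) == 1)
		printf("SCHED_ISO tasks capped at %d%% cpu (rolling 5s, all cpus)\n", pct);
	fclose(f);
	return 0;
}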


kexec_load_disabled
===================

@@ -1077,6 +1087,20 @@ ROM/Flash boot loader. Maybe to tell it what to do after
rebooting. ???


rr_interval: (MuQSS CPU scheduler only)
=======================================

This is the smallest duration that any cpu process scheduling unit
will run for. Increasing this value can increase throughput of cpu
bound tasks substantially but at the expense of increased latencies
overall. Conversely decreasing it will decrease average and maximum
latencies but at the expense of throughput. This value is in
milliseconds and the default value chosen depends on the number of
cpus available at scheduler initialisation with a minimum of 6.

Valid values are from 1-1000.
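
As a usage sketch (editor's addition), raising the value trades latency for throughput; the write below assumes the usual /proc/sys/kernel/rr_interval path and requires root.

/* Editor's sketch: set rr_interval to 12 ms (valid range 1-1000).
 * Larger timeslices favour cpu-bound throughput at the cost of latency.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/kernel/rr_interval", "w");

	if (!f) {
		perror("/proc/sys/kernel/rr_interval");
		return 1;
	}
	fprintf(f, "12\n");
	return fclose(f) ? 1 : 0;	/* write is committed on fclose */
}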


sched_energy_aware
==================

@@ -1515,3 +1539,13 @@ is 10 seconds.

The softlockup threshold is (``2 * watchdog_thresh``). Setting this
tunable to zero will disable lockup detection altogether.


yield_type: (MuQSS CPU scheduler only)
======================================

This determines what type of yield a call to sched_yield() will perform.

0: No yield.
1: Yield only to better priority/deadline tasks. (default)
2: Expire timeslice and recalculate deadline.
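
The tunable only changes what the existing sched_yield(2) call does; the caller itself is unchanged. A minimal illustration (editor's addition):

/* Editor's sketch: a polling loop that yields between iterations.
 * Under MuQSS, yield_type selects what sched_yield() does here:
 *   0 - nothing, 1 - yield only to better priority/deadline tasks
 *   (the default), 2 - expire the timeslice and recalculate the deadline.
 */
#include <sched.h>
#include <stdio.h>

int main(void)
{
	for (int i = 0; i < 5; i++) {
		printf("poll %d\n", i);
		sched_yield();	/* give other runnable tasks a turn */
	}
	return 0;
}
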
351 changes: 351 additions & 0 deletions Documentation/scheduler/sched-BFS.txt

Large diffs are not rendered by default.

373 changes: 373 additions & 0 deletions Documentation/scheduler/sched-MuQSS.txt

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions arch/alpha/Kconfig
@@ -667,6 +667,8 @@ config HZ
default 1200 if HZ_1200
default 1024

source "kernel/Kconfig.MuQSS"

config SRM_ENV
tristate "SRM environment through procfs"
depends on PROC_FS
2 changes: 2 additions & 0 deletions arch/arm/Kconfig
@@ -1236,6 +1236,8 @@ config SCHED_SMT
MultiThreading at a cost of slightly increased overhead in some
places. If unsure say N here.

source "kernel/Kconfig.MuQSS"

config HAVE_ARM_SCU
bool
help
2 changes: 2 additions & 0 deletions arch/arm64/Kconfig
@@ -976,6 +976,8 @@ config SCHED_SMT
MultiThreading at a cost of slightly increased overhead in some
places. If unsure say N here.

source "kernel/Kconfig.MuQSS"

config NR_CPUS
int "Maximum number of CPUs (2-4096)"
range 2 4096
2 changes: 2 additions & 0 deletions arch/powerpc/Kconfig
@@ -888,6 +888,8 @@ config SCHED_SMT
when dealing with POWER5 cpus at a cost of slightly increased
overhead in some places. If unsure say N here.

source "kernel/Kconfig.MuQSS"

config PPC_DENORMALISATION
bool "PowerPC denormalisation exception handling"
depends on PPC_BOOK3S_64
5 changes: 0 additions & 5 deletions arch/powerpc/platforms/cell/spufs/sched.c
@@ -51,11 +51,6 @@ static struct task_struct *spusched_task;
static struct timer_list spusched_timer;
static struct timer_list spuloadavg_timer;

/*
* Priority of a normal, non-rt, non-niced'd process (aka nice level 0).
*/
#define NORMAL_PRIO 120

/*
* Frequency of the spu scheduler tick. By default we do one SPU scheduler
* tick for every 10 CPU scheduler ticks.
18 changes: 18 additions & 0 deletions arch/x86/Kconfig
@@ -1008,6 +1008,22 @@ config NR_CPUS
config SCHED_SMT
def_bool y if SMP

config SMT_NICE
bool "SMT (Hyperthreading) aware nice priority and policy support"
depends on SCHED_MUQSS && SCHED_SMT
default y
help
Enabling Hyperthreading on Intel CPUs decreases the effectiveness
of the use of 'nice' levels and different scheduling policies
(e.g. realtime) due to sharing of CPU power between hyperthreads.
SMT nice support makes each logical CPU aware of what is running on
its hyperthread siblings, maintaining appropriate distribution of
CPU according to nice levels and scheduling policies at the expense
of slightly increased overhead.

If unsure say Y here.


config SCHED_MC
def_bool y
prompt "Multi-core scheduler support"
@@ -1038,6 +1054,8 @@ config SCHED_MC_PRIO

If unsure say Y here.

source "kernel/Kconfig.MuQSS"

config UP_LATE_INIT
def_bool y
depends on !SMP && X86_LOCAL_APIC
2 changes: 1 addition & 1 deletion fs/proc/base.c
@@ -479,7 +479,7 @@ static int proc_pid_schedstat(struct seq_file *m, struct pid_namespace *ns,
seq_puts(m, "0 0 0\n");
else
seq_printf(m, "%llu %llu %lu\n",
(unsigned long long)task->se.sum_exec_runtime,
(unsigned long long)tsk_seruntime(task),
(unsigned long long)task->sched_info.run_delay,
task->sched_info.pcount);

4 changes: 4 additions & 0 deletions include/linux/init_task.h
@@ -36,7 +36,11 @@ extern struct cred init_cred;
#define INIT_PREV_CPUTIME(x)
#endif

#ifdef CONFIG_SCHED_MUQSS
#define INIT_TASK_COMM "MuQSS"
#else
#define INIT_TASK_COMM "swapper"
#endif

/* Attach to the init_task data structure for proper alignment */
#ifdef CONFIG_ARCH_TASK_STRUCT_ON_STACK
2 changes: 2 additions & 0 deletions include/linux/ioprio.h
@@ -53,6 +53,8 @@ enum {
*/
static inline int task_nice_ioprio(struct task_struct *task)
{
if (iso_task(task))
return 0;
return (task_nice(task) + 20) / 5;
}

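A quick worked note on the hunk above (editor's addition): the existing formula maps the nice range onto the eight best-effort I/O priority levels, and the new branch pins SCHED_ISO tasks to the best level.

/* task_nice_ioprio() arithmetic, for reference:
 *   nice -20 -> (-20 + 20) / 5 = 0   (best best-effort level)
 *   nice   0 -> (  0 + 20) / 5 = 4   (default)
 *   nice +19 -> ( 19 + 20) / 5 = 7   (worst)
 * With CONFIG_SCHED_MUQSS, iso_task() tasks bypass this and get 0.
 */
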
61 changes: 60 additions & 1 deletion include/linux/sched.h
@@ -35,6 +35,10 @@
#include <linux/seqlock.h>
#include <linux/kcsan.h>

#ifdef CONFIG_SCHED_MUQSS
#include <linux/skip_list.h>
#endif

/* task_struct member predeclarations (sorted alphabetically): */
struct audit_context;
struct backing_dev_info;
@@ -660,8 +664,10 @@ struct task_struct {
unsigned int flags;
unsigned int ptrace;

#ifdef CONFIG_SMP
#if defined(CONFIG_SMP) || defined(CONFIG_SCHED_MUQSS)
int on_cpu;
#endif
#ifdef CONFIG_SMP
struct __call_single_node wake_entry;
#ifdef CONFIG_THREAD_INFO_IN_TASK
/* Current CPU: */
@@ -687,10 +693,25 @@ struct task_struct {
int static_prio;
int normal_prio;
unsigned int rt_priority;
#ifdef CONFIG_SCHED_MUQSS
int time_slice;
u64 deadline;
skiplist_node node; /* Skip list node */
u64 last_ran;
u64 sched_time; /* sched_clock time spent running */
#ifdef CONFIG_SMT_NICE
int smt_bias; /* Policy/nice level bias across smt siblings */
#endif
#ifdef CONFIG_HOTPLUG_CPU
bool zerobound; /* Bound to CPU0 for hotplug */
#endif
unsigned long rt_timeout;
#else /* CONFIG_SCHED_MUQSS */

const struct sched_class *sched_class;
struct sched_entity se;
struct sched_rt_entity rt;
#endif
#ifdef CONFIG_CGROUP_SCHED
struct task_group *sched_task_group;
#endif
@@ -886,6 +907,10 @@ struct task_struct {
#ifdef CONFIG_ARCH_HAS_SCALED_CPUTIME
u64 utimescaled;
u64 stimescaled;
#endif
#ifdef CONFIG_SCHED_MUQSS
/* Unbanked cpu time */
unsigned long utime_ns, stime_ns;
#endif
u64 gtime;
struct prev_cputime prev_cputime;
@@ -1365,6 +1390,40 @@
*/
};

#ifdef CONFIG_SCHED_MUQSS
#define tsk_seruntime(t) ((t)->sched_time)
#define tsk_rttimeout(t) ((t)->rt_timeout)

static inline void tsk_cpus_current(struct task_struct *p)
{
}

void print_scheduler_version(void);

static inline bool iso_task(struct task_struct *p)
{
return (p->policy == SCHED_ISO);
}
#else /* CFS */
#define tsk_seruntime(t) ((t)->se.sum_exec_runtime)
#define tsk_rttimeout(t) ((t)->rt.timeout)

static inline void tsk_cpus_current(struct task_struct *p)
{
p->nr_cpus_allowed = current->nr_cpus_allowed;
}

static inline void print_scheduler_version(void)
{
printk(KERN_INFO "CFS CPU scheduler.\n");
}

static inline bool iso_task(struct task_struct *p)
{
return false;
}
#endif /* CONFIG_SCHED_MUQSS */

static inline struct pid *task_pid(struct task_struct *task)
{
return task->thread_pid;
9 changes: 9 additions & 0 deletions include/linux/sched/deadline.h
@@ -28,7 +28,16 @@ static inline bool dl_time_before(u64 a, u64 b)
#ifdef CONFIG_SMP

struct root_domain;
#ifdef CONFIG_SCHED_MUQSS
static inline void dl_clear_root_domain(struct root_domain *rd)
{
}
static inline void dl_add_task_root_domain(struct task_struct *p)
{
}
#else /* CONFIG_SCHED_MUQSS */
extern void dl_add_task_root_domain(struct task_struct *p);
extern void dl_clear_root_domain(struct root_domain *rd);
#endif /* CONFIG_SCHED_MUQSS */

#endif /* CONFIG_SMP */
2 changes: 1 addition & 1 deletion include/linux/sched/nohz.h
@@ -13,7 +13,7 @@ extern int get_nohz_timer_target(void);
static inline void nohz_balance_enter_idle(int cpu) { }
#endif

#ifdef CONFIG_NO_HZ_COMMON
#if defined(CONFIG_NO_HZ_COMMON) && !defined(CONFIG_SCHED_MUQSS)
void calc_load_nohz_start(void);
void calc_load_nohz_remote(struct rq *rq);
void calc_load_nohz_stop(void);
12 changes: 12 additions & 0 deletions include/linux/sched/prio.h
@@ -20,8 +20,20 @@
*/

#define MAX_USER_RT_PRIO 100

#ifdef CONFIG_SCHED_MUQSS
/* Note different MAX_RT_PRIO */
#define MAX_RT_PRIO (MAX_USER_RT_PRIO + 1)

#define ISO_PRIO (MAX_RT_PRIO)
#define NORMAL_PRIO (MAX_RT_PRIO + 1)
#define IDLE_PRIO (MAX_RT_PRIO + 2)
#define PRIO_LIMIT ((IDLE_PRIO) + 1)
#else /* CONFIG_SCHED_MUQSS */
#define MAX_RT_PRIO MAX_USER_RT_PRIO

#endif /* CONFIG_SCHED_MUQSS */

#define MAX_PRIO (MAX_RT_PRIO + NICE_WIDTH)
#define DEFAULT_PRIO (MAX_RT_PRIO + NICE_WIDTH / 2)

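For reference (editor's arithmetic, not part of the patch), with MAX_USER_RT_PRIO fixed at 100 the MuQSS definitions above work out as follows:

/* Priority values when CONFIG_SCHED_MUQSS is enabled:
 *   MAX_RT_PRIO = 100 + 1 = 101   (one above the mainline value of 100)
 *   ISO_PRIO    = 101             (SCHED_ISO, numerically just past realtime)
 *   NORMAL_PRIO = 101 + 1 = 102
 *   IDLE_PRIO   = 101 + 2 = 103
 *   PRIO_LIMIT  = 103 + 1 = 104
 */
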
2 changes: 2 additions & 0 deletions include/linux/sched/rt.h
@@ -24,8 +24,10 @@ static inline bool task_is_realtime(struct task_struct *tsk)

if (policy == SCHED_FIFO || policy == SCHED_RR)
return true;
#ifndef CONFIG_SCHED_MUQSS
if (policy == SCHED_DEADLINE)
return true;
#endif
return false;
}

2 changes: 1 addition & 1 deletion include/linux/sched/task.h
@@ -93,7 +93,7 @@ int kernel_wait(pid_t pid, int *stat);
extern void free_task(struct task_struct *tsk);

/* sched_exec is called by processes performing an exec */
#ifdef CONFIG_SMP
#if defined(CONFIG_SMP) && !defined(CONFIG_SCHED_MUQSS)
extern void sched_exec(void);
#else
#define sched_exec() {}
33 changes: 33 additions & 0 deletions include/linux/skip_list.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
#ifndef _LINUX_SKIP_LISTS_H
#define _LINUX_SKIP_LISTS_H
typedef u64 keyType;
typedef void *valueType;

typedef struct nodeStructure skiplist_node;

struct nodeStructure {
int level; /* Levels in this structure */
keyType key;
valueType value;
skiplist_node *next[8];
skiplist_node *prev[8];
};

typedef struct listStructure {
int entries;
int level; /* Maximum level of the list
(1 more than the number of levels in the list) */
skiplist_node *header; /* pointer to header */
} skiplist;

void skiplist_init(skiplist_node *slnode);
skiplist *new_skiplist(skiplist_node *slnode);
void free_skiplist(skiplist *l);
void skiplist_node_init(skiplist_node *node);
void skiplist_insert(skiplist *l, skiplist_node *node, keyType key, valueType value, unsigned int randseed);
void skiplist_delete(skiplist *l, skiplist_node *node);

static inline bool skiplist_node_empty(skiplist_node *node) {
return (!node->next[0]);
}
#endif /* _LINUX_SKIP_LISTS_H */
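
To make the interface above concrete, a hedged kernel-context sketch (editor's addition, not from this commit); the demo structure, the deadline key and the error handling are illustrative only, loosely mirroring how MuQSS keys tasks by virtual deadline.

/* Editor's sketch: embed a skiplist_node in an object, insert it keyed
 * by a u64 deadline, then remove it. Kernel context is assumed.
 */
#include <linux/skip_list.h>
#include <linux/slab.h>
#include <linux/random.h>
#include <linux/errno.h>

struct demo_item {
	u64 deadline;		/* sort key: smaller means earlier */
	skiplist_node node;	/* embedded skip list node */
};

static int demo_skiplist_use(void)
{
	static skiplist_node head;	/* sentinel header node */
	skiplist *list;
	struct demo_item *it;

	skiplist_init(&head);
	list = new_skiplist(&head);
	if (!list)
		return -ENOMEM;

	it = kmalloc(sizeof(*it), GFP_KERNEL);
	if (!it) {
		free_skiplist(list);
		return -ENOMEM;
	}
	it->deadline = 1000000;		/* illustrative key */
	skiplist_node_init(&it->node);

	/* Insert keyed by deadline; the random seed picks the node level. */
	skiplist_insert(list, &it->node, it->deadline, it, get_random_u32());

	/* The lowest key is always reachable from head.next[0]. */

	skiplist_delete(list, &it->node);
	kfree(it);
	free_skiplist(list);
	return 0;
}
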
9 changes: 8 additions & 1 deletion include/uapi/linux/sched.h
@@ -115,9 +115,16 @@ struct clone_args {
#define SCHED_FIFO 1
#define SCHED_RR 2
#define SCHED_BATCH 3
/* SCHED_ISO: reserved but not implemented yet */
/* SCHED_ISO: Implemented on MuQSS only */
#define SCHED_IDLE 5
#ifdef CONFIG_SCHED_MUQSS
#define SCHED_ISO 4
#define SCHED_IDLEPRIO SCHED_IDLE
#define SCHED_MAX (SCHED_IDLEPRIO)
#define SCHED_RANGE(policy) ((policy) <= SCHED_MAX)
#else /* CONFIG_SCHED_MUQSS */
#define SCHED_DEADLINE 6
#endif /* CONFIG_SCHED_MUQSS */

/* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
#define SCHED_RESET_ON_FORK 0x40000000
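
As a usage note (editor's addition): a process requests the new policy through the ordinary scheduler syscalls. The sketch below assumes the libc headers do not define SCHED_ISO, so the value 4 from this uapi header is supplied directly.

/* Editor's sketch: switch the calling process to SCHED_ISO.
 * SCHED_ISO is 4 per the uapi change above; non-realtime policies
 * take a sched_priority of 0. On a kernel without MuQSS this call
 * is expected to fail with EINVAL.
 */
#include <sched.h>
#include <stdio.h>

#ifndef SCHED_ISO
#define SCHED_ISO 4
#endif

int main(void)
{
	struct sched_param sp = { .sched_priority = 0 };

	if (sched_setscheduler(0, SCHED_ISO, &sp) == -1) {
		perror("sched_setscheduler(SCHED_ISO)");
		return 1;
	}
	printf("now running as SCHED_ISO\n");
	return 0;
}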