diff options
author | Mel Gorman <mgorman@suse.de> | 2013-10-07 11:29:00 +0100 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2013-10-09 12:40:26 +0200 |
commit | 3a7053b3224f4a8b0e8184166190076593621617 (patch) | |
tree | dfe404bfbc1306fccbc00f2177becf1482504e45 /kernel/sched/core.c | |
parent | 745d61476ddb737aad3495fa6d9a8f8c2ee59f86 (diff) |
sched/numa: Favour moving tasks towards the preferred node
This patch favours moving tasks towards NUMA node that recorded a higher
number of NUMA faults during active load balancing. Ideally this is
self-reinforcing as the longer the task runs on that node, the more faults
it should incur causing task_numa_placement to keep the task running on that
node. In reality a big weakness is that the nodes CPUs can be overloaded
and it would be more efficient to queue tasks on an idle node and migrate
to the new node. This would require additional smarts in the balancer so
for now the balancer will simply prefer to place the task on the preferred
node for a PTE scans which is controlled by the numa_balancing_settle_count
sysctl. Once the settle_count number of scans has complete the schedule
is free to place the task on an alternative node if the load is imbalanced.
[srikar@linux.vnet.ibm.com: Fixed statistics]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[ Tunable and use higher faults instead of preferred. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-23-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'kernel/sched/core.c')
-rw-r--r-- | kernel/sched/core.c | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 064a0af4454..b7e6b6f9c5f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1631,7 +1631,7 @@ static void __sched_fork(struct task_struct *p) p->node_stamp = 0ULL; p->numa_scan_seq = p->mm ? p->mm->numa_scan_seq : 0; - p->numa_migrate_seq = p->mm ? p->mm->numa_scan_seq - 1 : 0; + p->numa_migrate_seq = 0; p->numa_scan_period = sysctl_numa_balancing_scan_delay; p->numa_preferred_nid = -1; p->numa_work.next = &p->numa_work; @@ -5656,6 +5656,7 @@ sd_numa_init(struct sched_domain_topology_level *tl, int cpu) | 0*SD_SHARE_PKG_RESOURCES | 1*SD_SERIALIZE | 0*SD_PREFER_SIBLING + | 1*SD_NUMA | sd_local_flags(level) , .last_balance = jiffies, |