Age | Commit message (Collapse) | Author |
|
As Tetsuo Handa pointed out, request_module() can stress the system
while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE.
The task T uses "almost all" memory, then it does something which
triggers request_module(). Say, it can simply call sys_socket(). This
in turn needs more memory and leads to OOM. oom-killer correctly
chooses T and kills it, but this can't help because it sleeps in
TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the
TIF_MEMDIE task T.
Make __request_module() killable. The only necessary change is that
call_modprobe() should kmalloc argv and module_name, they can't live in
the stack if we use UMH_KILLABLE. This memory is freed via
call_usermodehelper_freeinfo()->cleanup.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
No functional changes. Move the call_usermodehelper code from
__request_module() into the new simple helper, call_modprobe().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Minor cleanup. ____call_usermodehelper() can simply return, no need to
call do_exit() explicitely.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
No functional changes. It is not sane to use UMH_KILLABLE with enum
umh_wait, but obviously we do not want another argument in
call_usermodehelper_* helpers. Kill this enum, use the plain int.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC.
The caller must ensure that subprocess_info->path/etc can not go away
until call_usermodehelper_freeinfo().
call_usermodehelper_exec(UMH_KILLABLE) does
wait_for_completion_killable. If it fails, it uses
xchg(&sub_info->complete, NULL) to serialize with umh_complete() which
does the same xhcg() to access sub_info->complete.
If call_usermodehelper_exec wins, it can safely return. umh_complete()
should get NULL and call call_usermodehelper_freeinfo().
Otherwise we know that umh_complete() was already called, in this case
call_usermodehelper_exec() falls back to wait_for_completion() which
should succeed "very soon".
Note: UMH_NO_WAIT == -1 but it obviously should not be used with
UMH_KILLABLE. We delay the neccessary cleanup to simplify the back
porting.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Preparation. Add the new trivial helper, umh_complete(). Currently it
simply does complete(sub_info->complete).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Change zap_pid_ns_processes() to use SEND_SIG_FORCED, it looks more
clear compared to SEND_SIG_NOINFO which relies on from_ancestor_ns logic
send_signal().
It is also more efficient if we need to kill a lot of tasks because it
doesn't alloc sigqueue.
While at it, add the __fatal_signal_pending(task) check as a minor
optimization.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Cosmetic, rename the from_ancestor_ns argument in prepare_signal()
paths. After the previous change it doesn't match the reality.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
force_sig_info() and friends have the special semantics for synchronous
signals, this interface should not be used if the target is not current.
And it needs the fixes, in particular the clearing of SIGNAL_UNKILLABLE
is not exactly right.
However there are callers which have to use force_ exactly because it
clears SIGNAL_UNKILLABLE and thus it can kill the CLONE_NEWPID tasks,
although this is almost always is wrong by various reasons.
With this patch SEND_SIG_FORCED ignores SIGNAL_UNKILLABLE, like we do if
the signal comes from the ancestor namespace.
This makes the naming in prepare_signal() paths insane, fixed by the
next cleanup.
Note: this only affects SIGKILL/SIGSTOP, but this is enough for
force_sig() abusers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
PTRACE_SEIZE code is tested and ready for production use, remove the
code which requires special bit in data argument to make PTRACE_SEIZE
work.
Strace team prepares for a new release of strace, and we would like to
ship the code which uses PTRACE_SEIZE, preferably after this change goes
into released kernel.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This can be used to close a few corner cases in strace where we get
unwanted racy behavior after attach, but before we have a chance to set
options (the notorious post-execve SIGTRAP comes to mind), and removes
the need to track "did we set opts for this task" state in strace
internals.
While we are at it:
Make it possible to extend SEIZE in the future with more functionality
by passing non-zero 'addr' parameter. To that end, error out if 'addr'
is non-zero. PTRACE_ATTACH did not (and still does not) have such
check, and users (strace) do pass garbage there... let's avoid
repeating this mistake with SEIZE.
Set all task->ptrace bits in one operation - before this change, we were
adding PT_SEIZED and PT_PTRACE_CAP with task->ptrace |= BIT ops. This
was probably ok (not a bug), but let's be on a safer side.
Changes since v2: use (unsigned long) casts instead of (long) ones, move
PTRACE_SEIZE_DEVEL-related code to separate lines of code.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Pedro Alves <palves@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
PT_option bits contiguous and therefore makes code in
ptrace_setoptions() much simpler.
Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
instead of using explicit numeric constants, to ensure we don't mess up
relationship between bit positions and event ids.
PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
value of PT_EVENT_FLAG_SHIFT-1 is easier to use.
PT_TRACE_MASK constant is nuked, the only its use is replaced by
(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
On ptrace(PTRACE_SETOPTIONS, pid, 0, <opts>), we used to set those
option bits which are known, and then fail with -EINVAL if there are
some unknown bits in <opts>.
This is inconsistent with typical error handling, which does not change
any state if input is invalid.
This patch changes PTRACE_SETOPTIONS behavior so that in this case, we
return -EINVAL and don't change any bits in task->ptrace.
It's very unlikely that there is userspace code in the wild which will
be affected by this change: it should have the form
ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_BOGUSOPT)
where PTRACE_O_BOGUSOPT is a constant unknown to the kernel. But kernel
headers, naturally, don't contain any PTRACE_O_BOGUSOPTs, thus the only
way userspace can use one if it defines one itself. I can't see why
anyone would do such a thing deliberately.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Revelation from Peter.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@tglx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It fixes some 80-col wordwrappings and adds some consistency.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If the system is loaded while hotplugging a CPU we might end up with a
bogus hardlockup detection. This has been seen during LTP pounder test
executed in parallel with hotplug test.
The main problem is that enable_watchdog (called when CPU is brought up)
registers perf event which periodically checks per-cpu counter
(hrtimer_interrupts), updated from a hrtimer callback, but the hrtimer
is fired from the kernel thread.
This means that while we already do check for the hard lockup the kernel
thread might be sitting on the runqueue with zillions of tasks so there
is nobody to update the value we rely on and so we KABOOM.
Let's fix this by boosting the watchdog thread priority before we wake
it up rather than when it's already running. This still doesn't handle
a case where we have the same amount of high prio FIFO tasks but that
doesn't seem to be common. The current implementation doesn't handle
that case anyway so this is not worse at least.
Unfortunately, we cannot start perf counter from the watchdog thread
because we could miss a real lock up and also we cannot start the
hrtimer watchdog_enable because we there is no way (at least I don't
know any) to start a hrtimer from a different CPU.
[dzickus@redhat.com: fix compile issue with param]
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I just received another user's pleas for help when their init
mysteriously died. I again explained that they need to check whether it
died because of bad instruction, a segv, or something else. Which was
an annoying detour into writing a trivial C program to spawn his init
and print its exit code:
http://lists.busybox.net/pipermail/busybox/2012-January/077172.html
I hear you saying "just test it under /bin/sh". Well, the crashing init
_was_ /bin/sh.
Which prompted me to make kernel do this first step automatically. We can
print exit code, which makes it possible to see that death was from e.g.
SIGILL without writing test programs.
[akpm@linux-foundation.org: add 0x to hex number output]
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Userspace service managers/supervisors need to track their started
services. Many services daemonize by double-forking and get implicitly
re-parented to PID 1. The service manager will no longer be able to
receive the SIGCHLD signals for them, and is no longer in charge of
reaping the children with wait(). All information about the children is
lost at the moment PID 1 cleans up the re-parented processes.
With this prctl, a service manager process can mark itself as a sort of
'sub-init', able to stay as the parent for all orphaned processes
created by the started services. All SIGCHLD signals will be delivered
to the service manager.
Receiving SIGCHLD and doing wait() is in cases of a service-manager much
preferred over any possible asynchronous notification about specific
PIDs, because the service manager has full access to the child process
data in /proc and the PID can not be re-used until the wait(), the
service-manager itself is in charge of, has happened.
As a side effect, the relevant parent PID information does not get lost
by a double-fork, which results in a more elaborate process tree and
'ps' output:
before:
# ps afx
253 ? Ss 0:00 /bin/dbus-daemon --system --nofork
294 ? Sl 0:00 /usr/libexec/polkit-1/polkitd
328 ? S 0:00 /usr/sbin/modem-manager
608 ? Sl 0:00 /usr/libexec/colord
658 ? Sl 0:00 /usr/libexec/upowerd
819 ? Sl 0:00 /usr/libexec/imsettings-daemon
916 ? Sl 0:00 /usr/libexec/udisks-daemon
917 ? S 0:00 \_ udisks-daemon: not polling any devices
after:
# ps afx
294 ? Ss 0:00 /bin/dbus-daemon --system --nofork
426 ? Sl 0:00 \_ /usr/libexec/polkit-1/polkitd
449 ? S 0:00 \_ /usr/sbin/modem-manager
635 ? Sl 0:00 \_ /usr/libexec/colord
705 ? Sl 0:00 \_ /usr/libexec/upowerd
959 ? Sl 0:00 \_ /usr/libexec/udisks-daemon
960 ? S 0:00 | \_ udisks-daemon: not polling any devices
977 ? Sl 0:00 \_ /usr/libexec/packagekitd
This prctl is orthogonal to PID namespaces. PID namespaces are isolated
from each other, while a service management process usually requires the
services to live in the same namespace, to be able to talk to each
other.
Users of this will be the systemd per-user instance, which provides
init-like functionality for the user's login session and D-Bus, which
activates bus services on-demand. Both need init-like capabilities to
be able to properly keep track of the services they start.
Many thanks to Oleg for several rounds of review and insights.
[akpm@linux-foundation.org: fix comment layout and spelling]
[akpm@linux-foundation.org: add lengthy code comment from Oleg]
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb
Pull KGDB/KDB updates from Jason Wessel:
"Fixes:
- Fix KDB keyboard repeat scan codes and leaked keyboard events
- Fix kernel crash with kdb_printf() for users who compile new
kdb_printf()'s in early code
- Return all segment registers to gdb on x86_64
Features:
- KDB/KGDB hook the reboot notifier and end user can control if it
stops, detaches or does nothing (updated docs as well)
- Notify users who use CONFIG_DEBUG_RODATA to use hw breakpoints"
* tag 'for_linus-3.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
kdb: Add message about CONFIG_DEBUG_RODATA on failure to install breakpoint
kdb: Avoid using dbg_io_ops until it is initialized
kgdb,debug_core: add the ability to control the reboot notifier
KDB: Fix usability issues relating to the 'enter' key.
kgdb,debug-core,gdbstub: Hook the reboot notifier for debugger detach
kgdb: Respect that flush op is optional
kgdb: x86: Return all segment registers also in 64-bit mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem updates from Dmitry Torokhov:
"- we finally merged driver for USB version of Synaptics touchpads
(I guess most commonly found in IBM/Lenovo keyboard/touchpad combo);
- a bunch of new drivers for embedded platforms (Cypress
touchscreens, DA9052 OnKey, MAX8997-haptic, Ilitek ILI210x
touchscreens, TI touchscreen);
- input core allows clients to specify desired clock source for
timestamps on input events (EVIOCSCLOCKID ioctl);
- input core allows querying state of all MT slots for given event
code via EVIOCGMTSLOTS ioctl;
- various driver fixes and improvements."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (45 commits)
Input: ili210x - add support for Ilitek ILI210x based touchscreens
Input: altera_ps2 - use of_match_ptr()
Input: synaptics_usb - switch to module_usb_driver()
Input: convert I2C drivers to use module_i2c_driver()
Input: convert SPI drivers to use module_spi_driver()
Input: omap4-keypad - move platform_data to <linux/platform_data>
Input: kxtj9 - who_am_i check value and initial data rate fixes
Input: add driver support for MAX8997-haptic
Input: tegra-kbc - revise device tree support
Input: of_keymap - add device tree bindings for simple key matrices
Input: wacom - fix physical size calculation for 3rd-gen Bamboo
Input: twl4030-vibra - really switch from #if to #ifdef
Input: hp680_ts_input - ensure arguments to request_irq and free_irq are compatible
Input: max8925_onkey - avoid accessing input device too early
Input: max8925_onkey - allow to be used as a wakeup source
Input: atmel-wm97xx - convert to dev_pm_ops
Input: atmel-wm97xx - set driver owner
Input: add cyttsp touchscreen maintainer entry
Input: cyttsp - remove useless checks in cyttsp_probe()
Input: usbtouchscreen - add support for Data Modul EasyTouch TP 72037
...
|
|
On x86, if CONFIG_DEBUG_RODATA is set, one cannot set breakpoints
via KDB. Apparently this is a well-known problem, as at least one distribution
now ships with both KDB enabled and CONFIG_DEBUG_RODATA=y for security reasons.
This patch adds an printk message to the breakpoint failure case,
in order to provide suggestions about how to use the debugger.
Reported-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Tim Bird <tim.bird@am.sony.com>
|
|
This fixes a bug with setting a breakpoint during kdb initialization
(from kdb_cmds). Any call to kdb_printf() before the initialization
of the kgdboc serial console driver (which happens much later during
bootup than kdb_init), results in kernel panic due to the use of
dbg_io_ops before it is initialized.
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
|
|
Sometimes it is desirable to stop the kernel debugger before allowing
a system to reboot either with kdb or kgdb. This patch adds the
ability to turn the reboot notifier on and off or enter the debugger
and stop kernel execution before rebooting.
It is possible to change the setting after booting the kernel with the
following:
echo 1 > /sys/module/debug_core/parameters/kgdbreboot
It is also possible to change this setting using kdb / kgdb to
manipulate the variable directly.
Using KDB:
mm kgdbreboot 1
Using gdb:
set kgdbreboot=1
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
|
|
This fixes the following problems:
1) Typematic-repeat of 'enter' gives warning message
and leaks make/break if KDB exits. Repeats
look something like 0x1c 0x1c .... 0x9c
2) Use of 'keypad enter' gives warning message and
leaks the ENTER break/make code out if KDB exits.
KP ENTER repeats look someting like 0xe0 0x1c
0xe0 0x1c ... 0xe0 0x9c.
3) Lag on the order of seconds between "break" and "make" when
expecting the enter "break" code. Seen under virtualized
environments such as VMware ESX.
The existing special enter handler tries to glob the enter break code,
but this fails if the other (KP) enter was used, or if there was a key
repeat. It also fails if you mashed some keys along with enter, and
you ended up with a non-enter make or non-enter break code coming
after the enter make code. So first, we modify the handler to handle
these cases. But performing these actions on every enter is annoying
since now you can't hold ENTER down to scroll <more>d messages in
KDB. Since this special behaviour is only necessary to handle the
exiting KDB ('g' + ENTER) without leaking scancodes to the OS. This
cleanup needs to get executed anytime the kdb_main loop exits.
Tested on QEMU. Set a bp on atkbd.c to verify no scan code was leaked.
Cc: Andrei Warkentin <andreiw@vmware.com>
[jason.wessel@windriver.com: move cleanup calls to kdb_main.c]
Signed-off-by: Andrei Warkentin <andrey.warkentin@gmail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
|
|
The gdbstub and kdb should get detached if the system is rebooting.
Calling gdbstub_exit() will set the proper debug core state and send a
message to any debugger that is connected to correctly detach.
An attached debugger will receive the exit code from
include/linux/reboot.h based on SYS_HALT, SYS_REBOOT, etc...
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
|
|
Not all kgdb I/O drivers implement a flush operation. Adjust
gdbstub_exit accordingly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
|
|
Merge first batch of patches from Andrew Morton:
"A few misc things and all the MM queue"
* emailed from Andrew Morton <akpm@linux-foundation.org>: (92 commits)
memcg: avoid THP split in task migration
thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE
memcg: clean up existing move charge code
mm/memcontrol.c: remove unnecessary 'break' in mem_cgroup_read()
mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event()
mm/memcontrol.c: s/stealed/stolen/
memcg: fix performance of mem_cgroup_begin_update_page_stat()
memcg: remove PCG_FILE_MAPPED
memcg: use new logic for page stat accounting
memcg: remove PCG_MOVE_LOCK flag from page_cgroup
memcg: simplify move_account() check
memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
memcg: kill dead prev_priority stubs
memcg: remove PCG_CACHE page_cgroup flag
memcg: let css_get_next() rely upon rcu_read_lock()
cgroup: revert ss_id_lock to spinlock
idr: make idr_get_next() good for rcu_read_lock()
memcg: remove unnecessary thp check in page stat accounting
memcg: remove redundant returns
memcg: enum lru_list lru
...
|
|
Remove lock and unlock around css_get_next()'s call to idr_get_next().
memcg iterators (only users of css_get_next) already did rcu_read_lock(),
and its comment demands that; but add a WARN_ON_ONCE to make sure of it.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit c1e2ee2dc436 ("memcg: replace ss->id_lock with a rwlock") has now
been seen to cause the unfair behavior we should have expected from
converting a spinlock to an rwlock: softlockup in cgroup_mkdir(), whose
get_new_cssid() is waiting for the wlock, while there are 19 tasks using
the rlock in css_get_next() to get on with their memcg workload (in an
artificial test, admittedly). Yet lib/idr.c was made suitable for RCU
way back: revert that commit, restoring ss->id_lock to a spinlock.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
sync_mm_rss() can only be used for current to avoid race conditions in
iterating and clearing its per-task counters. Remove the task argument
for it and its helper function, __sync_task_rss_stat(), to avoid thinking
it can be used safely for anything other than current.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.
[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths. This was detected while
investigating at large page allocator slowdown introduced some time
after 2.6.32. The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.
For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.
This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side. This is much cheaper on some architectures, including x86. The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.
While updating the nodemask, a check is made to see if a false failure
is a risk. If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.
In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%. The
actual results were
3.3.0-rc3 3.3.0-rc3
rc3-vanilla nobarrier-v2r1
Clients 1 UserTime 0.07 ( 0.00%) 0.08 (-14.19%)
Clients 2 UserTime 0.07 ( 0.00%) 0.07 ( 2.72%)
Clients 4 UserTime 0.08 ( 0.00%) 0.07 ( 3.29%)
Clients 1 SysTime 0.70 ( 0.00%) 0.65 ( 6.65%)
Clients 2 SysTime 0.85 ( 0.00%) 0.82 ( 3.65%)
Clients 4 SysTime 1.41 ( 0.00%) 1.41 ( 0.32%)
Clients 1 WallTime 0.77 ( 0.00%) 0.74 ( 4.19%)
Clients 2 WallTime 0.47 ( 0.00%) 0.45 ( 3.73%)
Clients 4 WallTime 0.38 ( 0.00%) 0.37 ( 1.58%)
Clients 1 Flt/sec/cpu 497620.28 ( 0.00%) 520294.53 ( 4.56%)
Clients 2 Flt/sec/cpu 414639.05 ( 0.00%) 429882.01 ( 3.68%)
Clients 4 Flt/sec/cpu 257959.16 ( 0.00%) 258761.48 ( 0.31%)
Clients 1 Flt/sec 495161.39 ( 0.00%) 517292.87 ( 4.47%)
Clients 2 Flt/sec 820325.95 ( 0.00%) 850289.77 ( 3.65%)
Clients 4 Flt/sec 1020068.93 ( 0.00%) 1022674.06 ( 0.26%)
MMTests Statistics: duration
Sys Time Running Test (seconds) 135.68 132.17
User+Sys Time Running Test (seconds) 164.2 160.13
Total Elapsed Time (seconds) 123.46 120.87
The overall improvement is small but the System CPU time is much
improved and roughly in correlation to what oprofile reported (these
performance figures are without profiling so skew is expected). The
actual number of page faults is noticeably improved.
For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.
To test the actual bug the commit fixed I opened two terminals. The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data. In a second window, the nodemask of the
cpuset was continually randomised in a loop.
Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Warn about non-zero rss counters at final mmdrop.
This check will prevent reoccurences of bugs such as that fixed in "mm:
fix rss count leakage during migration".
I didn't hide this check under CONFIG_VM_DEBUG because it rather small and
rss counters cover whole page-table management, so this is a good
invariant.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
"This is _not_ all; in particular, Miklos' and Jan's stuff is not there
yet."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
ext4: initialization of ext4_li_mtx needs to be done earlier
debugfs-related mode_t whack-a-mole
hfsplus: add an ioctl to bless files
hfsplus: change finder_info to u32
hfsplus: initialise userflags
qnx4: new helper - try_extent()
qnx4: get rid of qnx4_bread/qnx4_getblk
take removal of PF_FORKNOEXEC to flush_old_exec()
trim includes in inode.c
um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
um: embed ->stub_pages[] into mmu_context
gadgetfs: list_for_each_safe() misuse
ocfs2: fix leaks on failure exits in module_init
ecryptfs: make register_filesystem() the last potential failure exit
ntfs: forgets to unregister sysctls on register_filesystem() failure
logfs: missing cleanup on register_filesystem() failure
jfs: mising cleanup on register_filesystem() failure
make configfs_pin_fs() return root dentry on success
configfs: configfs_create_dir() has parent dentry in dentry->d_parent
configfs: sanitize configfs_create()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates for 3.4 from James Morris:
"The main addition here is the new Yama security module from Kees Cook,
which was discussed at the Linux Security Summit last year. Its
purpose is to collect miscellaneous DAC security enhancements in one
place. This also marks a departure in policy for LSM modules, which
were previously limited to being standalone access control systems.
Chromium OS is using Yama, and I believe there are plans for Ubuntu,
at least.
This patchset also includes maintenance updates for AppArmor, TOMOYO
and others."
Fix trivial conflict in <net/sock.h> due to the jumo_label->static_key
rename.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
AppArmor: Fix location of const qualifier on generated string tables
TOMOYO: Return error if fails to delete a domain
AppArmor: add const qualifiers to string arrays
AppArmor: Add ability to load extended policy
TOMOYO: Return appropriate value to poll().
AppArmor: Move path failure information into aa_get_name and rename
AppArmor: Update dfa matching routines.
AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
AppArmor: Add const qualifiers to generated string tables
AppArmor: Fix oops in policy unpack auditing
AppArmor: Fix error returned when a path lookup is disconnected
KEYS: testing wrong bit for KEY_FLAG_REVOKED
TOMOYO: Fix mount flags checking order.
security: fix ima kconfig warning
AppArmor: Fix the error case for chroot relative path name lookup
AppArmor: fix mapping of META_READ to audit and quiet flags
AppArmor: Fix underflow in xindex calculation
AppArmor: Fix dropping of allowed operations that are force audited
AppArmor: Add mising end of structure test to caps unpacking
...
|
|
Pull crypto update from Herbert Xu:
"* sha512 bug fixes (already in your tree).
* SHA224/SHA384 AEAD support in caam.
* X86-64 optimised version of Camellia.
* Tegra AES support.
* Bulk algorithm registration interface to make driver registration easier.
* padata race fixes.
* Misc fixes."
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (31 commits)
padata: Fix race on sequence number wrap
padata: Fix race in the serialization path
crypto: camellia - add assembler implementation for x86_64
crypto: camellia - rename camellia.c to camellia_generic.c
crypto: camellia - fix checkpatch warnings
crypto: camellia - rename camellia module to camellia_generic
crypto: tcrypt - add more camellia tests
crypto: testmgr - add more camellia test vectors
crypto: camellia - simplify key setup and CAMELLIA_ROUNDSM macro
crypto: twofish-x86_64/i586 - set alignmask to zero
crypto: blowfish-x86_64 - set alignmask to zero
crypto: serpent-sse2 - combine ablk_*_init functions
crypto: blowfish-x86_64 - use crypto_[un]register_algs
crypto: twofish-x86_64-3way - use crypto_[un]register_algs
crypto: serpent-sse2 - use crypto_[un]register_algs
crypto: serpent-sse2 - remove dead code from serpent_sse2_glue.c::serpent_sse2_init()
crypto: twofish-x86 - Remove dead code from twofish_glue_3way.c::init()
crypto: In crypto_add_alg(), 'exact' wants to be initialized to 0
crypto: caam - fix gcc 4.6 warning
crypto: Add bulk algorithm registration interface
...
|
|
Pull irq_domain support for all architectures from Grant Likely:
"Generialize powerpc's irq_host as irq_domain
This branch takes the PowerPC irq_host infrastructure (reverse mapping
from Linux IRQ numbers to hardware irq numbering), generalizes it,
renames it to irq_domain, and makes it available to all architectures.
Originally the plan has been to create an all-new irq_domain
implementation which addresses some of the powerpc shortcomings such
as not handling 1:1 mappings well, but doing that proved to be far
more difficult and invasive than generalizing the working code and
refactoring it in-place. So, this branch rips out the 'new'
irq_domain and replaces it with the modified powerpc version (in a
fully bisectable way of course). It converts all users over to the
new API and makes irq_domain selectable on any architecture.
No architecture is forced to enable irq_domain, but the infrastructure
is required for doing OpenFirmware style irq translations. It will
even work on SPARC even though SPARC has it's own mechanism for
translating irqs at boot time. MIPS, microblaze, embedded x86 and c6x
are converted too.
The resulting irq_domain code is probably still too verbose and can be
optimized more, but that can be done incrementally and is a task for
follow-on patches."
* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6: (31 commits)
dt: fix twl4030 for non-dt compile on x86
mfd: twl-core: Add IRQ_DOMAIN dependency
devicetree: Add empty of_platform_populate() for !CONFIG_OF_ADDRESS (sparc)
irq_domain: Centralize definition of irq_dispose_mapping()
irq_domain/mips: Allow irq_domain on MIPS
irq_domain/x86: Convert x86 (embedded) to use common irq_domain
ppc-6xx: fix build failure in flipper-pic.c and hlwd-pic.c
irq_domain/microblaze: Convert microblaze to use irq_domains
irq_domain/powerpc: Replace custom xlate functions with library functions
irq_domain/powerpc: constify irq_domain_ops
irq_domain/c6x: Use library of xlate functions
irq_domain/c6x: constify irq_domain structures
irq_domain/c6x: Convert c6x to use generic irq_domain support.
irq_domain: constify irq_domain_ops
irq_domain: Create common xlate functions that device drivers can use
irq_domain: Remove irq_domain_add_simple()
irq_domain: Remove 'new' irq_domain in favour of the ppc one
mfd: twl-core.c: Fix the number of interrupts managed by twl4030
of/address: add empty static inlines for !CONFIG_OF
irq_domain: Add support for base irq and hwirq in legacy mappings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates for 3.4 from Rafael Wysocki:
"Assorted extensions and fixes including:
* Introduction of early/late suspend/hibernation device callbacks.
* Generic PM domains extensions and fixes.
* devfreq updates from Axel Lin and MyungJoo Ham.
* Device PM QoS updates.
* Fixes of concurrency problems with wakeup sources.
* System suspend and hibernation fixes."
* tag 'pm-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (43 commits)
PM / Domains: Check domain status during hibernation restore of devices
PM / devfreq: add relation of recommended frequency.
PM / shmobile: Make MTU2 driver use pm_genpd_dev_always_on()
PM / shmobile: Make CMT driver use pm_genpd_dev_always_on()
PM / shmobile: Make TMU driver use pm_genpd_dev_always_on()
PM / Domains: Introduce "always on" device flag
PM / Domains: Fix hibernation restore of devices, v2
PM / Domains: Fix handling of wakeup devices during system resume
sh_mmcif / PM: Use PM QoS latency constraint
tmio_mmc / PM: Use PM QoS latency constraint
PM / QoS: Make it possible to expose PM QoS latency constraints
PM / Sleep: JBD and JBD2 missing set_freezable()
PM / Domains: Fix include for PM_GENERIC_DOMAINS=n case
PM / Freezer: Remove references to TIF_FREEZE in comments
PM / Sleep: Add more wakeup source initialization routines
PM / Hibernate: Enable usermodehelpers in hibernate() error path
PM / Sleep: Make __pm_stay_awake() delete wakeup source timers
PM / Sleep: Fix race conditions related to wakeup source timer function
PM / Sleep: Fix possible infinite loop during wakeup source destruction
PM / Hibernate: print physical addresses consistently with other parts of kernel
...
|
|
Pull kmap_atomic cleanup from Cong Wang.
It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().
Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.
* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
drbd: remove the second argument of k[un]map_atomic()
zcache: remove the second argument of k[un]map_atomic()
gma500: remove the second argument of k[un]map_atomic()
dm: remove the second argument of k[un]map_atomic()
tomoyo: remove the second argument of k[un]map_atomic()
sunrpc: remove the second argument of k[un]map_atomic()
rds: remove the second argument of k[un]map_atomic()
net: remove the second argument of k[un]map_atomic()
mm: remove the second argument of k[un]map_atomic()
lib: remove the second argument of k[un]map_atomic()
power: remove the second argument of k[un]map_atomic()
kdb: remove the second argument of k[un]map_atomic()
udf: remove the second argument of k[un]map_atomic()
ubifs: remove the second argument of k[un]map_atomic()
squashfs: remove the second argument of k[un]map_atomic()
reiserfs: remove the second argument of k[un]map_atomic()
ocfs2: remove the second argument of k[un]map_atomic()
ntfs: remove the second argument of k[un]map_atomic()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
"It's indeed trivial -- mostly documentation updates and a bunch of
typo fixes from Masanari.
There are also several linux/version.h include removals from Jesper."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
kcore: fix spelling in read_kcore() comment
constify struct pci_dev * in obvious cases
Revert "char: Fix typo in viotape.c"
init: fix wording error in mm_init comment
usb: gadget: Kconfig: fix typo for 'different'
Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
writeback: fix typo in the writeback_control comment
Documentation: Fix multiple typo in Documentation
tpm_tis: fix tis_lock with respect to RCU
Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
Doc: Update numastat.txt
qla4xxx: Add missing spaces to error messages
compiler.h: Fix typo
security: struct security_operations kerneldoc fix
Documentation: broken URL in libata.tmpl
Documentation: broken URL in filesystems.tmpl
mtd: simplify return logic in do_map_probe()
mm: fix comment typo of truncate_inode_pages_range
power: bq27x00: Fix typos in comment
...
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull workqueue changes from Tejun Heo:
"This contains only one commit which cleans up UP allocation path a
bit."
* 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: use percpu allocator for cwq on UP
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
"Out of the 8 commits, one fixes a long-standing locking issue around
tasklist walking and others are cleanups."
* 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list
cgroup: Remove wrong comment on cgroup_enable_task_cg_list()
cgroup: remove cgroup_subsys argument from callbacks
cgroup: remove extra calls to find_existing_css_set
cgroup: replace tasklist_lock with rcu_read_lock
cgroup: simplify double-check locking in cgroup_attach_proc
cgroup: move struct cgroup_pidlist out from the header file
cgroup: remove cgroup_attach_task_current_cg()
|
|
exit_notify() changes ->exit_signal if the parent already did exec.
This doesn't really work, we are not going to send the signal now
if there is another live thread or the exiting task is traced. The
parent can exec before the last dies or the tracer detaches.
Move this check into do_notify_parent() which actually sends the
signal.
The user-visible change is that we do not change ->exit_signal,
and thus the exiting task is still "clone children" for
do_wait()->eligible_child(__WCLONE). Hopefully this is fine, the
current logic is racy anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
exit_notify() checks "tsk->self_exec_id != tsk->parent_exec_id"
to handle the "we have changed execution domain" case.
We can change do_thread() to always set ->exit_signal = SIGCHLD
and remove this check to simplify the code.
We could change setup_new_exec() instead, this looks more logical
because it increments ->self_exec_id. But note that de_thread()
already resets ->exit_signal if it changes the leader, let's keep
both changes close to each other.
Note that we change ->exit_signal lockless, this changes the rules.
Thereafter ->exit_signal is not stable under tasklist but this is
fine, the only possible change is OLDSIG -> SIGCHLD. This can race
with eligible_child() but the race is harmless. We can race with
reparent_leader() which changes our ->exit_signal in parallel, but
it does the same change to SIGCHLD.
The noticeable user-visible change is that the execing task is not
"visible" to do_wait()->eligible_child(__WCLONE) right after exec.
To me this looks more logical, and this is consistent with mt case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The child must not control its ->exit_signal, it is the parent who
decides which signal the child should use for notification.
This means that CLONE_PARENT should not use "clone_flags & CSIGNAL",
the forking task is the sibling of the new process and their parent
doesn't control exit_signal in this case.
This patch uses ->exit_signal of the forking process, but perhaps
we should simply use SIGCHLD.
We read group_leader->exit_signal lockless, this can race with the
ORIGINAL_SIGNAL -> SIGCHLD transition, but this is fine.
Potentially this change allows to kill self_exec_id/parent_exec_id.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer changes for v3.4 from Ingo Molnar
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
ntp: Fix integer overflow when setting time
math: Introduce div64_long
cs5535-clockevt: Allow the MFGPT IRQ to be shared
cs5535-clockevt: Don't ignore MFGPT on SMP-capable kernels
x86/time: Eliminate unused irq0_irqs counter
clocksource: scx200_hrt: Fix the build
x86/tsc: Reduce the TSC sync check time for core-siblings
timer: Fix bad idle check on irq entry
nohz: Remove ts->Einidle checks before restarting the tick
nohz: Remove update_ts_time_stat from tick_nohz_start_idle
clockevents: Leave the broadcast device in shutdown mode when not needed
clocksource: Load the ACPI PM clocksource asynchronously
clocksource: scx200_hrt: Convert scx200 to use clocksource_register_hz
clocksource: Get rid of clocksource_calc_mult_shift()
clocksource: dbx500: convert to clocksource_register_hz()
clocksource: scx200_hrt: use pr_<level> instead of printk
time: Move common updates to a function
time: Reorder so the hot data is together
time: Remove most of xtime_lock usage in timekeeping.c
ntp: Add ntp_lock to replace xtime_locking
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes for v3.4 from Ingo Molnar
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
printk: Make it compile with !CONFIG_PRINTK
sched/x86: Fix overflow in cyc2ns_offset
sched: Fix nohz load accounting -- again!
sched: Update yield() docs
printk/sched: Introduce special printk_sched() for those awkward moments
sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer
sched: Cleanup cpu_active madness
sched: Fix load-balance wreckage
sched: Clean up parameter passing of proc_sched_autogroup_set_nice()
sched: Ditch per cgroup task lists for load-balancing
sched: Rename load-balancing fields
sched: Move load-balancing arguments into helper struct
sched/rt: Do not submit new work when PI-blocked
sched/rt: Prevent idle task boosting
sched/wait: Add __wake_up_all_locked() API
sched/rt: Document scheduler related skip-resched-check sites
sched/rt: Use schedule_preempt_disabled()
sched/rt: Add schedule_preempt_disabled()
sched/rt: Do not throttle when PI boosting
sched/rt: Keep period timer ticking when rt throttling is active
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events changes for v3.4 from Ingo Molnar:
- New "hardware based branch profiling" feature both on the kernel and
the tooling side, on CPUs that support it. (modern x86 Intel CPUs
with the 'LBR' hardware feature currently.)
This new feature is basically a sophisticated 'magnifying glass' for
branch execution - something that is pretty difficult to extract from
regular, function histogram centric profiles.
The simplest mode is activated via 'perf record -b', and the result
looks like this in perf report:
$ perf record -b any_call,u -e cycles:u branchy
$ perf report -b --sort=symbol
52.34% [.] main [.] f1
24.04% [.] f1 [.] f3
23.60% [.] f1 [.] f2
0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow
0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn
0.01% [k] _IO_vfprintf_internal [k] strchrnul
0.01% [k] __printf [k] _IO_vfprintf_internal
0.01% [k] main [k] __printf
This output shows from/to branch columns and shows the highest
percentage (from,to) jump combinations - i.e. the most likely taken
branches in the system. "branches" can also include function calls
and any other synchronous and asynchronous transitions of the
instruction pointer that are not 'next instruction' - such as system
calls, traps, interrupts, etc.
This feature comes with (hopefully intuitive) flat ascii and TUI
support in perf report.
- Various 'perf annotate' visual improvements for us assembly junkies.
It will now recognize function calls in the TUI and by hitting enter
you can follow the call (recursively) and back, amongst other
improvements.
- Multiple threads/processes recording support in perf record, perf
stat, perf top - which is activated via a comma-list of PIDs:
perf top -p 21483,21485
perf stat -p 21483,21485 -ddd
perf record -p 21483,21485
- Support for per UID views, via the --uid paramter to perf top, perf
report, etc. For example 'perf top --uid mingo' will only show the
tasks that I am running, excluding other users, root, etc.
- Jump label restructurings and improvements - this includes the
factoring out of the (hopefully much clearer) include/linux/static_key.h
generic facility:
struct static_key key = STATIC_KEY_INIT_FALSE;
...
if (static_key_false(&key))
do unlikely code
else
do likely code
...
static_key_slow_inc();
...
static_key_slow_inc();
...
The static_key_false() branch will be generated into the code with as
little impact to the likely code path as possible. the
static_key_slow_*() APIs flip the branch via live kernel code patching.
This facility can now be used more widely within the kernel to
micro-optimize hot branches whose likelihood matches the static-key
usage and fast/slow cost patterns.
- SW function tracer improvements: perf support and filtering support.
- Various hardenings of the perf.data ABI, to make older perf.data's
smoother on newer tool versions, to make new features integrate more
smoothly, to support cross-endian recording/analyzing workflows
better, etc.
- Restructuring of the kprobes code, the splitting out of 'optprobes',
and a corner case bugfix.
- Allow the tracing of kernel console output (printk).
- Improvements/fixes to user-space RDPMC support, allowing user-space
self-profiling code to extract PMU counts without performing any
system calls, while playing nice with the kernel side.
- 'perf bench' improvements
- ... and lots of internal restructurings, cleanups and fixes that made
these features possible. And, as usual this list is incomplete as
there were also lots of other improvements
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
perf report: Fix annotate double quit issue in branch view mode
perf report: Remove duplicate annotate choice in branch view mode
perf/x86: Prettify pmu config literals
perf report: Enable TUI in branch view mode
perf report: Auto-detect branch stack sampling mode
perf record: Add HEADER_BRANCH_STACK tag
perf record: Provide default branch stack sampling mode option
perf tools: Make perf able to read files from older ABIs
perf tools: Fix ABI compatibility bug in print_event_desc()
perf tools: Enable reading of perf.data files from different ABI rev
perf: Add ABI reference sizes
perf report: Add support for taken branch sampling
perf record: Add support for sampling taken branch
perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
x86/kprobes: Split out optprobe related code to kprobes-opt.c
x86/kprobes: Fix a bug which can modify kernel code permanently
x86/kprobes: Fix instruction recovery on optimized path
perf: Add callback to flush branch_stack on context switch
perf: Disable PERF_SAMPLE_BRANCH_* when not supported
perf/x86: Add LBR software filter support for Intel CPUs
...
|