summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)Author
2011-12-02tick-broadcast: Stop active broadcast device when replacing itThomas Gleixner
When a better rated broadcast device is installed, then the current active device is not disabled, which results in two running broadcast devices. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
2011-12-02genirq: Fix race condition when stopping the irq threadIdo Yariv
In irq_wait_for_interrupt(), the should_stop member is verified before setting the task's state to TASK_INTERRUPTIBLE and calling schedule(). In case kthread_stop sets should_stop and wakes up the process after should_stop is checked by the irq thread but before the task's state is changed, the irq thread might never exit: kthread_stop irq_wait_for_interrupt ------------ ---------------------- ... ... while (!kthread_should_stop()) { kthread->should_stop = 1; wake_up_process(k); wait_for_completion(&kthread->exited); ... set_current_state(TASK_INTERRUPTIBLE); ... schedule(); } Fix this by checking if the thread should stop after modifying the task's state. [ tglx: Simplified it a bit ] Signed-off-by: Ido Yariv <ido@wizery.com> Link: http://lkml.kernel.org/r/1322740508-22640-1-git-send-email-ido@wizery.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org
2011-12-01trace_events_filter: Use rcu_assign_pointer() when setting ↵Tejun Heo
ftrace_event_call->filter ftrace_event_call->filter is sched RCU protected but didn't use rcu_assign_pointer(). Use it. TODO: Add proper __rcu annotation to call->filter and all its users. -v2: Use RCU_INIT_POINTER() for %NULL clearing as suggested by Eric. Link: http://lkml.kernel.org/r/20111123164949.GA29639@google.com Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@kernel.org # (2.6.39+) Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-01clocksource: Fix bug with max_deferment margin calculationYang Honggang (Joseph)
In order to leave a margin of 12.5% we should >> 3 not >> 5. CC: stable@kernel.org Signed-off-by: Yang Honggang (Joseph) <eagle.rtlinux@gmail.com> [jstultz: Modified commit subject] Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-29Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: Update comments describing device power management callbacks PM / Sleep: Update documentation related to system wakeup PM / Runtime: Make documentation follow the new behavior of irq_safe PM / Sleep: Correct inaccurate information in devices.txt PM / Domains: Document how PM domains are used by the PM core PM / Hibernate: Do not leak memory in error/test code paths
2011-11-29clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGRPaul Bolle
There's no Kconfig symbol GENERIC_CLOCKEVENTS_MIGR, so the check for it will always fail. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-11-28Merge branch 'for-3.2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup * 'for-3.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup_freezer: fix freezing groups with stopped tasks
2011-11-28Merge branch 'master' into x86/memblockTejun Heo
Conflicts & resolutions: * arch/x86/xen/setup.c dc91c728fd "xen: allow extra memory to be in multiple regions" 24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..." conflicted on xen_add_extra_mem() updates. The resolution is trivial as the latter just want to replace memblock_x86_reserve_range() with memblock_reserve(). * drivers/pci/intel-iommu.c 166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/" 5dfe8660a3d "bootmem: Replace work_with_active_regions() with..." conflicted as the former moved the file under drivers/iommu/. Resolved by applying the chnages from the latter on the moved file. * mm/Kconfig 6661672053a "memblock: add NO_BOOTMEM config symbol" c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option" conflicted trivially. Both added config options. Just letting both add their own options resolves the conflict. * mm/memblock.c d1f0ece6cdc "mm/memblock.c: small function definition fixes" ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()" confliected. The former updates function removed by the latter. Resolution is trivial. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-28Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Fix extra wakeups from __remove_hrtimer() timekeeping: add arch_offset hook to ktime_get functions clocksource: Avoid selecting mult values that might overflow when adjusted time: Improve documentation of timekeeeping_adjust()
2011-11-28Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Don't allow per cpu interrupts to be suspended
2011-11-28genirq: fix regression in irqfixup, irqpollEdward Donovan
Commit fa27271bc8d2("genirq: Fixup poll handling") introduced a regression that broke irqfixup/irqpoll for some hardware configurations. Amidst reorganizing 'try_one_irq', that patch removed a test that checked for 'action->handler' returning IRQ_HANDLED, before acting on the interrupt. Restoring this test back returns the functionality lost since 2.6.39. In the current set of tests, after 'action' is set, it must precede '!action->next' to take effect. With this and my previous patch to irq/spurious.c, c75d720fca8a, all IRQ regressions that I have encountered are fixed. Signed-off-by: Edward Donovan <edward.donovan@numble.net> Reported-and-tested-by: Rogério Brito <rbrito@ime.usp.br> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org (2.6.39+) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-24cgroup_freezer: fix freezing groups with stopped tasksMichal Hocko
2d3cbf8b (cgroup_freezer: update_freezer_state() does incorrect state transitions) removed is_task_frozen_enough and replaced it with a simple frozen call. This, however, breaks freezing for a group with stopped tasks because those cannot be frozen and so the group remains in CGROUP_FREEZING state (update_if_frozen doesn't count stopped tasks) and never reaches CGROUP_FROZEN. Let's add is_task_frozen_enough back and use it at the original locations (update_if_frozen and try_to_freeze_cgroup). Semantically we consider stopped tasks as frozen enough so we should consider both cases when testing frozen tasks. Testcase: mkdir /dev/freezer mount -t cgroup -o freezer none /dev/freezer mkdir /dev/freezer/foo sleep 1h & pid=$! kill -STOP $pid echo $pid > /dev/freezer/foo/tasks echo FROZEN > /dev/freezer/foo/freezer.state while true do cat /dev/freezer/foo/freezer.state [ "`cat /dev/freezer/foo/freezer.state`" = "FROZEN" ] && break sleep 1 done echo OK Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Tomasz Buchert <tomasz.buchert@inria.fr> Cc: Paul Menage <paul@paulmenage.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Tejun Heo <htejun@gmail.com>
2011-11-23CPU: Add right qualifiers for alloc_frozen_cpus() and cpu_hotplug_pm_sync_init()Fenghua Yu
Add __init for functions alloc_frozen_cpus() and cpu_hotplug_pm_sync_init() because they are only called during boot time. Add static for function cpu_hotplug_pm_sync_init() because its scope is limited in this file only. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-23PM / Usermodehelper: Cleanup remnants of usermodehelper_pm_callback()Srivatsa S. Bhat
usermodehelper_pm_callback() no longer exists in the kernel. There are 2 comments in kernel/kmod.c that still refer to it. Also, the patch that introduced usermodehelper_pm_callback(), #included two header files: <linux/notifier.h> and <linux/suspend.h>. But these are no longer necessary. This patch updates the comments as appropriate and removes the unnecessary header file inclusions. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-23PM / Hibernate: Refactor and simplify hibernation_snapshot() codeSrivatsa S. Bhat
The goto statements in hibernation_snapshot() are a bit complex. Refactor the code to remove some of them, thereby simplifying the implementation. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-23PM: Fix indentation and remove extraneous whitespaces in kernel/power/main.cSrivatsa S. Bhat
Lack of proper indentation of the goto statement decreases the readability of code significantly. In fact, this made me look twice at the code to check whether it really does what it should be doing. Fix this. And in the same file, there are some extra whitespaces. Get rid of them too. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-23Merge branch 'pm-freezer' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into pm-freezer * 'pm-freezer' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: (24 commits) freezer: fix wait_event_freezable/__thaw_task races freezer: kill unused set_freezable_with_signal() dmatest: don't use set_freezable_with_signal() usb_storage: don't use set_freezable_with_signal() freezer: remove unused @sig_only from freeze_task() freezer: use lock_task_sighand() in fake_signal_wake_up() freezer: restructure __refrigerator() freezer: fix set_freezable[_with_signal]() race freezer: remove should_send_signal() and update frozen() freezer: remove now unused TIF_FREEZE freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE cgroup_freezer: prepare for removal of TIF_FREEZE freezer: clean up freeze_processes() failure path freezer: kill PF_FREEZING freezer: test freezable conditions while holding freezer_lock freezer: make freezing indicate freeze condition in effect freezer: use dedicated lock instead of task_lock() + memory barrier freezer: don't distinguish nosig tasks on thaw freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks freezer: rename thaw_process() to __thaw_task() and simplify the implementation ...
2011-11-23PM / Hibernate: Do not leak memory in error/test code pathsRafael J. Wysocki
The hibernation core code forgets to release memory preallocated for hibernation if there's an error in its early stages or if test modes causing hibernation_snapshot() to return early are used. This causes the system to be hardly usable, because the amount of preallocated memory is usually huge. Fix this problem. Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
2011-11-23timer: Use debugobjects to catch deletion of uninitialized timersChristine Chan
del_timer_sync() calls debug_object_assert_init() to assert that a timer has been initialized before calling lock_timer_base(). lock_timer_base() would spin forever on a NULL(uninit-ed) base. The check is added to del_timer() to prevent silent failure, even though it would not get stuck in an infinite loop. [ sboyd@codeaurora.org: Remove WARN, intialize timer function] Signed-off-by: Christine Chan <cschan@codeaurora.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1320724108-20788-4-git-send-email-sboyd@codeaurora.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-23timer: Setup uninitialized timer with a stub callbackStephen Boyd
Remove the WARN_ON() in timer_fixup_activate() as we now get the debugobjects printout in the debugobjects activate check. We also assign a dummy timer callback so that if the timer is actually set to fire we don't oops. [ tglx@linutronix.de: Split out the debugobjects vs. the timer change ] Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Christine Chan <cschan@codeaurora.org> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1320724108-20788-2-git-send-email-sboyd@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-23freezer: kill unused set_freezable_with_signal()Tejun Heo
There's no in-kernel user of set_freezable_with_signal() left. Mixing TIF_SIGPENDING with kernel threads can lead to nasty corner cases as kernel threads never travel signal delivery path on their own. e.g. the current implementation is buggy in the cancelation path of __thaw_task(). It calls recalc_sigpending_and_wake() in an attempt to clear TIF_SIGPENDING but the function never clears it regardless of sigpending state. This means that signallable freezable kthreads may continue executing with !freezing() && stuck TIF_SIGPENDING, which can be troublesome. This patch removes set_freezable_with_signal() along with PF_FREEZER_NOSIG and recalc_sigpending*() calls in freezer. User tasks get TIF_SIGPENDING, kernel tasks get woken up and the spurious sigpending is dealt with in the usual signal delivery path. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com>
2011-11-22Merge branch 'writeback-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: remove vm_dirties and task->dirties writeback: hard throttle 1000+ dd on a slow USB stick mm: Make task in balance_dirty_pages() killable
2011-11-21time: Fix spelling mistakes in new commentsJohn Stultz
Fixup spelling issues caught by Richard CC: Richard Cochran <richardcochran@gmail.com> CC: Chen Jie <chenj@lemote.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-21time: fix bogus comment in timekeeping_get_ns_rawDan McGee
The whole point of this function is to return a value not touched by NTP; unfortunately the comment got copied wholesale without adjustment from the timekeeping_get_ns function above. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-21freezer: remove unused @sig_only from freeze_task()Tejun Heo
After "freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE", freezing() returns authoritative answer on whether the current task should freeze or not and freeze_task() doesn't need or use @sig_only. Remove it. While at it, rewrite function comment for freeze_task() and rename @sig_only to @user_only in try_to_freeze_tasks(). This patch doesn't cause any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Oleg Nesterov <oleg@redhat.com>
2011-11-21freezer: use lock_task_sighand() in fake_signal_wake_up()Tejun Heo
cgroup_freezer calls freeze_task() without holding tasklist_lock and, if the task is exiting, its ->sighand may be gone by the time fake_signal_wake_up() is called. Use lock_task_sighand() instead of accessing ->sighand directly. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Menage <paul@paulmenage.org>
2011-11-21freezer: restructure __refrigerator()Tejun Heo
If another freeze happens before all tasks leave FROZEN state after being thawed, the freezer can see the existing FROZEN and consider the tasks to be frozen but they can clear FROZEN without checking the new freezing(). Oleg suggested restructuring __refrigerator() such that there's single condition check section inside freezer_lock and sigpending is cleared afterwards, which fixes the problem and simplifies the code. Restructure accordingly. -v2: Frozen loop exited without releasing freezer_lock. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
2011-11-21freezer: fix set_freezable[_with_signal]() raceTejun Heo
A kthread doing set_freezable*() may race with on-going PM freeze and the freezer might think all tasks are frozen while the new freezable kthread is merrily proceeding to execute code paths which aren't supposed to be executing during PM freeze. Reimplement set_freezable[_with_signal]() using __set_freezable() such that freezable PF flags are modified under freezer_lock and try_to_freeze() is called afterwards. This eliminates race condition against freezing. Note: Separated out from larger patch to resolve fix order dependency Oleg pointed out. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21freezer: remove should_send_signal() and update frozen()Tejun Heo
should_send_signal() is only used in freezer.c. Exporting them only increases chance of abuse. Open code the two users and remove it. Update frozen() to return bool. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-21freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZETejun Heo
Using TIF_FREEZE for freezing worked when there was only single freezing condition (the PM one); however, now there is also the cgroup_freezer and single bit flag is getting clumsy. thaw_processes() is already testing whether cgroup freezing in in effect to avoid thawing tasks which were frozen by both PM and cgroup freezers. This is racy (nothing prevents race against cgroup freezing) and fragile. A much simpler way is to test actual freeze conditions from freezing() - ie. directly test whether PM or cgroup freezing is in effect. This patch adds variables to indicate whether and what type of freezing conditions are in effect and reimplements freezing() such that it directly tests whether any of the two freezing conditions is active and the task should freeze. On fast path, freezing() is still very cheap - it only tests system_freezing_cnt. This makes the clumsy dancing aroung TIF_FREEZE unnecessary and freeze/thaw operations more usual - updating state variables for the new state and nudging target tasks so that they notice the new state and comply. As long as the nudging happens after state update, it's race-free. * This allows use of freezing() in freeze_task(). Replace the open coded tests with freezing(). * p != current test is added to warning printing conditions in try_to_freeze_tasks() failure path. This is necessary as freezing() is now true for the task which initiated freezing too. -v2: Oleg pointed out that re-freezing FROZEN cgroup could increment system_freezing_cnt. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Paul Menage <paul@paulmenage.org> (for the cgroup portions)
2011-11-21cgroup_freezer: prepare for removal of TIF_FREEZETejun Heo
TIF_FREEZE will be removed soon and freezing() will directly test whether any freezing condition is in effect. Make the following changes in preparation. * Rename cgroup_freezing_or_frozen() to cgroup_freezing() and make it return bool. * Make cgroup_freezing() access task_freezer() under rcu read lock instead of task_lock(). This makes the state dereferencing racy against task moving to another cgroup; however, it was already racy without this change as ->state dereference wasn't synchronized. This will be later dealt with using attach hooks. * freezer->state is now set before trying to push tasks into the target state. -v2: Oleg pointed out that freeze_change_state() was setting freeze->state incorrectly to CGROUP_FROZEN instead of CGROUP_FREEZING. Fixed. -v3: Matt pointed out that setting CGROUP_FROZEN used to always invoke try_to_freeze_cgroup() regardless of the current state. Patch updated such that the actual freeze/thaw operations are always performed on invocation. This shouldn't make any difference unless something is broken. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Paul Menage <paul@paulmenage.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21freezer: clean up freeze_processes() failure pathTejun Heo
freeze_processes() failure path is rather messy. Freezing is canceled for workqueues and tasks which aren't frozen yet but frozen tasks are left alone and should be thawed by the caller and of course some callers (xen and kexec) didn't do it. This patch updates __thaw_task() to handle cancelation correctly and makes freeze_processes() and freeze_kernel_threads() call thaw_processes() on failure instead so that the system is fully thawed on failure. Unnecessary [suspend_]thaw_processes() calls are removed from kernel/power/hibernate.c, suspend.c and user.c. While at it, restructure error checking if clause in suspend_prepare() to be less weird. -v2: Srivatsa spotted missing removal of suspend_thaw_processes() in suspend_prepare() and error in commit message. Updated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
2011-11-21freezer: kill PF_FREEZINGTejun Heo
With the previous changes, there's no meaningful difference between PF_FREEZING and PF_FROZEN. Remove PF_FREEZING and use PF_FROZEN instead in task_contributes_to_load(). Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-21freezer: test freezable conditions while holding freezer_lockTejun Heo
try_to_freeze_tasks() and thaw_processes() use freezable() and frozen() as preliminary tests before initiating operations on a task. These are done without any synchronization and hinder with synchronization cleanup without any real performance benefits. In try_to_freeze_tasks(), open code self test and move PF_NOFREEZE and frozen() tests inside freezer_lock in freeze_task(). thaw_processes() can simply drop freezable() test as frozen() test in __thaw_task() is enough. Note: This used to be a part of larger patch to fix set_freezable() race. Separated out to satisfy ordering among dependent fixes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21freezer: make freezing indicate freeze condition in effectTejun Heo
Currently freezing (TIF_FREEZE) and frozen (PF_FROZEN) states are interlocked - freezing is set to request freeze and when the task actually freezes, it clears freezing and sets frozen. This interlocking makes things more complex than necessary - freezing doesn't mean there's freezing condition in effect and frozen doesn't match the task actually entering and leaving frozen state (it's cleared by the thawing task). This patch makes freezing indicate that freeze condition is in effect. A task enters and stays frozen if freezing. This makes PF_FROZEN manipulation done only by the task itself and prevents wakeup from __thaw_task() leaking outside of refrigerator. The only place which needs to tell freezing && !frozen is try_to_freeze_task() to whine about tasks which don't enter frozen. It's updated to test the condition explicitly. With the change, frozen() state my linger after __thaw_task() until the task wakes up and exits fridge. This can trigger BUG_ON() in update_if_frozen(). Work it around by testing freezing() && frozen() instead of frozen(). -v2: Oleg pointed out missing re-check of freezing() when trying to clear FROZEN and possible spurious BUG_ON() trigger in update_if_frozen(). Both fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Menage <paul@paulmenage.org>
2011-11-21freezer: use dedicated lock instead of task_lock() + memory barrierTejun Heo
Freezer synchronization is needlessly complicated - it's by no means a hot path and the priority is staying unintrusive and safe. This patch makes it simply use a dedicated lock instead of piggy-backing on task_lock() and playing with memory barriers. On the failure path of try_to_freeze_tasks(), locking is moved from it to cancel_freezing(). This makes the frozen() test racy but the race here is a non-issue as the warning is printed for tasks which failed to enter frozen for 20 seconds and race on PF_FROZEN at the last moment doesn't change anything. This simplifies freezer implementation and eases further changes including some race fixes. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-21freezer: don't distinguish nosig tasks on thawTejun Heo
There's no point in thawing nosig tasks before others. There's no ordering requirement between the two groups on thaw, which the staged thawing can't guarantee anyway. Simplify thaw_processes() by removing the distinction and collapsing thaw_tasks() into thaw_processes(). This will help further updates to freezer. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-21freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasksTejun Heo
clear_freeze_flag() in exit_mm() is racy. Freezing can start afterwards. Remove it. Skipping freezer for exiting task will be properly implemented later. Also, freezable() was testing exit_state directly to make system freezer ignore dead tasks. Let the exiting task set PF_NOFREEZE after entering TASK_DEAD instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21freezer: rename thaw_process() to __thaw_task() and simplify the implementationTejun Heo
thaw_process() now has only internal users - system and cgroup freezers. Remove the unnecessary return value, rename, unexport and collapse __thaw_process() into it. This will help further updates to the freezer code. -v3: oom_kill grew a use of thaw_process() while this patch was pending. Convert it to use __thaw_task() for now. In the longer term, this should be handled by allowing tasks to die if killed even if it's frozen. -v2: minor style update as suggested by Matt. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Paul Menage <menage@google.com> Cc: Matt Helsley <matthltc@us.ibm.com>
2011-11-21freezer: implement and use kthread_freezable_should_stop()Tejun Heo
Writeback and thinkpad_acpi have been using thaw_process() to prevent deadlock between the freezer and kthread_stop(); unfortunately, this is inherently racy - nothing prevents freezing from happening between thaw_process() and kthread_stop(). This patch implements kthread_freezable_should_stop() which enters refrigerator if necessary but is guaranteed to return if kthread_stop() is invoked. Both thaw_process() users are converted to use the new function. Note that this deadlock condition exists for many of freezable kthreads. They need to be converted to use the new should_stop or freezable workqueue. Tested with synthetic test case. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br> Cc: Jens Axboe <axboe@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21freezer: unexport refrigerator() and update try_to_freeze() slightlyTejun Heo
There is no reason to export two functions for entering the refrigerator. Calling refrigerator() instead of try_to_freeze() doesn't save anything noticeable or removes any race condition. * Rename refrigerator() to __refrigerator() and make it return bool indicating whether it scheduled out for freezing. * Update try_to_freeze() to return bool and relay the return value of __refrigerator() if freezing(). * Convert all refrigerator() users to try_to_freeze(). * Update documentation accordingly. * While at it, add might_sleep() to try_to_freeze(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org>
2011-11-21freezer: fix current->state restoration race in refrigerator()Tejun Heo
refrigerator() saves current->state before entering frozen state and restores it before returning using __set_current_state(); however, this is racy, for example, please consider the following sequence. set_current_state(TASK_INTERRUPTIBLE); try_to_freeze(); if (kthread_should_stop()) break; schedule(); If kthread_stop() races with ->state restoration, the restoration can restore ->state to TASK_INTERRUPTIBLE after kthread_stop() sets it to TASK_RUNNING but kthread_should_stop() may still see zero ->should_stop because there's no memory barrier between restoring TASK_INTERRUPTIBLE and kthread_should_stop() test. This isn't restricted to kthread_should_stop(). current->state is often used in memory barrier based synchronization and silently restoring it w/o mb breaks them. Use set_current_state() instead. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-20Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Suspend: Fix bug in suspend statistics update PM / Hibernate: Fix the early termination of test modes PM / shmobile: Fix build of sh7372_pm_init() for CONFIG_PM unset PM Sleep: Do not extend wakeup paths to devices with ignore_children set PM / driver core: disable device's runtime PM during shutdown PM / devfreq: correct Kconfig dependency PM / devfreq: fix use after free in devfreq_remove_device PM / shmobile: Avoid restoring the INTCS state during initialization PM / devfreq: Remove compiler error after irq.h update PM / QoS: Properly use the WARN() macro in dev_pm_qos_add_request() PM / Clocks: Only disable enabled clocks in pm_clk_suspend() ARM: mach-shmobile: sh7372 A3SP no_suspend_console fix PM / shmobile: Don't skip debugging output in pd_power_up()
2011-11-19PM / Suspend: Fix bug in suspend statistics updateSrivatsa S. Bhat
After commit 2a77c46de1e3dace73745015635ebbc648eca69c (PM / Suspend: Add statistics debugfs file for suspend to RAM) a missing pair of braces inside the state_store() function causes even invalid arguments to suspend to be wrongly treated as failed suspend attempts. Fix this. [rjw: Put the hash/subject of the buggy commit into the changelog.] Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-19hrtimer: Fix extra wakeups from __remove_hrtimer()Jeff Ohlstein
__remove_hrtimer() attempts to reprogram the clockevent device when the timer being removed is the next to expire. However, __remove_hrtimer() reprograms the clockevent *before* removing the timer from the timerqueue and thus when hrtimer_force_reprogram() finds the next timer to expire it finds the timer we're trying to remove. This is especially noticeable when the system switches to NOHz mode and the system tick is removed. The timer tick is removed from the system but the clockevent is programmed to wakeup in another HZ anyway. Silence the extra wakeup by removing the timer from the timerqueue before calling hrtimer_force_reprogram() so that we actually program the clockevent for the next timer to expire. This was broken by 998adc3 "hrtimers: Convert hrtimers to use timerlist infrastructure". Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-18PM / Hibernate: Fix the early termination of test modesSrivatsa S. Bhat
Commit 2aede851ddf08666f68ffc17be446420e9d2a056 (PM / Hibernate: Freeze kernel threads after preallocating memory) postponed the freezing of kernel threads to after preallocating memory for hibernation. But while doing that, the hibernation test TEST_FREEZER and the test mode HIBERNATION_TESTPROC were not moved accordingly. As a result, when using these test modes, it only goes upto the freezing of userspace and exits, when in fact it should go till the complete end of task freezing stage, namely the freezing of kernel threads as well. So, move these points of exit to appropriate places so that freezing of kernel threads is also tested while using these test harnesses. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-17timekeeping: add arch_offset hook to ktime_get functionsHector Palacios
ktime_get and ktime_get_ts were calling timekeeping_get_ns() but later they were not calling arch_gettimeoffset() so architectures using this mechanism returned 0 ns when calling these functions. This happened for example when running Busybox's ping which calls syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) which eventually calls ktime_get. As a result the returned ping travel time was zero. CC: stable@kernel.org Signed-off-by: Hector Palacios <hector.palacios@digi.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-17genirq: Don't allow per cpu interrupts to be suspendedMarc Zyngier
The power management functions related to interrupts do not know (yet) about per-cpu interrupts and end up calling the wrong low-level methods to enable/disable interrupts. This leads to all kind of interesting issues (action taken on one CPU only, updating a refcount which is not used otherwise...). The workaround for the time being is simply to flag these interrupts with IRQF_NO_SUSPEND. At least on ARM, these interrupts are actually dealt with at the architecture level. Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1321446459-31409-1-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-17tracing: Add entries in buffer and total entries to default output headerSteven Rostedt
Knowing the number of event entries in the ring buffer compared to the total number that were written is useful information. The latency format gives this information and there's no reason that the default format does not. This information is now added to the default header, along with the number of online CPUs: # tracer: nop # # entries-in-buffer/entries-written: 159836/64690869 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | <idle>-0 [000] ...2 49.442971: local_touch_nmi <-cpu_idle <idle>-0 [000] d..2 49.442973: enter_idle <-cpu_idle <idle>-0 [000] d..2 49.442974: atomic_notifier_call_chain <-enter_idle <idle>-0 [000] d..2 49.442976: __atomic_notifier_call_chain <-atomic_notifier The above shows that the trace contains 159836 entries, but 64690869 were written. One could figure out that there were 64531033 entries that were dropped. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-17tracing: Add irq, preempt-count and need resched info to default trace outputSteven Rostedt
People keep asking how to get the preempt count, irq, and need resched info and we keep telling them to enable the latency format. Some developers think that traces without this info is completely useless, and for a lot of tasks it is useless. The first option was to enable the latency trace as the default format, but the header for the latency format is pretty useless for most tracers and it also does the timestamp in straight microseconds from the time the trace started. This is sometimes more difficult to read as the default trace is seconds from the start of boot up. Latency format: # tracer: nop # # nop latency trace v1.1.5 on 3.2.0-rc1-test+ # -------------------------------------------------------------------- # latency: 0 us, #159771/64234230, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| / delay # cmd pid ||||| time | caller # \ / ||||| \ | / migratio-6 0...2 41778231us+: rcu_note_context_switch <-__schedule migratio-6 0...2 41778233us : trace_rcu_utilization <-rcu_note_context_switch migratio-6 0...2 41778235us+: rcu_sched_qs <-rcu_note_context_switch migratio-6 0d..2 41778236us+: rcu_preempt_qs <-rcu_note_context_switch migratio-6 0...2 41778238us : trace_rcu_utilization <-rcu_note_context_switch migratio-6 0...2 41778239us+: debug_lockdep_rcu_enabled <-__schedule default format: # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | migration/0-6 [000] 50.025810: rcu_note_context_switch <-__schedule migration/0-6 [000] 50.025812: trace_rcu_utilization <-rcu_note_context_switch migration/0-6 [000] 50.025813: rcu_sched_qs <-rcu_note_context_switch migration/0-6 [000] 50.025815: rcu_preempt_qs <-rcu_note_context_switch migration/0-6 [000] 50.025817: trace_rcu_utilization <-rcu_note_context_switch migration/0-6 [000] 50.025818: debug_lockdep_rcu_enabled <-__schedule migration/0-6 [000] 50.025820: debug_lockdep_rcu_enabled <-__schedule The latency format header has latency information that is pretty meaningless for most tracers. Although some of the header is useful, and we can add that later to the default format as well. What is really useful with the latency format is the irqs-off, need-resched hard/softirq context and the preempt count. This commit adds the option irq-info which is on by default that adds this information: # tracer: nop # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | <idle>-0 [000] d..2 49.309305: cpuidle_get_driver <-cpuidle_idle_call <idle>-0 [000] d..2 49.309307: mwait_idle <-cpu_idle <idle>-0 [000] d..2 49.309309: need_resched <-mwait_idle <idle>-0 [000] d..2 49.309310: test_ti_thread_flag <-need_resched <idle>-0 [000] d..2 49.309312: trace_power_start.constprop.13 <-mwait_idle <idle>-0 [000] d..2 49.309313: trace_cpu_idle <-mwait_idle <idle>-0 [000] d..2 49.309315: need_resched <-mwait_idle If a user wants the old format, they can disable the 'irq-info' option: # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | <idle>-0 [000] 49.309305: cpuidle_get_driver <-cpuidle_idle_call <idle>-0 [000] 49.309307: mwait_idle <-cpu_idle <idle>-0 [000] 49.309309: need_resched <-mwait_idle <idle>-0 [000] 49.309310: test_ti_thread_flag <-need_resched <idle>-0 [000] 49.309312: trace_power_start.constprop.13 <-mwait_idle <idle>-0 [000] 49.309313: trace_cpu_idle <-mwait_idle <idle>-0 [000] 49.309315: need_resched <-mwait_idle Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>