summaryrefslogtreecommitdiffstats
path: root/include/linux/cpufreq.h
AgeCommit message (Collapse)Author
2014-04-07cpufreq: create another field .flags in cpufreq_frequency_tableViresh Kumar
Currently cpufreq frequency table has two fields: frequency and driver_data. driver_data is only for drivers' internal use and cpufreq core shouldn't use it at all. But with the introduction of BOOST frequencies, this assumption was broken and we started using it as a flag instead. There are two problems due to this: - It is against the description of this field, as driver's data is used by the core now. - if drivers fill it with -3 for any frequency, then those frequencies are never considered by cpufreq core as it is exactly same as value of CPUFREQ_BOOST_FREQ, i.e. ~2. The best way to get this fixed is by creating another field flags which will be used for such flags. This patch does that. Along with that various drivers need modifications due to the change of struct cpufreq_frequency_table. Reviewed-by: Gautham R Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-26cpufreq: Make cpufreq_notify_transition & cpufreq_notify_post_transition staticViresh Kumar
cpufreq_notify_transition() and cpufreq_notify_post_transition() shouldn't be called directly by cpufreq drivers anymore and so these should be marked static. Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-26cpufreq: Make sure frequency transitions are serializedSrivatsa S. Bhat
Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers should strictly alternate, thereby preventing two different sets of PRECHANGE or POSTCHANGE notifiers from interleaving arbitrarily. The following examples illustrate why this is important: Scenario 1: ----------- A thread reading the value of cpuinfo_cur_freq, will call __cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition() The ondemand governor can decide to change the frequency of the CPU at the same time and hence it can end up sending the notifications via ->target(). If the notifiers are not serialized, the following sequence can occur: - PRECHANGE Notification for freq A (from cpuinfo_cur_freq) - PRECHANGE Notification for freq B (from target()) - Freq changed by target() to B - POSTCHANGE Notification for freq B - POSTCHANGE Notification for freq A We can see from the above that the last POSTCHANGE Notification happens for freq A but the hardware is set to run at freq B. Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback() in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the loops_per_jiffy calculations will get messed up. Scenario 2: ----------- The governor calls __cpufreq_driver_target() to change the frequency. At the same time, if we change scaling_{min|max}_freq from sysfs, it will end up calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call __cpufreq_driver_target(). And hence we end up issuing concurrent calls to ->target(). Typically, platforms have the following logic in their ->target() routines: (Eg: cpufreq-cpu0, omap, exynos, etc) A. If new freq is more than old: Increase voltage B. Change freq C. If new freq is less than old: decrease voltage Now, if the two concurrent calls to ->target() are X and Y, where X is trying to increase the freq and Y is trying to decrease it, we get the following race condition: X.A: voltage gets increased for larger freq Y.A: nothing happens Y.B: freq gets decreased Y.C: voltage gets decreased X.B: freq gets increased X.C: nothing happens Thus we can end up setting a freq which is not supported by the voltage we have set. That will probably make the clock to the CPU unstable and the system might not work properly anymore. This patch introduces a set of synchronization primitives to serialize frequency transitions, which are to be used as shown below: cpufreq_freq_transition_begin(); //Perform the frequency change cpufreq_freq_transition_end(); The _begin() call sends the PRECHANGE notification whereas the _end() call sends the POSTCHANGE notification. Also, all the necessary synchronization is handled within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION flag can also use these APIs for performing frequency transitions (ie., you can call _begin() from one task, and call the corresponding _end() from a different task). The actual synchronization underneath is not that complicated: The key challenge is to allow drivers to begin the transition from one thread and end it in a completely different thread (this is to enable drivers that do asynchronous POSTCHANGE notification from bottom-halves, to also use the same interface). To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a wait-queue are added per-policy. The flag and the wait-queue are used in conjunction to create an "uninterrupted flow" from _begin() to _end(). The spinlock is used to ensure that only one such "flow" is in flight at any given time. Put together, this provides us all the necessary synchronization. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20cpufreq: Add stop CPU callback to cpufreq_driver interfaceDirk Brandewie
This callback allows the driver to do clean up before the CPU is completely down and its state cannot be modified. This is used by the intel_pstate driver to reduce the requested P state prior to the core going away. This is required because the requested P state of the offline core is used to select the package P state. This effectively sets the floor package P state to the requested P state on the offline core. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> [rjw: Minor modifications] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-19cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE|RESUMECHANGE}Viresh Kumar
Two cpufreq notifiers CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE have not been used for some time, so remove them to clean up code a bit. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-12cpufreq: Remove cpufreq_generic_exit()Viresh Kumar
cpufreq_generic_exit() is empty now and can be deleted. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-12cpufreq: add 'freq_table' in struct cpufreq_policyViresh Kumar
freq table is not per CPU but per policy, so it makes more sense to keep it within struct cpufreq_policy instead of a per-cpu variable. This patch does it. Over that, there is no need to set policy->freq_table to NULL in ->exit(), as policy structure is going to be freed soon. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06cpufreq: Implement cpufreq_generic_suspend()Viresh Kumar
Multiple platforms need to set CPUs to a particular frequency before suspending the system, so provide a common infrastructure for them. Those platforms only need to point their ->suspend callback pointers to the generic routine. Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06cpufreq: suspend governors on system suspend/hibernateViresh Kumar
This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}() for handling suspend/resume of cpufreq governors. Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found an issue where the tunables configuration for clusters/sockets with non-boot CPUs was lost after system suspend/resume, as we were notifying governors with CPUFREQ_GOV_POLICY_EXIT on removal of the last CPU for that policy which caused the tunables memory to be freed. This is fixed by preventing any governor operations from being carried out between the device suspend and device resume stages of system suspend and resume, respectively. We could have added these callbacks at dpm_{suspend|resume}_noirq() level, but there is an additional problem that the majority of I/O devices is already suspended at that point and if cpufreq drivers want to change the frequency before suspending, then that not be possible on some platforms (which depend on peripherals like i2c, regulators, etc). Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com> Reported-by: Jinhyuk Choi <jinchoi@broadcom.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-17cpufreq: Add boost frequency support in coreLukasz Majewski
This commit adds boost frequency support in cpufreq core (Hardware & Software). Some SoCs (like Exynos4 - e.g. 4x12) allow setting frequency above its normal operation limits. Such mode shall be only used for a short time. Overclocking (boost) support is essentially provided by platform dependent cpufreq driver. This commit unifies support for SW and HW (Intel) overclocking solutions in the core cpufreq driver. Previously the "boost" sysfs attribute was defined in the ACPI processor driver code. By default boost is disabled. One global attribute is available at: /sys/devices/system/cpu/cpufreq/boost. It only shows up when cpufreq driver supports overclocking. Under the hood frequencies dedicated for boosting are marked with a special flag (CPUFREQ_BOOST_FREQ) at driver's frequency table. It is the user's concern to enable/disable overclocking with a proper call to sysfs. The cpufreq_boost_trigger_state() function is defined non static on purpose. It is used later with thermal subsystem to provide automatic enable/disable of the BOOST feature. Signed-off-by: Lukasz Majewski <l.majewski@samsung.com> Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-17cpufreq: introduce cpufreq_generic_get() routineViresh Kumar
CPUFreq drivers that use clock frameworks interface,i.e. clk_get_rate(), to get CPUs clk rate, have similar sort of code used in most of them. This patch adds a generic ->get() which will do the same thing for them. All those drivers are required to now is to set .get to cpufreq_generic_get() and set their clk pointer in policy->clk during ->init(). Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Acked-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-17cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properlyViresh Kumar
There are several problems with cpufreq stats in the way it handles cpufreq_unregister_driver() and suspend/resume.. - We must not lose data collected so far when suspend/resume happens and so stats directories must not be removed/allocated during these operations, which is done currently. - cpufreq_stat has registered notifiers with both cpufreq and hotplug. It adds sysfs stats directory with a cpufreq notifier: CPUFREQ_NOTIFY and removes this directory with a notifier from hotplug core. In case cpufreq_unregister_driver() is called (on rmmod cpufreq driver), stats directories per cpu aren't removed as CPUs are still online. The only call cpufreq_stats gets is cpufreq_stats_update_policy_cpu() for all CPUs except the last of each policy. And pointer to stat information is stored in the entry for last CPU in the per-cpu cpufreq_stats_table. But policy structure would be freed inside cpufreq core and so that will result in memory leak inside cpufreq stats (as we are never freeing memory for stats). Now if we again insert the module cpufreq_register_driver() will be called and we will again allocate stats data and put it on for first CPU of every policy. In case we only have a single CPU per policy, we will return with a error from cpufreq_stats_create_table() due to this code: if (per_cpu(cpufreq_stats_table, cpu)) return -EBUSY; And so probably cpufreq stats directory would not show up anymore (as it was added inside last policies->kobj which doesn't exist anymore). I haven't tested it, though. Also the values in stats files wouldn't be refreshed as we are using the earlier stats structure. - CPUFREQ_NOTIFY is called from cpufreq_set_policy() which is called for scenarios where we don't really want cpufreq_stat_notifier_policy() to get called. For example whenever we are changing anything related to a policy: min/max/current freq, etc. cpufreq_set_policy() is called and so cpufreq stats is notified. Where we don't do any useful stuff other than simply returning with -EBUSY from cpufreq_stats_create_table(). And so this isn't the right notifier that cpufreq stats.. Due to all above reasons this patch does following changes: - Add new notifiers CPUFREQ_CREATE_POLICY and CPUFREQ_REMOVE_POLICY, which are only called when policy is created/destroyed. They aren't called for suspend/resume paths.. - Use these notifiers in cpufreq_stat_notifier_policy() to create/destory stats sysfs entries. And so cpufreq_unregister_driver() or suspend/resume shouldn't be a problem for cpufreq_stats. - Return early from cpufreq_stat_cpu_callback() for suspend/resume sequence, so that we don't free stats structure. Acked-by: Nicolas Pitre <nico@linaro.org> Tested-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-06cpufreq: Make sure CPU is running on a freq from freq-tableViresh Kumar
Sometimes boot loaders set CPU frequency to a value outside of frequency table present with cpufreq core. In such cases CPU might be unstable if it has to run on that frequency for long duration of time and so its better to set it to a frequency which is specified in freq-table. This also makes cpufreq stats inconsistent as cpufreq-stats would fail to register because current frequency of CPU isn't found in freq-table. Because we don't want this change to affect boot process badly, we go for the next freq which is >= policy->cur ('cur' must be set by now, otherwise we will end up setting freq to lowest of the table as 'cur' is initialized to zero). In case current frequency doesn't match any frequency from freq-table, we throw warnings to user, so that user can get this fixed in their bootloaders or freq-tables. Reported-by: Carlos Hernandez <ceh@ti.com> Reported-and-tested-by: Nishanth Menon <nm@ti.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-06cpufreq: Mark ARM drivers with CPUFREQ_NEED_INITIAL_FREQ_CHECK flagViresh Kumar
Sometimes boot loaders set CPU frequency to a value outside of frequency table present with cpufreq core. In such cases CPU might be unstable if it has to run on that frequency for long duration of time and so its better to set it to a frequency which is specified in frequency table. On some systems we can't really say what frequency we're running at the moment and so for these we shouldn't check if we are running at a frequency present in frequency table. And so we really can't force this for all the cpufreq drivers. Hence we are created another flag here: CPUFREQ_NEED_INITIAL_FREQ_CHECK that will be marked by platforms which want to go for this check at boot time. Initially this is done for all ARM platforms but others may follow if required. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-06cpufreq: Introduce cpufreq_notify_post_transition()Viresh Kumar
This introduces a new routine cpufreq_notify_post_transition() which can be used to send POSTCHANGE notification for new freq with or without both {PRE|POST}CHANGE notifications for last freq. This is useful at multiple places, especially for sending transition failure notifications. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-12-08Revert "cpufreq: suspend governors on system suspend/hibernate"Rafael J. Wysocki
Commit 5a87182aa21d (cpufreq: suspend governors on system suspend/hibernate) causes hibernation problems to happen on Bjørn Mork's and Paul Bolle's systems, so revert it. Fixes: 5a87182aa21d (cpufreq: suspend governors on system suspend/hibernate) Reported-by: Bjørn Mork <bjorn@mork.no> Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-12-06Merge branches 'pm-cpuidle' and 'pm-cpufreq'Rafael J. Wysocki
* pm-cpuidle: cpuidle: Check for dev before deregistering it. intel_idle: Fixed C6 state on Avoton/Rangeley processors * pm-cpufreq: cpufreq: fix garbage kobjects on errors during suspend/resume cpufreq: suspend governors on system suspend/hibernate
2013-11-28cpufreq: suspend governors on system suspend/hibernateViresh Kumar
This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}_noirq() for handling suspend/resume of cpufreq governors. Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found anr issue where tunables configuration for clusters/sockets with non-boot CPUs was getting lost after suspend/resume, as we were notifying governors with CPUFREQ_GOV_POLICY_EXIT on removal of the last cpu for that policy and so deallocating memory for tunables. This is fixed by this patch as we don't allow any operation on governors after device suspend and before device resume now. Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com> Reported-by: Jinhyuk Choi <jinchoi@broadcom.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog, minor cleanups] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-11-15Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm updates from Dave Airlie: "This is a combo of -next and some -fixes that came in in the intervening time. Highlights: New drivers: ARM Armada driver for Marvell Armada 510 SOCs Intel: Broadwell initial support under a default off switch, Stereo/3D HDMI mode support Valleyview improvements Displayport improvements Haswell fixes initial mipi dsi panel support CRC support for debugging build with CONFIG_FB=n Radeon: enable DPM on a number of GPUs by default secondary GPU powerdown support enable HDMI audio by default Hawaii support Nouveau: dynamic pm code infrastructure reworked, does nothing major yet GK208 modesetting support MSI fixes, on by default again PMPEG improvements pageflipping fixes GMA500: minnowboard SDVO support VMware: misc fixes MSM: prime, plane and rendernodes support Tegra: rearchitected to put the drm driver into the drm subsystem. HDMI and gr2d support for tegra 114 SoC QXL: oops fix, and multi-head fixes DRM core: sysfs lifetime fixes client capability ioctl further cleanups to device midlayer more vblank timestamp fixes" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (789 commits) drm/nouveau: do not map evicted vram buffers in nouveau_bo_vma_add drm/nvc0-/gr: shift wrapping bug in nvc0_grctx_generate_r406800 drm/nouveau/pwr: fix missing mutex unlock in a failure path drm/nv40/therm: fix slowing down fan when pstate undefined drm/nv11-: synchronise flips to vblank, unless async flip requested drm/nvc0-: remove nasty fifo swmthd hack for flip completion method drm/nv10-: we no longer need to create nvsw object on user channels drm/nouveau: always queue flips relative to kernel channel activity drm/nouveau: there is no need to reserve/fence the new fb when flipping drm/nouveau: when bailing out of a pushbuf ioctl, do not remove previous fence drm/nouveau: allow nouveau_fence_ref() to be a noop drm/nvc8/mc: msi rearm is via the nvc0 method drm/ttm: Fix vma page_prot bit manipulation drm/vmwgfx: Fix a couple of compile / sparse warnings and errors drm/vmwgfx: Resource evict fixes drm/edid: compare actual vrefresh for all modes for quirks drm: shmob_drm: Convert to clk_prepare/unprepare drm/nouveau: fix 32-bit build drm/i915/opregion: fix build error on CONFIG_ACPI=n Revert "drm/radeon/audio: don't set speaker allocation on DCE4+" ...
2013-10-31cpufreq: distinguish drivers that do asynchronous notificationsViresh Kumar
There are few special cases like exynos5440 which doesn't send POSTCHANGE notification from their ->target() routine and call some kind of bottom halves for doing this work, work/tasklet/etc.. From which they finally send POSTCHANGE notification. Its better if we distinguish them from other cpufreq drivers in some way so that core can handle them specially. So this patch introduces another flag: CPUFREQ_ASYNC_NOTIFICATION, which will be set by such drivers. This also changes exynos5440-cpufreq.c and powernow-k8 in order to set this flag. Acked-by: Amit Daniel Kachhap <amit.daniel@samsung.com> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-25cpufreq: create per policy rwsem instead of per CPU cpu_policy_rwsemviresh kumar
We have per-CPU cpu_policy_rwsem for cpufreq core, but we never use all of them. We always use rwsem of policy->cpu and so we can actually make this rwsem per policy instead. This patch does this change. With this change other tricky situations are also avoided now, like which lock to take while we are changing policy->cpu, etc. Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-25cpufreq: Implement light weight ->target_index() routineViresh Kumar
Currently, the prototype of cpufreq_drivers target routines is: int target(struct cpufreq_policy *policy, unsigned int target_freq, unsigned int relation); And most of the drivers call cpufreq_frequency_table_target() to get a valid index of their frequency table which is closest to the target_freq. And they don't use target_freq and relation after that. So, it makes sense to just do this work in cpufreq core before calling cpufreq_frequency_table_target() and simply pass index instead. But this can be done only with drivers which expose their frequency table with cpufreq core. For others we need to stick with the old prototype of target() until those drivers are converted to expose frequency tables. This patch implements the new light weight prototype for target_index() routine. It looks like this: int target_index(struct cpufreq_policy *policy, unsigned int index); CPUFreq core will call cpufreq_frequency_table_target() before calling this routine and pass index to it. Because CPUFreq core now requires to call routines present in freq_table.c CONFIG_CPU_FREQ_TABLE must be enabled all the time. This also marks target() interface as deprecated. So, that new drivers avoid using it. And Documentation is updated accordingly. It also converts existing .target() to newly defined light weight .target_index() routine for many driver. Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Russell King <linux@arm.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
2013-10-18cpufreq: Add dummy cpufreq_cpu_get/put for CONFIG_CPU_FREQ=nDaniel Vetter
The drm/i915 driver wants to adjust it's own power policies using the cpu policies as a guideline (we can implicitly boost the cpus through the gpus on some platforms). To avoid a dreaded select (since a depends will leave users wondering where where their driver has gone too) add dummy functions. Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: kbuild test robot <fengguang.wu@intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: cpufreq@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-10-16cpufreq: create cpufreq_generic_init() routineViresh Kumar
Many CPUFreq drivers for SMP system (where all cores share same clock lines), do similar stuff in their ->init() part. This patch creates a generic routine in cpufreq core which can be used by these so that we can remove some redundant code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16cpufreq: define generic .attr, .exit() and .verify() routinesViresh Kumar
Most of the CPUFreq drivers do similar things in .exit() and .verify() routines and .attr. So its better if we have generic routines for them which can be used by cpufreq drivers then. This patch introduces generic .attr, .exit() and .verify() cpufreq drivers. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16cpufreq: add new routine cpufreq_verify_within_cpu_limits()Viresh Kumar
Most of the users of cpufreq_verify_within_limits() calls it for limiting with min/max from policy->cpuinfo. We can make that code simple by introducing another routine which will do this for them automatically. This patch adds another routine cpufreq_verify_within_cpu_limits() and updates others to use it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16cpufreq: use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICYViresh Kumar
Use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY instead of a separate field within cpufreq_driver. This will save some bytes of memory. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16cpufreq: rewrite cpufreq_driver->flags using shift operatorViresh Kumar
Currently cpufreq_driver's flags are defined directly using 0x1, 0x2, 0x4, 0x8, etc.. As the list grows it becomes less readable.. Use bitwise shift operator << to generate these numbers for respective positions. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-30cpufreq: Add new helper cpufreq_table_validate_and_show()Viresh Kumar
Almost every cpufreq driver is required to validate its frequency table with: cpufreq_frequency_table_cpuinfo() and then expose it to cpufreq core with: cpufreq_frequency_table_get_attr(). This patch creates another helper routine cpufreq_table_validate_and_show() that will do both these steps in a single call and will return 0 for success, error otherwise. This also fixes potential bugs in cpufreq drivers where people have called cpufreq_frequency_table_get_attr() before calling cpufreq_frequency_table_cpuinfo(), as the later may fail. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10Revert "cpufreq: make sure frequency transitions are serialized"Rafael J. Wysocki
Commit 7c30ed5 (cpufreq: make sure frequency transitions are serialized) attempted to serialize frequency transitions by adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE notifications. However, it assumed that the notifications will always originate from the driver's .target() callback, but they also can be triggered by cpufreq_out_of_sync() and that leads to warnings like this on some systems: WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317 __cpufreq_notify_transition+0x238/0x260() In middle of another frequency transition accompanied by a call trace similar to this one: [<ffffffff81720daa>] dump_stack+0x46/0x58 [<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0 [<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320 [<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50 [<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260 [<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70 [<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0 [<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160 [<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160 [<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5 [<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf [<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0 [<ffffffff8159046a>] step_wise_throttle+0x5a/0x90 [<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140 [<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0 [<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30 [<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc [<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b [<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c [<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32 [<ffffffff81081060>] process_one_work+0x170/0x4a0 [<ffffffff81082121>] worker_thread+0x121/0x390 [<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170 [<ffffffff81088fe0>] kthread+0xc0/0xd0 [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0 [<ffffffff8173582c>] ret_from_fork+0x7c/0xb0 [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0 For this reason, revert commit 7c30ed5 along with the fix 266c13d (cpufreq: Fix serialization of frequency transitions) on top of it and we will revisit the serialization problem later. Reported-by: Alessandro Bono <alessandro.bono@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writesSrivatsa S. Bhat
Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary and partial solution to the race condition between writing to a cpufreq sysfs file and taking a CPU offline. Now that we have a proper and complete solution to that problem, remove the temporary fix. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10cpufreq: serialize calls to __cpufreq_governor()Viresh Kumar
We can't take a big lock around __cpufreq_governor() as this causes recursive locking for some cases. But calls to this routine must be serialized for every policy. Otherwise we can see some unpredictable events. For example, consider following scenario: __cpufreq_remove_dev() __cpufreq_governor(policy, CPUFREQ_GOV_STOP); policy->governor->governor(policy, CPUFREQ_GOV_STOP); cpufreq_governor_dbs() case CPUFREQ_GOV_STOP: mutex_destroy(&cpu_cdbs->timer_mutex) cpu_cdbs->cur_policy = NULL; <PREEMPT> store() __cpufreq_set_policy() __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS); policy->governor->governor(policy, CPUFREQ_GOV_LIMITS); case CPUFREQ_GOV_LIMITS: mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex) if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL And so store() will eventually result in a crash if cur_policy is NULL at this point. Introduce an additional variable which would guarantee serialization here. Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10cpufreq: Drop the owner field from struct cpufreq_driverViresh Kumar
We don't need to set .owner = THIS_MODULE any more in cpufreq drivers as this field isn't used any more by the cpufreq core. This patch removes it and updates all dependent drivers accordingly. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10cpufreq: Store cpufreq policies in a listLukasz Majewski
Policies available in the cpufreq framework are now linked together. They are accessible via cpufreq_policy_list defined in the cpufreq core. [rjw: Fix from Yinghai Lu folded in] Signed-off-by: Lukasz Majewski <l.majewski@samsung.com> Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07cpufreq: Give consistent names to cpufreq_policy objectsViresh Kumar
They are called policy, cur_policy, new_policy, data, etc. Just call them policy wherever possible. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07cpufreq: Re-arrange declarations in cpufreq.hViresh Kumar
They are pretty much mixed up. Although generic headers are present, definitions/declarations are present outside of them too ... This patch just moves stuff up and down to make it look better and consistent. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07cpufreq: Clean up header files included in the coreViresh Kumar
This patch addresses the following issues in the header files in the cpufreq core: - Include headers in ascending order, so that we don't add same many times by mistake. - <asm/> must be included after <linux/>, so that they override whatever they need to. - Remove unnecessary includes. - Don't include files already included by cpufreq.h or cpufreq_governor.h. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-26cpufreq: Remove unused function __cpufreq_driver_getavg()Stratos Karafotis
The target frequency calculation method in the ondemand governor has changed and it is now independent of the measured average frequency. Consequently, the __cpufreq_driver_getavg() function and getavg member of struct cpufreq_driver are not used any more, so drop them. [rjw: Changelog] Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-04cpufreq: Fix serialization of frequency transitionsViresh Kumar
Commit 7c30ed ("cpufreq: make sure frequency transitions are serialized") interacts poorly with systems that have a single core freqency for all cores. On such systems we have a single policy for all cores with several CPUs. When we do a frequency transition the governor calls the pre and post change notifiers which causes cpufreq_notify_transition() per CPU. Since the policy is the same for all of them all CPUs after the first and the warnings added are generated by checking a per-policy flag the warnings will be triggered for all cores after the first. Fix this by allowing notifier to be called for n times. Where n is the number of cpus in policy->cpus. Reported-and-tested-by: Mark Brown <broonie@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-27acpi-cpufreq: Add new sysfs attribute freqdomain_cpusLan Tianyu
Commits fcf8058 (cpufreq: Simplify cpufreq_add_dev()) and aa77a52 (cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init()) changed the contents of the "related_cpus" sysfs attribute on systems where acpi-cpufreq is used and user space can't get the list of CPUs which are in the same hardware coordination CPU domain (provided by the ACPI AML method _PSD) via "related_cpus" any more. To make up for that loss add a new sysfs attribute "freqdomian_cpus" for the acpi-cpufreq driver which exposes the list of CPUs in the same domain regardless of whether it is coordinated by hardware or software. [rjw: Changelog, documentation] References: https://bugzilla.kernel.org/show_bug.cgi?id=58761 Reported-by: Jean-Philippe Halimi <jean-philippe.halimi@exascale-computing.eu> Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-27cpufreq: make sure frequency transitions are serializedViresh Kumar
Whenever we are changing frequency of a cpu, we are calling PRECHANGE and POSTCHANGE notifiers. They must be serialized. i.e. PRECHANGE or POSTCHANGE shouldn't be called twice contiguously. This can happen due to bugs in users of __cpufreq_driver_target() or actual cpufreq drivers who are sending these notifiers. This patch adds some protection against this. Now, we keep track of the last transaction and see if something went wrong. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-21cpufreq: Fix minor formatting issuesViresh Kumar
There were a few noticeable formatting issues in core cpufreq code. This cleans them up to make code look better. The changes include: - Whitespace cleanup. - Rearrangements of code. - Multiline comments fixes. - Formatting changes to fit 80 columns. Copyright information in cpufreq.c is also updated to include my name for 2013. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-21cpufreq: Fix governor start/stop race conditionXiaoguang Chen
Cpufreq governors' stop and start operations should be carried out in sequence. Otherwise, there will be unexpected behavior, like in the example below. Suppose there are 4 CPUs and policy->cpu=CPU0, CPU1/2/3 are linked to CPU0. The normal sequence is: 1) Current governor is userspace. An application tries to set the governor to ondemand. It will call __cpufreq_set_policy() in which it will stop the userspace governor and then start the ondemand governor. 2) Current governor is userspace. The online of CPU3 runs on CPU0. It will call cpufreq_add_policy_cpu() in which it will first stop the userspace governor, and then start it again. If the sequence of the above two cases interleaves, it becomes: 1) Application stops userspace governor 2) Hotplug stops userspace governor which is a problem, because the governor shouldn't be stopped twice in a row. What happens next is: 3) Application starts ondemand governor 4) Hotplug starts a governor In step 4, the hotplug is supposed to start the userspace governor, but now the governor has been changed by the application to ondemand, so the ondemand governor is started once again, which is incorrect. The solution is to prevent policy governors from being stopped multiple times in a row. A governor should only be stopped once for one policy. After it has been stopped, no more governor stop operations should be executed. Also add a mutex to serialize governor operations. [rjw: Changelog. And you owe me a beverage of my choice.] Signed-off-by: Xiaoguang Chen <chenxg@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-04cpufreq: rename index as driver_data in cpufreq_frequency_tableViresh Kumar
The "index" field of struct cpufreq_frequency_table was never an index and isn't used at all by the cpufreq core. It only is useful for cpufreq drivers for their internal purposes. Many people nowadays blindly set it in ascending order with the assumption that the core will use it, which is a mistake. Rename it to "driver_data" as that's what its purpose is. All of its users are updated accordingly. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-27cpufreq: Don't create empty /sys/devices/system/cpu/cpufreq directoryViresh Kumar
When we don't have any file in cpu/cpufreq directory we shouldn't create it. Specially with the introduction of per-policy governor instance patchset, even governors are moved to cpu/cpu*/cpufreq/governor-name directory and so this directory is just not required. Lets have it only when required. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-27cpufreq: Move get_cpu_idle_time() to cpufreq.cViresh Kumar
Governors other than ondemand and conservative can also use get_cpu_idle_time() and they aren't required to compile cpufreq_governor.c. So, move these independent routines to cpufreq.c instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-27cpufreq: governors: Move get_governor_parent_kobj() to cpufreq.cViresh Kumar
get_governor_parent_kobj() can be used by any governor, generic cpufreq governors or platform specific ones and so must be present in cpufreq.c instead of cpufreq_governor.c. This patch moves it to cpufreq.c. This also adds EXPORT_SYMBOL_GPL(get_governor_parent_kobj) so that modules can use this function too. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-02cpufreq: Notify all policy->cpus in cpufreq_notify_transition()Viresh Kumar
policy->cpus contains all online cpus that have single shared clock line. And their frequencies are always updated together. Many SMP system's cpufreq drivers take care of this in individual drivers but the best place for this code is in cpufreq core. This patch modifies cpufreq_notify_transition() to notify frequency change for all cpus in policy->cpus and hence updates all users of this API. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Stephen Warren <swarren@nvidia.com> Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01cpufreq: governor: Implement per policy instances of governorsViresh Kumar
Currently, there can't be multiple instances of single governor_type. If we have a multi-package system, where we have multiple instances of struct policy (per package), we can't have multiple instances of same governor. i.e. We can't have multiple instances of ondemand governor for multiple packages. Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/ governor-name/. Which again reflects that there can be only one instance of a governor_type in the system. This is a bottleneck for multicluster system, where we want different packages to use same governor type, but with different tunables. This patch uses the infrastructure provided by earlier patch and implements init/exit routines for ondemand and conservative governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01cpufreq: Add per policy governor-init/exit infrastructureViresh Kumar
Currently, there can't be multiple instances of single governor_type. If we have a multi-package system, where we have multiple instances of struct policy (per package), we can't have multiple instances of same governor. i.e. We can't have multiple instances of ondemand governor for multiple packages. Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/ governor-name/. Which again reflects that there can be only one instance of a governor_type in the system. This is a bottleneck for multicluster system, where we want different packages to use same governor type, but with different tunables. This patch is inclined towards providing this infrastructure. Because we are required to allocate governor's resources dynamically now, we must do it at policy creation and end. And so got CPUFREQ_GOV_POLICY_INIT/EXIT. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>