Age | Commit message | Author
2013-09-10 | cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes | Srivatsa S. Bhat
Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary and partial solution to the race condition between writing to a cpufreq sysfs file and taking a CPU offline. Now that we have a proper and complete solution to that problem, remove the temporary fix. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 | cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug | Srivatsa S. Bhat
The functions that are used to write to cpufreq sysfs files (such as store_scaling_max_freq()) are not hotplug safe. They can race with CPU hotplug tasks and lead to problems such as trying to acquire an already destroyed timer-mutex etc.

Eg:

    __cpufreq_remove_dev()
     __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
      policy->governor->governor(policy, CPUFREQ_GOV_STOP);
       cpufreq_governor_dbs()
        case CPUFREQ_GOV_STOP:
         mutex_destroy(&cpu_cdbs->timer_mutex)
         cpu_cdbs->cur_policy = NULL;
      <PREEMPT>
    store()
     __cpufreq_set_policy()
      __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
       policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
        case CPUFREQ_GOV_LIMITS:
         mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
          if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

So use get_online_cpus()/put_online_cpus() in the store_*() functions, to synchronize with CPU hotplug.

However, there is an additional point to note here: some parts of the CPU teardown in the cpufreq subsystem are done in the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the get/put_online_cpus() functions alone is insufficient; we should also ensure that we don't race with those latter steps in the hotplug sequence. We can easily achieve this by checking if the CPU is online before proceeding with the store, since the CPU would have been marked offline by the time the CPU_POST_DEAD notifiers are executed.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
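The ordering described in this commit message (enter the hotplug read side, then refuse the store if the CPU is already offline) can be illustrated with a small userspace model. This is only a single-threaded sketch and not the kernel code: every model_* name, the reader counter standing in for get_online_cpus()/put_online_cpus(), and the choice of -EINVAL as the error are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-ins: a reader count models get/put_online_cpus(). */
static int model_hotplug_readers;
static bool model_cpu_online[MODEL_NR_CPUS] = { true, true, true, true };
static unsigned int model_max_freq_khz[MODEL_NR_CPUS];

/* Models the hotplug path marking a CPU offline. The real writer would
 * wait for readers to leave; this model just asserts that none exist. */
void model_mark_offline(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    assert(model_hotplug_readers == 0);
    model_cpu_online[cpu] = false;
}

/* Models a store_*() routine: block hotplug, then check the CPU is online. */
int model_store_max_freq(unsigned int cpu, unsigned int khz)
{
    int ret = -EINVAL;              /* assumed error code for an offline CPU */

    assert(cpu < MODEL_NR_CPUS);
    model_hotplug_readers++;        /* get_online_cpus() analogue */
    if (model_cpu_online[cpu]) {    /* rejects stores racing with CPU_POST_DEAD */
        model_max_freq_khz[cpu] = khz;
        ret = 0;
    }
    model_hotplug_readers--;        /* put_online_cpus() analogue */
    return ret;
}
```

In this model a store to an online CPU succeeds, while a store issued after model_mark_offline() fails without touching the saved value, which mirrors the online check the message describes.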
2013-09-10 | cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock | Srivatsa S. Bhat
__cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going offline. But because we destroy the kobject towards the end of the CPU offline phase, there are certain race windows where a task can try to write to a cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking that CPU offline, and this can bump up the kobject refcount, which in turn might hinder the CPU offline task from running to completion. (It can also cause other more serious problems such as trying to acquire a destroyed timer-mutex etc., depending on the exact stage of the cleanup at which the task managed to take a new refcount).

To fix the race window, we will need to synchronize those store_*() call-sites with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that in turn can cause a total deadlock because it can end up waiting for the CPU offline task to complete, with incremented refcount!

    Write to sysfs                CPU offline task
    --------------                ----------------
    kobj_refcnt++
                                  Acquire cpu_hotplug.lock
    get_online_cpus();
                                  Wait for kobj_refcnt to drop to zero

                      **DEADLOCK**

A simple way to avoid this problem is to perform the kobject cleanup in the CPU offline path, with the cpu_hotplug.lock *released*. That is, we can perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock released. Doing this helps us avoid deadlocks due to holding kobject refcounts and waiting on each other on the cpu_hotplug.lock.

(Note: We can't move all of the cpufreq CPU offline steps to the CPU_POST_DEAD stage, because certain things such as stopping the governors have to be done before the outgoing CPU is marked offline. So retain those parts in the CPU_DOWN_PREPARE stage itself).

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 | cpufreq: Split __cpufreq_remove_dev() into two parts | Srivatsa S. Bhat
During CPU offline, the cpufreq core invokes __cpufreq_remove_dev() to perform work such as stopping the cpufreq governor, clearing the CPU from the policy structure etc, and finally cleaning up the kobject. There are certain subtle issues related to the kobject cleanup, and it would be much easier to deal with them if we separate that part from the rest of the cleanup-work in the CPU offline phase. So split the __cpufreq_remove_dev() function into 2 parts: one that handles the kobject cleanup, and the other that handles the rest of the work. Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 | cpufreq: Fix wrong time unit conversion | Andreas Schwab
The time spent by a CPU under a given frequency is stored in jiffies unit in the cpu var cpufreq_stats_table->time_in_state[i], i being the index of the frequency. This is what is displayed in the following file on the right column:

    cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
    2301000 19835820
    2300000 3172
    [...]

Now cpufreq converts this jiffies unit delta to clock_t before returning it to the user as in the above file. And that conversion is achieved using the API cputime64_to_clock_t().

Although it accidentally works on traditional tick based cputime accounting, where cputime_t maps directly to jiffies, it doesn't work with other types of cputime accounting such as CONFIG_VIRT_CPU_ACCOUNTING_* where cputime_t can map to nsecs or any granularity preferred by the architecture.

For example we get a buggy zero delta on full dyntick configurations:

    cat /sys/devices/system/cpu/cpuX/cpufreq/stats/time_in_state
    2301000 0
    2300000 0
    [...]

Fix this by using the proper jiffies_64_to_clock_t() conversion.

Reported-and-tested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
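As a worked illustration of the unit conversion this fix relies on, the following userspace sketch rescales a tick count from the kernel tick rate to the user-visible clock_t rate. It is not the kernel's jiffies_64_to_clock_t(); the function name and the hz and user_hz parameters are hypothetical, and the rates used in the usage note (1000 and 100 ticks per second) are assumed values, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Rescale a count of ticks taken at `hz` ticks per second into units of
 * `user_hz` ticks per second. Hypothetical helper, for illustration only. */
uint64_t model_ticks_to_clock_t(uint64_t ticks, uint32_t hz, uint32_t user_hz)
{
    assert(hz > 0 && user_hz > 0);

    if (hz % user_hz == 0)
        return ticks / (hz / user_hz);  /* common case: exact integer ratio */
    if (user_hz % hz == 0)
        return ticks * (user_hz / hz);  /* tick rate slower than user rate */

    /* General case; the product may overflow for very large tick counts. */
    return ticks * user_hz / hz;
}
```

Under the assumed rates, a time_in_state value of 19835820 jiffies would be reported as 1983582 clock_t units. The point of the fix is that the input really is a jiffies count, so it must go through a jiffies-based conversion and not one that interprets it as cputime.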
2013-09-10 | cpufreq: serialize calls to __cpufreq_governor() | Viresh Kumar
We can't take a big lock around __cpufreq_governor() as this causes recursive locking for some cases. But calls to this routine must be serialized for every policy. Otherwise we can see some unpredictable events.

For example, consider following scenario:

    __cpufreq_remove_dev()
     __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
      policy->governor->governor(policy, CPUFREQ_GOV_STOP);
       cpufreq_governor_dbs()
        case CPUFREQ_GOV_STOP:
         mutex_destroy(&cpu_cdbs->timer_mutex)
         cpu_cdbs->cur_policy = NULL;
      <PREEMPT>
    store()
     __cpufreq_set_policy()
      __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
       policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
        case CPUFREQ_GOV_LIMITS:
         mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
          if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

And so store() will eventually result in a crash if cur_policy is NULL at this point.

Introduce an additional variable which would guarantee serialization here.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 | cpufreq: don't allow governor limits to be changed when it is disabled | Viresh Kumar
__cpufreq_governor() returns with -EBUSY when governor is already stopped and we try to stop it again, but when it is stopped we must not allow calls to CPUFREQ_GOV_LIMITS event as well. This patch adds this check in __cpufreq_governor(). Reported-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
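The check described here can be pictured as a small state guard. The sketch below is a userspace model under stated assumptions, not the body of __cpufreq_governor(): the model_* types, the governor_enabled flag, and the event names are hypothetical, and only the two rejections named in the message (STOP and LIMITS on a stopped governor) are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event set mirroring the START/STOP/LIMITS events. */
enum model_gov_event { MODEL_GOV_START, MODEL_GOV_STOP, MODEL_GOV_LIMITS };

struct model_policy {
    bool governor_enabled;  /* assumed flag tracking whether the governor runs */
};

/* Reject STOP and LIMITS while the governor is stopped; otherwise apply. */
int model_governor_event(struct model_policy *policy, enum model_gov_event ev)
{
    assert(policy != NULL);

    if (!policy->governor_enabled &&
        (ev == MODEL_GOV_STOP || ev == MODEL_GOV_LIMITS))
        return -EBUSY;

    if (ev == MODEL_GOV_START)
        policy->governor_enabled = true;
    else if (ev == MODEL_GOV_STOP)
        policy->governor_enabled = false;

    return 0;
}
```

With this guard, a limits update that arrives after the governor has been stopped is turned away before it can touch state the stop path has already torn down.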
2013-09-09 | ARM: vexpress: allow dcscb and tc2_pm in a combined ARMv6+v7 build | Nicolas Pitre
This fixes the following build error:

    /tmp/cce439dZ.s: Assembler messages:
    /tmp/cce439dZ.s:506: Error: selected processor does not support ARM mode `isb '
    /tmp/cce439dZ.s:512: Error: selected processor does not support ARM mode `isb '
    /tmp/cce439dZ.s:513: Error: selected processor does not support ARM mode `dsb '
    /tmp/cce439dZ.s:583: Error: selected processor does not support ARM mode `isb '
    /tmp/cce439dZ.s:589: Error: selected processor does not support ARM mode `isb '
    /tmp/cce439dZ.s:590: Error: selected processor does not support ARM mode `dsb '

Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
2013-09-09 | Merge branch 'versatile/fixes' into fixes | Olof Johansson
From Peter Maydell: These patches fix a number of issues with the PCI controller code for mach-versatile: (1) The irq mapping matched neither hardware nor QEMU; we correct it to match the hardware, which means it will also work on recent (1.5 or later) QEMU. (2) The code was confused between the PCI I/O window (at 0x43000000) and the first PCI memory window (at 0x44000000), which meant that PCI devices using PCI PIO rather than MMIO didn't work. This is fixed (and some variables/labels are renamed to avoid further confusion in future). (3) The SMAP register offsets were all off-by-four, though by fluke this didn't actually have any ill effects. All these changes have been tested on real hardware (PB926 plus the PCI backplane), as well as on QEMU. I have confirmed that IRQs and PCI PIO and MMIO work OK. PCI bus-master DMA doesn't seem to work on h/w -- as far as I can tell the device is correctly managing to DMA to the right places in memory, but every other 32 bit word is corrupt (at least judging from rtl8139 debug dumps of the frames it's receiving). I'm not sure what's going on here, but since this is disjoint from the irq and I/O issues I don't think that applying the patches that fix those should be stalled on trying to debug DMA problems. (DMA works fine on QEMU, incidentally.) * versatile/fixes: ARM: PCI: versatile: Fix SMAP register offsets ARM: PCI: versatile: Fix PCI I/O ARM: PCI: versatile: Fix map_irq function to match hardware Signed-off-by: Olof Johansson <olof@lixom.net>
2013-09-09 | ARM: shmobile: lager: Do not use register_type field of struct sh_eth_plat_data | Simon Horman
As of 8d3214c ("sh_eth: remove 'register_type' field from 'struct sh_eth_plat_data'") it is no longer necessary or correct to use the 'register_type' field from 'struct sh_eth_plat_data' and doing so results in a build error. Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Olof Johansson <olof@lixom.net>
2013-09-09 | Merge tag 'renesas-fixes3-for-v3.12' of ↵ | Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes From Simon Horman: Third Round of Renesas ARM based SoC fixes for v3.12 * Update early timer initialisation order of r8a7779 SoC This resolves a regression introduced by a894fcc2d01a89e6fe3da0845a4d80a5312e1124 ("ARM: smp_twd: Divorce smp_twd from local timer API"). This problem was introduced in v3.10-rc2. * tag 'renesas-fixes3-for-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: ARM: shmobile: r8a7779: Update early timer initialisation order Signed-off-by: Olof Johansson <olof@lixom.net>
2013-09-09 | ARM: pxa: ssp: Check return values from phandle lookups | Olof Johansson
Commit a6e56c28a178cef5f (ARM: pxa: ssp: add DT bindings) causes warnings when built: arch/arm/plat-pxa/ssp.c: In function 'pxa_ssp_probe': arch/arm/plat-pxa/ssp.c:145:17: warning: 'dma_spec.args[0]' may be used uninitialized in this function [-Wmaybe-uninitialized] Resolve by checking return values and aborting when lookups fail. Cc: Daniel Mack <zonque@gmail.com> Cc: Mark Brown <broonie@linaro.org> Cc: Haojian Zhuang <haojian.zhuang@gmail.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2013-09-09 | dmaengine: dma_sync_wait and dma_find_channel undefined | Jon Mason
dma_sync_wait and dma_find_channel are declared regardless of whether CONFIG_DMA_ENGINE is enabled, but calling the functions without CONFIG_DMA_ENGINE enabled results in "undefined reference" errors. To get around this, declare dma_sync_wait and dma_find_channel as inline functions if CONFIG_DMA_ENGINE is undefined. Signed-off-by: Jon Mason <jon.mason@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
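The general pattern behind this fix (real prototypes when a feature is configured, static inline stubs when it is not) can be sketched as below. This is not the dmaengine header: the MODEL_CONFIG_DMA_ENGINE macro, the model_* names, the simplified signatures, and the stub return values are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct model_dma_chan;                 /* opaque; only pointers are used */

#ifdef MODEL_CONFIG_DMA_ENGINE
/* Feature enabled: real definitions live in the engine's own source file. */
struct model_dma_chan *model_dma_find_channel(int tx_type);
int model_dma_sync_wait(struct model_dma_chan *chan, int cookie);
#else
/* Feature disabled: inline stubs keep every caller compiling and linking. */
static inline struct model_dma_chan *model_dma_find_channel(int tx_type)
{
    (void)tx_type;
    return NULL;                       /* no channel is ever available */
}

static inline int model_dma_sync_wait(struct model_dma_chan *chan, int cookie)
{
    (void)chan;
    (void)cookie;
    return 0;                          /* nothing to wait for */
}
#endif

/* Returns 1 if a DMA channel was used, 0 if the CPU did the copy. */
int model_copy(void *dst, const void *src, size_t len)
{
    struct model_dma_chan *chan;

    assert(dst != NULL && src != NULL);
    chan = model_dma_find_channel(0);
    if (chan == NULL) {
        memcpy(dst, src, len);         /* fallback path taken with the stubs */
        return 0;
    }
    /* A real caller would submit a transfer here before waiting on it. */
    return model_dma_sync_wait(chan, 0) == 0;
}
```

The benefit of the stub approach is that callers such as model_copy() need no #ifdef of their own: with the feature compiled out they simply see "no channel" and take the fallback.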
2013-09-09 | Merge tag 'late-for-linus' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC late changes from Kevin Hilman: "These are changes that arrived a little late before the merge window, or had dependencies on previous branches. Highlights: - ux500: misc. cleanup, fixup I2C devices - exynos: DT updates for RTC; PM updates - at91: DT updates for NAND; new platforms added to generic defconfig - sunxi: DT updates: cubieboard2, pinctrl driver, gated clocks - highbank: LPAE fixes, select necessary ARM errata - omap: PM fixes and improvements; OMAP5 mailbox support - omap: basic support for new DRA7xx SoCs" * tag 'late-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (60 commits) ARM: dts: vexpress: Add CCI node to TC2 device-tree ARM: EXYNOS: Skip C1 cpuidle state for exynos5440 ARM: EXYNOS: always enable PM domains support for EXYNOS4X12 ARM: highbank: clean-up some unused includes ARM: sun7i: Enable the A20 clocks in the DTSI ARM: sun6i: Enable clock support in the DTSI ARM: sun5i: dt: Use the A10s gates in the DTSI ARM: at91: at91_dt_defconfig: enable rm9200 support ARM: dts: add ADC device tree node for exynos5420/5250 ARM: dts: Add RTC DT node to Exynos5420 SoC ARM: dts: Update the "status" property of RTC DT node for Exynos5250 SoC ARM: dts: Fix the RTC DT node name for Exynos5250 irqchip: mmp: avoid to include irqs head file ARM: mmp: avoid to include head file in mach-mmp irqchip: mmp: support irqchip irqchip: move mmp irq driver ARM: OMAP: AM33xx: clock: Add RNG clock data ARM: OMAP: TI81XX: add always-on powerdomain for TI81XX ARM: OMAP4: clock: Lock PLLs in the right sequence ARM: OMAP: AM33XX: hwmod: Add hwmod data for debugSS ...
2013-09-09 | Merge tag 'renesas-for-linus' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM Renesas SoC cleanup, refactoring and more SMP support from Kevin Hilman: "Lots of cleanup and refactoring and some SMP additions for Renesas platforms. Due to some inter-dependencies with other arm-soc branches, this Renesas stuff was separated out for sending after the other branches were merged. Highlights: - remove unused board support and cleanup of unused headers - refactoring of init and device registration - simplify IRQ initialization" * tag 'renesas-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (68 commits) ARM: shmobile: Per-CPU SMP boot / sleep code for SCU SoCs ARM: shmobile: Introduce per-CPU SMP boot / sleep code ARM: shmobile: Use shared SCU CPU Hotplug code on r8a7779 ARM: shmobile: Use shared SCU CPU Hotplug code on sh73a0 ARM: shmobile: Add shared SCU CPU Hotplug code ARM: shmobile: Use shared SCU SMP boot code on emev2 ARM: shmobile: Use shared SCU SMP boot code on r8a7779 ARM: shmobile: Use shared SCU SMP boot code on sh73a0 ARM: shmobile: Introduce shared SCU SMP boot code ARM: shmobile: sh73a0: Remove global GPIO_NR definition ARM: shmobile: kzm9d: remove nfsroot settings from bootargs ARM: shmobile: armadillo800eva: remove nfsroot settings from bootargs ARM: shmobile: r8a7779: move r8a7779_init_irq_xxx() to setup ARM: shmobile: r8a7740: move r8a7740_init_irq_of() to setup ARM: shmobile: bockw: add missing __initdata ARM: shmobile: r8a7790: add missing __initdata ARM: shmobile: r8a7779: add missing __initdata ARM: shmobile: Remove unused shmobile_init_time() ARM: shmobile: Use clocksource_of_init() on r8a7790 ARM: shmobile: Use default ->init_time() on KZM9G DT ref ...
2013-09-09 | Merge tag 'drivers-for-linus' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver update from Kevin Hilman: "This contains the ARM SoC related driver updates for v3.12. The only thing this cycle are core PM updates and CPUidle support for ARM's TC2 big.LITTLE development platform" * tag 'drivers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: cpuidle: big.LITTLE: vexpress-TC2 CPU idle driver ARM: vexpress: tc2: disable GIC CPU IF in tc2_pm_suspend drivers: irq-chip: irq-gic: introduce gic_cpu_if_down()
2013-09-09 | Merge tag 'clk-for-linus-3.12' of git://git.linaro.org/people/mturquette/linux | Linus Torvalds
Pull clock framework changes from Michael Turquette: "The common clk framework changes for 3.12 are dominated by clock driver patches, both new drivers and fixes to existing. A high percentage of these are for Samsung platforms like Exynos. Core framework fixes and some new features like automagical clock re-parenting round out the patches" * tag 'clk-for-linus-3.12' of git://git.linaro.org/people/mturquette/linux: (102 commits) clk: only call get_parent if there is one clk: samsung: exynos5250: Simplify registration of PLL rate tables clk: samsung: exynos4: Register PLL rate tables for Exynos4x12 clk: samsung: exynos4: Register PLL rate tables for Exynos4210 clk: samsung: exynos4: Reorder registration of mout_vpllsrc clk: samsung: pll: Add support for rate configuration of PLL46xx clk: samsung: pll: Use new registration method for PLL46xx clk: samsung: pll: Add support for rate configuration of PLL45xx clk: samsung: pll: Use new registration method for PLL45xx clk: samsung: exynos4: Rename exynos4_plls to exynos4x12_plls clk: samsung: exynos4: Remove checks for DT node clk: samsung: exynos4: Remove unused static clkdev aliases clk: samsung: Modify _get_rate() helper to use __clk_lookup() clk: samsung: exynos4: Use separate aliases for cpufreq related clocks clocksource: samsung_pwm_timer: Get clock from device tree ARM: dts: exynos4: Specify PWM clocks in PWM node pwm: samsung: Update DT bindings documentation to cover clocks clk: Move symbol export to proper location clk: fix new_parent dereference before null check clk: wm831x: Initialise wm831x pointer on init ...
2013-09-09 | xfs: check magic numbers in dir3 leaf verifier first | Dave Chinner
Calling xfs_dir3_leaf_hdr_from_disk() in a verifier before validating the magic numbers in the buffer results in ASSERT failures due to mismatching magic numbers when a corruption occurs. Seeing as the verifier is supposed to catch the corruption and pass it back to the caller, having the verifier assert fail on error defeats the purpose of detecting the errors in the first place. Check the magic numbers direct from the buffer before decoding the header. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
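The ordering problem is easier to see in a reduced model: a decoder that asserts on a bad magic number must not run until the verifier has checked the raw bytes itself. The sketch below is not XFS code; the header layout, the magic value 0xabcd, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_LEAF_MAGIC 0xabcdu       /* hypothetical magic value */

struct model_leaf_hdr {
    uint16_t magic;
    uint16_t count;
};

/* Read a big-endian 16-bit field straight from the on-disk buffer. */
static uint16_t model_be16_at(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decoder that, like the in-core conversion, asserts the magic is valid. */
static void model_hdr_from_disk(struct model_leaf_hdr *to, const uint8_t *buf)
{
    to->magic = model_be16_at(buf);
    assert(to->magic == MODEL_LEAF_MAGIC);
    to->count = model_be16_at(buf + 2);
}

/* Verifier: test the raw magic first so corruption is reported, not asserted. */
bool model_leaf_verify(const uint8_t *buf, size_t len, uint16_t max_count)
{
    struct model_leaf_hdr hdr;

    if (len < 4)
        return false;
    if (model_be16_at(buf) != MODEL_LEAF_MAGIC)
        return false;                  /* corrupt buffer: return an error */

    model_hdr_from_disk(&hdr, buf);    /* safe: magic already validated */
    return hdr.count <= max_count;
}
```

If the raw check were dropped, a corrupted buffer would trip the assertion inside the decoder, which is the failure mode the commit message describes.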
2013-09-09 | xfs: fix some minor sparse warnings | Dave Chinner
A couple of simple locking annotations and 0 vs NULL warnings. Nothing that changes any code behaviour, just removes build noise. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-09 | xfs: fix endian warning in xlog_recover_get_buf_lsn() | Dave Chinner
sparse reports: fs/xfs/xfs_log_recover.c:2017:24: sparse: cast to restricted __be64 Because I used the wrong structure for the on-disk superblock cast in 50d5c8d ("xfs: check LSN ordering for v5 superblocks during recovery"). Fix it. Reported-by: kbuild test robot Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-09-09 | Merge tag 'trace-3.12' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Not much changes for the 3.12 merge window. The major tracing changes are still in flux, and will have to wait for 3.13. The changes for 3.12 are mostly clean ups and minor fixes. H Peter Anvin added a check to x86_32 static function tracing that helps a small segment of the kernel community. Oleg Nesterov had a few changes from 3.11, but were mostly clean ups and not worth pushing in the -rc time frame. Li Zefan had small clean up with annotating a raw_init with __init. I fixed a slight race in updating function callbacks, but the race is so small and the bug that happens when it occurs is so minor it's not even worth pushing to stable. The only real enhancement is from Alexander Z Lam that made the tracing_cpumask work for trace buffer instances, instead of them all sharing a global cpumask" * tag 'trace-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/rcu: Do not trace debug_lockdep_rcu_enabled() x86-32, ftrace: Fix static ftrace when early microcode is enabled ftrace: Fix a slight race in modifying what function callback gets traced tracing: Make tracing_cpumask available for all instances tracing: Kill the !CONFIG_MODULES code in trace_events.c tracing: Don't pass file_operations array to event_create_dir() tracing: Kill trace_create_file_ops() and friends tracing/syscalls: Annotate raw_init function with __init
2013-09-09 | target: Add MAXIMUM COMPARE AND WRITE LENGTH in Block Limits VPD | Nicholas Bellinger
This patch adds the MAXIMUM COMPARE AND WRITE LENGTH bit, currently hardcoded to a single logical block (NoLB=1) within the Block Limits VPD in spc_emulate_evpd_b0(). Also add emulate_caw device attribute in configfs (enabled by default) to allow the exposure of this bit to be disabled, if necessary. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin Petersen <martin.petersen@oracle.com> Cc: Chris Mason <chris.mason@fusionio.com> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 | target: Make __target_execute_cmd() available as extern | Nicholas Bellinger
Required by COMPARE_AND_WRITE for write instance user-data submission, in order to bypass target_execute_cmd() checks. Reported-by: Christoph Hellwig <hch@lst.de> Cc: Roland Dreier <roland@purestorage.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 | target: Add transport_reset_sgl_orig() for COMPARE_AND_WRITE | Nicholas Bellinger
After COMPARE_AND_WRITE completes its comparison, the head of the WRITE payload SGLs is expected to be updated to point from the verify instance of user data to the write instance of user data. So for this special case, add transport_reset_sgl_orig() usage within transport_free_pages() and add se_cmd->t_data_[sg,nents]_orig members to save the original assignments. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin Petersen <martin.petersen@oracle.com> Cc: Chris Mason <chris.mason@fusionio.com> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 | target: Do memory allocation for bidi commands using target_alloc_sgl | Nicholas Bellinger
This patch updates transport_generic_new_cmd() to call target_alloc_sgl() for SGL + page memory allocation for se_cmd->t_bidi_data_sg. It also adds the special case for SCF_COMPARE_AND_WRITE to calculate a different bidi_length based upon se_cmd->t_task_nolb. Reported-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin Petersen <martin.petersen@oracle.com> Cc: Chris Mason <chris.mason@fusionio.com> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 | target: Refactor transport_generic_get_mem to target_alloc_sgl | Nicholas Bellinger
This patch refactors transport_generic_get_mem() to target_alloc_sgl() for accepting **sgl, *nents, length and zero_page as function parameters in order to be used for both se_cmd->t_data_sg + se_cmd->t_bidi_data_sg allocations. Reported-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin Petersen <martin.petersen@oracle.com> Cc: Chris Mason <chris.mason@fusionio.com> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 | target: Convert se_cmd->t_bidi_data_sg checks to use SCF_BIDI | Nicholas Bellinger
Stop keying off se_cmd->t_bidi_data_sg within transport_complete_qf() + target_complete_ok_work(), and just use SCF_BIDI instead. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin Petersen <martin.petersen@oracle.com> Cc: Chris Mason <chris.mason@fusionio.com> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 | target: Allow sbc_ops->execute_rw() to accept SGLs + data_direction | Nicholas Bellinger
COMPARE_AND_WRITE expects to be able to send down a DMA_FROM_DEVICE to obtain the necessary READ payload for comparison against the first half of the WRITE payload containing the verify user data. Currently virtual backends expect to internally reference SGLs, SGL nents, and data_direction, so change IBLOCK, FILEIO and RD sbc_ops->execute_rw() to accept these values as function parameters. Also add default sbc_execute_rw() handler for the typical case for cmd->execute_rw() submission using cmd->t_data_sg, cmd->t_data_nents, and cmd->data_direction. v2 Changes: - Add SCF_COMPARE_AND_WRITE command flag - Use sbc_execute_rw() for normal cmd->execute_rw() submission with expected se_cmd members. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin Petersen <martin.petersen@oracle.com> Cc: Chris Mason <chris.mason@fusionio.com> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
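The data flow described here (read the current blocks, compare them against the first half of the payload, and write the second half only on a match) can be modelled on plain byte buffers. This is a sketch of the command's semantics as the message describes them, not target-core code: the function, its flat-buffer "device", and the return codes are hypothetical, and SGL handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MODEL_CAW_OK = 0, MODEL_CAW_MISCOMPARE = 1 };

/* `payload` holds 2 * len bytes: the verify data followed by the write data.
 * `dev` stands in for the blocks on the backing device. */
int model_compare_and_write(unsigned char *dev, const unsigned char *payload,
                            size_t len)
{
    assert(dev != NULL && payload != NULL);

    /* READ phase plus comparison against the verify half of the payload. */
    if (memcmp(dev, payload, len) != 0)
        return MODEL_CAW_MISCOMPARE;   /* device contents are left untouched */

    /* WRITE phase: submit the second half, the write instance of user data. */
    memcpy(dev, payload + len, len);
    return MODEL_CAW_OK;
}
```

In the real command the two phases are separate I/O submissions, which is why the backends need to accept the SGLs and the data direction as parameters.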
2013-09-09 | target: Add TCM_MISCOMPARE_VERIFY sense handling | Nicholas Bellinger
This patch adds TCM_MISCOMPARE_VERIFY (ASC=0x1d, ASCQ=0x00) sense handling to transport_send_check_condition_and_sense(), which is required for a COMPARE_AND_WRITE comparison failure. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin Petersen <martin.petersen@oracle.com> Cc: Chris Mason <chris.mason@fusionio.com> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
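For readers unfamiliar with how an ASC/ASCQ pair ends up on the wire, the sketch below fills a fixed-format SCSI sense buffer for a miscompare. The ASC and ASCQ values come from the commit message; the function name and buffer size are hypothetical, and the byte offsets and the MISCOMPARE sense key (0x0e) follow the fixed sense data format as I understand it from SPC, not from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SENSE_LEN 18             /* size of a fixed-format sense buffer */

/* Fill `buf` with fixed-format sense data reporting a miscompare. */
void model_build_miscompare_sense(uint8_t *buf, size_t len)
{
    assert(buf != NULL && len >= MODEL_SENSE_LEN);

    memset(buf, 0, len);
    buf[0]  = 0x70;                    /* response code: current error, fixed format */
    buf[2]  = 0x0e;                    /* sense key: MISCOMPARE */
    buf[7]  = 0x0a;                    /* additional sense length: 10 bytes follow */
    buf[12] = 0x1d;                    /* ASC from the commit message */
    buf[13] = 0x00;                    /* ASCQ from the commit message */
}
```

An initiator that receives CHECK CONDITION with this sense data can tell a failed comparison apart from a medium or transport error.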
2013-09-09 | target: Add return for se_cmd->transport_complete_callback | Nicholas Bellinger
This patch adds a sense_reason_t return to ->transport_complete_callback(), and updates target_complete_ok_work() to invoke the call if necessary to transport_send_check_condition_and_sense() during the failure case. Also update xdreadwrite_callback() to use this return value. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin Petersen <martin.petersen@oracle.com> Cc: Chris Mason <chris.mason@fusionio.com> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 | scsi: Add CDB definition for COMPARE_AND_WRITE | Nicholas Bellinger
Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin Petersen <martin.petersen@oracle.com> Cc: Chris Mason <chris.mason@fusionio.com> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 | target/pscsi: remove an unneeded check | Dan Carpenter
blk_get_request() just returns NULL on error, it doesn't return an ERR_PTR. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
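The distinction matters because an error-pointer test never fires on NULL. The sketch below re-creates the idea in userspace; the model_* helpers imitate the kernel's ERR_PTR()/IS_ERR() convention but are hypothetical re-implementations, and model_get_request() is a stand-in allocator, not blk_get_request().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in a pointer, in the style of ERR_PTR(). */
static inline void *model_err_ptr(long err)
{
    assert(err < 0 && err >= -MODEL_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* True only for pointers in the top MODEL_MAX_ERRNO addresses; NULL is not one. */
static inline bool model_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Stand-in allocator that signals failure with NULL, never an error pointer. */
static void *model_get_request(bool fail)
{
    static int request;
    return fail ? NULL : &request;
}

/* Caller with the check that matches the allocator's contract. */
int model_submit(bool fail)
{
    void *req = model_get_request(fail);

    if (req == NULL)                   /* a model_is_err() test here is dead code */
        return -ENOMEM;
    return 0;
}
```

Since the allocator can only return NULL on failure, a NULL test is sufficient and any additional error-pointer check on the same value can be removed.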
2013-09-09 | iscsi-target: Convert to per-cpu ida_alloc + ida_free command map | Nicholas Bellinger
This patch changes iscsi-target to use transport_alloc_session_tags() pre-allocation logic for per-cpu session tag pooling with internal ida_alloc() + ida_free() calls based upon the saved se_cmd->map_tag id. This includes tag pool setup based upon per NodeACL queue_depth after locating se_node_acl in iscsi_target_locate_portal(). Also update iscsit_allocate_cmd() and iscsit_release_cmd() to use percpu_ida_alloc() and percpu_ida_free() respectively. v5 changes; - Convert to percpu_ida.h include v2 changes: - Fix bug with SessionType=Discovery in iscsi_target_locate_portal() Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 | iscsi/iser-target: Convert to command priv_size usage | Nicholas Bellinger
This patch converts iscsi/isert-target to use allocations based on iscsit_transport->priv_size within iscsit_allocate_cmd(), instead of using an embedded isert_cmd->iscsi_cmd. This includes removing iscsit_transport->alloc_cmd() usage, along with updating isert-target code to use iscsit_priv_cmd(). Also, remove left-over iscsit_transport->release_cmd() usage for direct calls to iscsit_release_cmd(), and drop the now unused lio_cmd_cache and isert_cmd_cache. Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 | vhost/scsi: Add pre-allocation for tv_cmd SGL + upages memory | Nicholas Bellinger
This patch adds support for pre-allocation of per tv_cmd descriptor scatterlist + user-space page pointer memory using se_sess->sess_cmd_map within tcm_vhost_make_nexus() code. This includes sanity checks within vhost_scsi_map_to_sgl() to reject I/O that exceeds these initial hardcoded values, and the necessary cleanup in tcm_vhost_make_nexus() failure path + tcm_vhost_drop_nexus(). v3 changes: - Rebase to v3.11-rc5 code Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Asias He <asias@redhat.com> Cc: Kent Overstreet <kmo@daterainc.com> Reviewed-by: Asias He <asias@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09 | vhost/scsi: Convert to per-cpu ida_alloc + ida_free command map | Nicholas Bellinger
This patch changes vhost/scsi to use transport_init_session_tags() pre-allocation logic for per-cpu session tag pooling with internal ida_alloc() + ida_free() calls based upon the saved se_cmd->map_tag id. FIXME: Make transport_init_session_tags() number of tags setup configurable per vring client setting via configfs v5 changes: - Convert to percpu_ida.h include v3 changes: - Update to percpu-ida usage - Rebase to v3.11-rc5 code Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Asias He <asias@redhat.com> Cc: Kent Overstreet <kmo@daterainc.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09 | target: Add transport_init_session_tags using per-cpu ida | Nicholas Bellinger
This patch adds lib/idr.c based transport_init_session_tags() logic that allows fabric drivers to setup a per-cpu se_sess->sess_tag_pool and associated se_sess->sess_cmd_map for basic tagged pre-allocation of fabric descriptor sized memory. v5 changes: - Convert to percpu_ida.h include v4 changes: - Add transport_alloc_session_tags() for fabrics that need early transport_init_session() v3 changes: - Update to percpu-ida usage Cc: Kent Overstreet <kmo@daterainc.com> Cc: Asias He <asias@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Asias He <asias@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09idr: Percpu idaKent Overstreet
Percpu frontend for allocating ids. With percpu allocation (that works), it's impossible to guarantee it will always be possible to allocate all nr_tags - typically, some will be stuck on a remote percpu freelist where the current job can't get to them. We do guarantee that it will always be possible to allocate at least (nr_tags / 2) tags - this is done by keeping track of which and how many cpus have tags on their percpu freelists. On allocation failure if enough cpus have tags that there could potentially be (nr_tags / 2) tags stuck on remote percpu freelists, we then pick a remote cpu at random to steal from. Note that there's no cpu hotplug notifier - we don't care, because steal_tags() will eventually get the down cpu's tags. We _could_ satisfy more allocations if we had a notifier - but we'll still meet our guarantees and it's absolutely not a correctness issue, so I don't think it's worth the extra code. From akpm: "It looks OK to me (that's as close as I get to an ack :))" v6 changes: - Add #include <linux/cpumask.h> to include/linux/percpu_ida.h to make alpha/arc builds happy (Fengguang) - Move second (cpu >= nr_cpu_ids) check inside of first check scope in steal_tags() (akpm + nab) v5 changes: - Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm) - Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm) - Convert steal_tags() to use cpumask_weight() + cpumask_next() + cpumask_first() + cpumask_clear_cpu() (kmo + akpm) - Add comment for alloc_global_tags() (kmo + akpm) - Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm) - Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm) - Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init() (kmo + akpm) - Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy() (kmo + akpm) - Add comment for percpu_ida_alloc @ gfp (kmo + akpm) - Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab) v4 changes: - Fix tags.c reference in percpu_ida_init (akpm) Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
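The allocation order described in the percpu ida message (local freelist first, then the shared pool, then stealing from a remote freelist) can be illustrated with a single-threaded userspace model. This is only a sketch: every name below is hypothetical, there is no locking or batching, and the victim CPU is simply the first one that has tags, whereas the message says the real code picks one at random once enough tags could be stranded.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count for the model */
#define NR_TAGS_SKETCH 16  /* hypothetical total number of tags */

/* Hypothetical model: one shared freelist plus one freelist per CPU. */
struct tag_pool_sketch {
    int global_free[NR_TAGS_SKETCH];
    size_t nr_global;
    int cpu_free[NR_CPUS_SKETCH][NR_TAGS_SKETCH];
    size_t nr_cpu[NR_CPUS_SKETCH];
};

/* Start with every tag on the shared freelist. */
void pool_init(struct tag_pool_sketch *p)
{
    p->nr_global = NR_TAGS_SKETCH;
    for (size_t i = 0; i < NR_TAGS_SKETCH; i++)
        p->global_free[i] = (int)i;
    for (size_t c = 0; c < NR_CPUS_SKETCH; c++)
        p->nr_cpu[c] = 0;
}

/* Returns a tag, or -1 when no tag is free anywhere. */
int pool_alloc(struct tag_pool_sketch *p, size_t cpu)
{
    /* Fast path: this CPU's own freelist. */
    if (p->nr_cpu[cpu] > 0)
        return p->cpu_free[cpu][--p->nr_cpu[cpu]];
    /* Next: the shared freelist. */
    if (p->nr_global > 0)
        return p->global_free[--p->nr_global];
    /* Last resort: steal from another CPU that still holds tags. */
    for (size_t c = 0; c < NR_CPUS_SKETCH; c++)
        if (c != cpu && p->nr_cpu[c] > 0)
            return p->cpu_free[c][--p->nr_cpu[c]];
    return -1;
}

/* Freed tags go to the freeing CPU's list, which is how they get stranded. */
void pool_free(struct tag_pool_sketch *p, size_t cpu, int tag)
{
    p->cpu_free[cpu][p->nr_cpu[cpu]++] = tag;
}
```

In this model, tags freed on CPU 1 stay reachable from CPU 0 only through the stealing branch, which is the situation the (nr_tags / 2) guarantee is about.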
2013-09-09iser-target: Updates for login negotiation multi-plexing supportNicholas Bellinger
This patch updates iser-target code to support login negotiation multi-plexing. This includes only using isert_conn->conn_login_comp for the first login request PDU, pushing the subsequent processing to iscsi_conn->login_work -> iscsi_target_do_login_rx(), and turning isert_get_login_rx() into a NOP. v3 changes: - Drop unnecessary LOGIN_FLAGS_READ_ACTIVE bit set in isert_rx_login_req() Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09iscsi-target: Remove left-over iscsi_target_do_login_ioNicholas Bellinger
There is no need for iscsi_target_do_login_io() anymore in modern code, so go ahead and call iscsi_target_do_tx_login_io() directly within iscsi_target_do_login(). Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09iscsi-target: Add sk->sk_state_change to cleanup after TCP failureNicholas Bellinger
This patch adds a sock->sk_state_change() -> iscsi_target_sk_state_change() callback in order to handle transient TCP failures during the login process, where sock->sk_data_ready() -> iscsi_target_sk_data_ready() may not be called to release connection resources, and relinquish tpg->np_login_lock via iscsit_deaccess_np(). It performs the sk->sk_state check using iscsi_target_sk_state_check() to look for TCP_CLOSE_WAIT + TCP_CLOSE, and invokes schedule_delayed_work() -> iscsi_target_do_cleanup() to perform the remaining cleanup from process context. It adds an explicit sk_state_check to iscsi_target_do_login() in order to determine a state failure when iscsi_target_sk_state_change() may not be able to proceed before LOGIN_FLAGS_READY=1 is set. Also use sk->sk_sndtimeo -> sk->sk_rcvtimeo settings during login to iscsi_target_set_sock_callbacks(), and revert back post login to use MAX_SCHEDULE_TIMEOUT in iscsi_target_restore_sock_callbacks(). Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09iscsi-target: Add login negotiation multi-plexing supportNicholas Bellinger
This patch adds support for login negotiation multi-plexing in iscsi-target code. This involves handling the first login request PDU + payload and login response PDU + payload within __iscsi_target_login_thread() process context, and then changing struct sock->sk_data_ready() so that all subsequent exchanges are handled by workqueue process context, to allow other incoming login requests to be received in parallel by __iscsi_target_login_thread(). Upon login negotiation completion (or failure), ->sk_data_ready() is replaced with the original kernel sockets handler saved in iscsi_conn->orig_data_ready. v3 changes: - Convert iscsi_target_sk_data_ready() lock access to write[lock,unlock]_bh() - Only clear LOGIN_FLAGS_READ_ACTIVE when iscsi_target_do_login() returns zero - Add LOGIN_FLAGS_READY + LOGIN_FLAGS_CLOSED bit checks to iscsi_target_sk_data_ready() - Make INIT_DELAYED_WORK() + iscsi_target_set_sock_callbacks() setup happen earlier by moving from iscsi_target_start_negotiation() into iscsi_target_locate_portal() - Set LOGIN_FLAGS_READY bit in iscsi_target_start_negotiation() after iscsi_target_do_login() returns zero. v2 changes: - Add login_timer in iscsi_target_do_login_rx() to avoid possible endless sleep with MSG_WAITALL for traditional iscsi-target in certain network configurations. - Convert lprintk() -> pr_debug() - Remove forward declarations of iscsi_target_set_sock_callbacks(), iscsi_target_restore_sock_callbacks() and iscsi_target_sk_data_ready() - Make iscsi_target_set_sock_callbacks + iscsi_target_restore_sock_callbacks() static (Fengguang) - Make iscsi_target_do_login_rx() safe for iser-target w/o conn->sock Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
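The callback swap this message describes (save the original ->sk_data_ready(), install a login-time handler, restore the original when negotiation ends) can be modelled in userspace. The sketch below is an assumption-laden illustration: the struct and function names are hypothetical, the callback signature is simplified, and the login handler only counts events where the real one would hand work to a workqueue.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sock: one callback plus event counters. */
struct sock_sketch {
    void (*data_ready)(struct sock_sketch *sk);
    int login_events;
    int normal_events;
};

/* Hypothetical stand-in for iscsi_conn, holding the saved handler. */
struct conn_sketch {
    struct sock_sketch *sk;
    void (*orig_data_ready)(struct sock_sketch *sk);
};

/* Plays the role of the original kernel sockets handler. */
void sketch_normal_data_ready(struct sock_sketch *sk)
{
    sk->normal_events++;
}

/* Plays the role of the login-time handler; it only counts the event. */
void sketch_login_data_ready(struct sock_sketch *sk)
{
    sk->login_events++;
}

/* Save the current handler, then route data-ready events to login code. */
void sketch_set_callbacks(struct conn_sketch *conn)
{
    conn->orig_data_ready = conn->sk->data_ready;
    conn->sk->data_ready = sketch_login_data_ready;
}

/* Put the saved handler back once login completes or fails. */
void sketch_restore_callbacks(struct conn_sketch *conn)
{
    conn->sk->data_ready = conn->orig_data_ready;
}
```

Keeping the saved pointer per connection is what lets each in-flight login restore its own socket independently of the others.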
2013-09-09clk: only call get_parent if there is oneAlex Elder
In __clk_init(), after a clock is mostly initialized, a scan is done of the orphan clocks to see if the clock being registered is the parent of any of them. This code assumes that any clock that provides a get_parent method actually has at least one parent, and that's not a valid assumption. As a result, an orphan clock with no parent can return *something* as the parent index, and that value is blindly used to dereference the orphan's parent_names[] array (which will be ZERO_SIZE_PTR or NULL). Fix this by ensuring get_parent is only called for orphans with at least one parent. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Mike Turquette <mturquette@linaro.org>
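A minimal userspace sketch of the guard this message describes: consult get_parent only when the orphan actually has parents. The struct is a hypothetical, simplified stand-in for the kernel's clock structure, and the extra bounds check on the returned index is an addition for illustration, not something the commit message states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a clock with optional parents. */
struct clk_sketch {
    const char **parent_names;  /* may be NULL when num_parents == 0 */
    uint8_t num_parents;
    uint8_t (*get_parent)(struct clk_sketch *clk);  /* optional callback */
};

/* Example callback that returns "something" even with no parents. */
uint8_t sketch_bogus_index(struct clk_sketch *clk)
{
    (void)clk;
    return 3;
}

/* Example callback for a clock whose second parent is selected. */
uint8_t sketch_second_parent(struct clk_sketch *clk)
{
    (void)clk;
    return 1;
}

/* Returns the orphan's current parent name, or NULL if it has none. */
const char *sketch_orphan_parent_name(struct clk_sketch *orphan)
{
    uint8_t idx;

    /* The fix: never call get_parent for a clock with zero parents. */
    if (orphan->num_parents == 0 || orphan->get_parent == NULL)
        return NULL;

    idx = orphan->get_parent(orphan);
    /* Extra defensive check, added here as an assumption. */
    if (idx >= orphan->num_parents)
        return NULL;

    return orphan->parent_names[idx];
}
```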
2013-09-09ACPI / bind: Prefer device objects with _STA to those without itRafael J. Wysocki
As reported at https://bugzilla.kernel.org/show_bug.cgi?id=60829, there still are cases in which do_find_child() doesn't choose the ACPI device object it is "expected" to choose if more than one such object matching a single PCI device is present. This particular problem may be worked around by making do_find_child() return device objects with _STA whose result indicates that the device is enabled before device objects without _STA if there's more than one device object to choose from. This change doesn't affect the case in which there's only one matching ACPI device object per PCI device. References: https://bugzilla.kernel.org/show_bug.cgi?id=60829 Reported-by: Peter Wu <lekensteyn@gmail.com> Tested-by: Felix Lisczyk <felix.lisczyk@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
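The preference order can be sketched as a small ranking function over the candidate objects. Everything here is hypothetical and is not the kernel's do_find_child(): in particular, ranking an object whose _STA reports "disabled" below an object without _STA is an assumption, since the message only orders "enabled per _STA" ahead of "no _STA".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a candidate ACPI device object. */
struct acpi_obj_sketch {
    const char *name;
    bool has_sta;      /* object provides _STA */
    bool sta_enabled;  /* meaningful only when has_sta is true */
};

/* Higher rank wins: _STA enabled > no _STA > _STA disabled (assumed). */
static int sketch_rank(const struct acpi_obj_sketch *o)
{
    if (o->has_sta)
        return o->sta_enabled ? 2 : 0;
    return 1;
}

/* Returns the preferred candidate, or NULL when there are none. */
const struct acpi_obj_sketch *
sketch_pick_child(const struct acpi_obj_sketch *objs, size_t n)
{
    const struct acpi_obj_sketch *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (best == NULL || sketch_rank(&objs[i]) > sketch_rank(best))
            best = &objs[i];  /* first object wins ties */
    return best;
}
```

With a single candidate the function returns it regardless of rank, which matches the message's note that the one-object case is unaffected.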
2013-09-09bnx2x: Fix configuration of doorbell blockAriel Elior
As part of the VF RSS feature, the doorbell block was configured not to use dpm, but a small part of the configuration was left out, preventing the driver from sending tx messages to the device. This patch adds the missing configuration. Reported-by: Eric Dumazet <eric.dumazet@gmil.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Tested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-09iscsi-target: Prepare login code for multi-plexing supportNicholas Bellinger
This patch prepares the iscsi-target login code for multi-plexing support. This includes: - Adding iscsi_tpg_np->tpg_np_kref + iscsit_login_kref_put() for handling callback of iscsi_tpg_np->tpg_np_comp - Adding kref_put() in iscsit_deaccess_np() - Adding kref_put() and wait_for_completion() in iscsit_reset_np_thread() - Refactor login failure path release logic into iscsi_target_login_sess_out() - Update __iscsi_target_login_thread() to handle iscsi_post_login_handler() asynchronous completion - Add shutdown parameter for iscsit_clear_tpg_np_login_thread*() v3 changes: - Convert iscsi_portal_group->np_login_lock to ->np_login_sem - Add LOGIN_FLAGS definitions v2 changes: - Remove duplicate call to iscsi_post_login_handler() in __iscsi_target_login_thread() - Drop unused iscsi_np->np_login_tpg Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checksRafael J. Wysocki
In the current ACPIPHP notify handler we always go directly for a rescan of the parent bus if we get a device check notification for a device that is not a bridge. However, this obviously is overzealous if nothing really changes, because this way we may rescan the whole PCI hierarchy pretty much in vain. That happens on Alex Williamson's machine whose ACPI tables contain device objects that are supposed to correspond to PCIe root ports, but those ports aren't physically present (or at least they aren't visible in the PCI config space to us). The BIOS generates multiple device check notifies for those objects during boot and for each of them we go straight for the parent bus rescan, but the parent bus is the root bus in this particular case. In consequence, we rescan the whole PCI bus from the top several times in a row, which is completely unnecessary, increases boot time by 50% (after previous fixes) and generates excess dmesg output from the PCI subsystem. Fix the problem by checking if we can find anything new in the slot corresponding to the device we've got a device check notify for and doing nothing if that's not the case. The spec (ACPI 5.0, Section 5.6.6) appears to mandate this behavior, as it says: Device Check. Used to notify OSPM that the device either appeared or disappeared. If the device has appeared, OSPM will re-enumerate from the parent. If the device has disappeared, OSPM will invalidate the state of the device. OSPM may optimize out re-enumeration. Therefore, according to the spec, we are free to do nothing if nothing changes. References: https://bugzilla.kernel.org/show_bug.cgi?id=60865 Reported-and-tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
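The decision described above reduces to "scan the slot first, rescan the parent only if something new turned up". The following is a toy model of that control flow with hypothetical names and counters, not the ACPIPHP handler itself; how the real code detects "anything new" is not stated in the message, so the model just compares two device counts.

```c
#include <stdbool.h>

/* Hypothetical model of a slot: what we already know vs. what a scan finds. */
struct slot_sketch {
    unsigned known_devices;    /* devices already enumerated */
    unsigned present_devices;  /* devices a fresh slot scan would find */
};

/*
 * Handle a device check for a non-bridge device.
 * Returns true when a parent bus rescan was actually performed.
 */
bool sketch_device_check(struct slot_sketch *slot, unsigned *parent_rescans)
{
    /* Spurious notification: nothing new in the slot, so do nothing. */
    if (slot->present_devices <= slot->known_devices)
        return false;

    /* Something appeared: record it and rescan from the parent. */
    slot->known_devices = slot->present_devices;
    (*parent_rescans)++;
    return true;
}
```

Repeated notifications for an empty slot therefore cost one cheap slot check each instead of a full rescan from the root bus.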
2013-09-09split read_seqretry_or_unlock(), convert d_walk() to resulting primitivesAl Viro
Separate "check if we need to retry" from "unlock if we are done and had seq_writelock"; that allows these primitives to be used in d_walk(), where we need to recheck every time we ascend back to parent, but do *not* want to unlock until the very end. Lift rcu_read_lock/rcu_read_unlock out into callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
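The split can be shown with a single-threaded model of a sequence lock in which an even seq value means a lockless attempt and an odd one means the lock is held. All names are hypothetical and the model has no atomics or memory barriers, so it only demonstrates the control flow: the retry check can be called many times, while the unlock step runs once at the end.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of a sequence lock. */
struct seqlock_sketch {
    unsigned sequence;  /* bumped by 2 for each completed write */
    bool locked;        /* true while a reader holds it exclusively */
};

/* Even *seq: sample the counter locklessly. Odd *seq: take the lock. */
void sketch_begin_or_lock(struct seqlock_sketch *sl, int *seq)
{
    if (!(*seq & 1))
        *seq = (int)sl->sequence;
    else
        sl->locked = true;
}

/* "Check if we need to retry": only lockless passes can be invalidated. */
bool sketch_need_retry(const struct seqlock_sketch *sl, int seq)
{
    return !(seq & 1) && sl->sequence != (unsigned)seq;
}

/* "Unlock if we are done": releases only if this pass took the lock. */
void sketch_done(struct seqlock_sketch *sl, int seq)
{
    if (seq & 1)
        sl->locked = false;
}
```

A walker in the style of d_walk() would call sketch_need_retry() each time it ascends to a parent, switch seq to 1 and restart if it returns true, and call sketch_done() only when the whole walk has finished.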
2013-09-09perf kvm: Fix sample_type manipulationAdrian Hunter
Manipulating the sample_type of an evsel requires the use of: perf_evsel__set_sample_bit() and perf_evsel__reset_sample_bit() Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: David Ahern <dsahern@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1378496412-2424-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
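One plausible reason for requiring helpers is that a raw bitmask update leaves derived state stale. The sketch below assumes the helpers also maintain a per-sample size; the struct, the bit constants, and the function names are hypothetical and are not perf's actual definitions.

```c
#include <stdint.h>

/* Hypothetical sample bits; not perf's real PERF_SAMPLE_* values. */
#define SKETCH_SAMPLE_TIME (1ULL << 0)
#define SKETCH_SAMPLE_CPU  (1ULL << 1)

/* Hypothetical evsel: a bitmask plus state derived from it. */
struct evsel_sketch {
    uint64_t sample_type;
    int sample_size;  /* bytes contributed by the enabled bits */
};

/* Set a bit and keep sample_size in step; setting twice is a no-op. */
void sketch_set_sample_bit(struct evsel_sketch *e, uint64_t bit)
{
    if (!(e->sample_type & bit)) {
        e->sample_type |= bit;
        e->sample_size += (int)sizeof(uint64_t);
    }
}

/* Clear a bit and shrink sample_size; clearing an unset bit is a no-op. */
void sketch_reset_sample_bit(struct evsel_sketch *e, uint64_t bit)
{
    if (e->sample_type & bit) {
        e->sample_type &= ~bit;
        e->sample_size -= (int)sizeof(uint64_t);
    }
}
```

Writing `e->sample_type |= bit` directly would skip the size update, which is the kind of inconsistency the helpers exist to prevent.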
2013-09-09perf evlist: Fix id pos in perf_evlist__open()Adrian Hunter
Ensure the id_pos is correct when perf_evlist__open() is used. This fixes a problem introduced in 7556257 that broke 'perf kvm stat live' in that this tool wasn't updated to use the sample_type bits setting helpers. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: David Ahern <dsahern@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1378496412-2424-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>