summaryrefslogtreecommitdiffstats
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2014-03-29Merge branch 'kvm-ppchv-next' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-next
2014-03-29KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8Paul Mackerras
Currently we save the host PMU configuration, counter values, etc., when entering a guest, and restore it on return from the guest. (We have to do this because the guest has control of the PMU while it is executing.) However, we missed saving/restoring the SIAR and SDAR registers, as well as the registers which are new on POWER8, namely SIER and MMCR2. This adds code to save the values of these registers when entering the guest and restore them on exit. This also works around the bug in POWER8 where setting PMAE with a counter already negative doesn't generate an interrupt. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offsetPaul Mackerras
Commit c7699822bc21 ("KVM: PPC: Book3S HV: Make physical thread 0 do the MMU switching") reordered the guest entry/exit code so that most of the guest register save/restore code happened in guest MMU context. A side effect of that is that the timebase still contains the guest timebase value at the point where we compute and use vcpu->arch.dec_expires, and therefore that is now a guest timebase value rather than a host timebase value. That in turn means that the timeouts computed in kvmppc_set_timer() are wrong if the timebase offset for the guest is non-zero. The consequence of that is things such as "sleep 1" in a guest after migration may sleep for much longer than they should. This fixes the problem by converting between guest and host timebase values as necessary, by adding or subtracting the timebase offset. This also fixes an incorrect comment. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29KVM: PPC: Book3S HV: Don't use kvm_memslots() in real modePaul Mackerras
With HV KVM, some high-frequency hypercalls such as H_ENTER are handled in real mode, and need to access the memslots array for the guest. Accessing the memslots array is safe, because we hold the SRCU read lock for the whole time that a guest vcpu is running. However, the checks that kvm_memslots() does when lockdep is enabled are potentially unsafe in real mode, when only the linear mapping is available. Furthermore, kvm_memslots() can be called from a secondary CPU thread, which is an offline CPU from the point of view of the host kernel, and is not running the task which holds the SRCU read lock. To avoid false positives in the checks in kvm_memslots(), and to avoid possible side effects from doing the checks in real mode, this replaces kvm_memslots() with kvm_memslots_raw() in all the places that execute in real mode. kvm_memslots_raw() is a new function that is like kvm_memslots() but uses rcu_dereference_raw_notrace() instead of kvm_dereference_check(). Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29KVM: PPC: Book3S HV: Return ENODEV error rather than EIOPaul Mackerras
If an attempt is made to load the kvm-hv module on a machine which doesn't have hypervisor mode available, return an ENODEV error, which is the conventional thing to return to indicate that this module is not applicable to the hardware of the current machine, rather than EIO, which causes a warning to be printed. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS codePaul Mackerras
The in-kernel emulation of RTAS functions needs to read the argument buffer from guest memory in order to find out what function is being requested. The guest supplies the guest physical address of the buffer, and on a real system the code that reads that buffer would run in guest real mode. In guest real mode, the processor ignores the top 4 bits of the address specified in load and store instructions. In order to emulate that behaviour correctly, we need to mask off those bits before calling kvm_read_guest() or kvm_write_guest(). This adds that masking. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29KVM: PPC: Book3S HV: Add get/set_one_reg for new TM stateMichael Neuling
This adds code to get/set_one_reg to read and write the new transactional memory (TM) state. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-29KVM: PPC: Book3S HV: Add transactional memory supportMichael Neuling
This adds saving of the transactional memory (TM) checkpointed state on guest entry and exit. We only do this if we see that the guest has an active transaction. It also adds emulation of the TM state changes when delivering IRQs into the guest. According to the architecture, if we are transactional when an IRQ occurs, the TM state is changed to suspended, otherwise it's left unchanged. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-26KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=nAnton Blanchard
I noticed KVM is broken when KVM in-kernel XICS emulation (CONFIG_KVM_XICS) is disabled. The problem was introduced in 48eaef05 (KVM: PPC: Book3S HV: use xics_wake_cpu only when defined). It used CONFIG_KVM_XICS to wrap xics_wake_cpu, where CONFIG_PPC_ICP_NATIVE should have been used. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@vger.kernel.org Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Scott Wood <scottwood@freescale.com>
2014-03-26KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCELaurent Dufour
This introduces the H_GET_TCE hypervisor call, which is basically the reverse of H_PUT_TCE, as defined in the Power Architecture Platform Requirements (PAPR). The hcall H_GET_TCE is required by the kdump kernel, which uses it to retrieve TCEs set up by the previous (panicked) kernel. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
2014-03-26KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd writeGreg Kurz
When the guest does an MMIO write which is handled successfully by an ioeventfd, ioeventfd_write() returns 0 (success) and kvmppc_handle_store() returns EMULATE_DONE. Then kvmppc_emulate_mmio() converts EMULATE_DONE to RESUME_GUEST_NV and this causes an exit from the loop in kvmppc_vcpu_run_hv(), causing an exit back to userspace with a bogus exit reason code, typically causing userspace (e.g. qemu) to crash with a message about an unknown exit code. This adds handling of RESUME_GUEST_NV in kvmppc_vcpu_run_hv() in order to fix that. For generality, we define a helper to check for either of the return-to-guest codes we use, RESUME_GUEST and RESUME_GUEST_NV, to make it easy to check for either and provide one place to update if any other return-to-guest code gets defined in future. Since it only affects Book3S HV for now, the helper is added to the kvm_book3s.h header file. We use the helper in two places in kvmppc_run_core() as well for future-proofing, though we don't see RESUME_GUEST_NV in either place at present. [paulus@samba.org - combined 4 patches into one, rewrote description] Suggested-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2014-03-14Merge branch 'kvm-ppc-fix' into HEADPaolo Bonzini
2014-03-13KVM: PPC: Book3S HV: Fix register usage when loading/saving VRSAVEPaul Mackerras
Commit 595e4f7e697e ("KVM: PPC: Book3S HV: Use load/store_fp_state functions in HV guest entry/exit") changed the register usage in kvmppc_save_fp() and kvmppc_load_fp() but omitted changing the instructions that load and save VRSAVE. The result is that the VRSAVE value was loaded from a constant address, and saved to a location past the end of the vcpu struct, causing host kernel memory corruption and various kinds of host kernel crashes. This fixes the problem by using register r31, which contains the vcpu pointer, instead of r3 and r4. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-13KVM: PPC: Book3S HV: Remove bogus duplicate codePaul Mackerras
Commit 7b490411c37f ("KVM: PPC: Book3S HV: Add new state for transactional memory") incorrectly added some duplicate code to the guest exit path because I didn't manage to clean up after a rebase correctly. This removes the extraneous material. The presence of this extraneous code causes host crashes whenever a guest is run. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-11powerpc/powernv: Add iommu DMA bypass support for IODA2Benjamin Herrenschmidt
This patch adds the support for to create a direct iommu "bypass" window on IODA2 bridges (such as Power8) allowing to bypass iommu page translation completely for 64-bit DMA capable devices, thus significantly improving DMA performances. Additionally, this adds a hook to the struct iommu_table so that the IOMMU API / VFIO can disable the bypass when external ownership is requested, since in that case, the device will be used by an environment such as userspace or a KVM guest which must not be allowed to bypass translations. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc: Fix endian issues in kexec and crash dump codeAnton Blanchard
We expose a number of OF properties in the kexec and crash dump code and these need to be big endian. Cc: stable@vger.kernel.org # v3.13 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc/ppc32: Fix the bug in the init of non-base exception stack for UPKevin Hao
We would allocate one specific exception stack for each kind of non-base exceptions for every CPU. For ppc32 the CPU hard ID is used as the subscript to get the specific exception stack for one CPU. But for an UP kernel, there is only one element in the each kind of exception stack array. We would get stuck if the CPU hard ID is not equal to '0'. So in this case we should use the subscript '0' no matter what the CPU hard ID is. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc/xmon: Don't signal we've entered until we're finished printingMichael Ellerman
Currently we set our cpu's bit in cpus_in_xmon, and then we take the output lock and print the exception information. This can race with the master cpu entering the command loop and printing the backtrace. The result is that the backtrace gets garbled with another cpu's exception print out. Fix it by delaying the set of cpus_in_xmon until we are finished printing. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc/xmon: Fix timeout loop in get_output_lock()Michael Ellerman
As far as I can tell, our 70s era timeout loop in get_output_lock() is generating no code. This leads to the hostile takeover happening more or less simultaneously on all cpus. The result is "interesting", some example output that is more readable than most: cpu 0x1: Vector: 100 (Scypsut e0mx bR:e setV)e catto xc0p:u[ c 00 c0:0 000t0o0V0erc0td:o5 rfc28050000]0c00 0 0 0 6t(pSrycsV1ppuot uxe 1m 2 0Rx21e3:0s0ce000c00000t00)00 60602oV2SerucSayt0y 0p 1sxs Fix it by using udelay() in the timeout loop. The wait time and check frequency are arbitrary, but seem to work OK. We already rely on udelay() working so this is not a new dependency. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc/xmon: Don't loop forever in get_output_lock()Michael Ellerman
If we enter with xmon_speaker != 0 we skip the first cmpxchg(), we also skip the while loop because xmon_speaker != last_speaker (0) - meaning we skip the second cmpxchg() also. Following that code path the compiler sees no memory barriers and so is within its rights to never reload xmon_speaker. The end result is we loop forever. This manifests as all cpus being in xmon ('c' command), but they refuse to take control when you switch to them ('c x' for cpu # x). I have seen this deadlock in practice and also checked the generated code to confirm this is what's happening. The simplest fix is just to always try the cmpxchg(). Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc/perf: Configure BHRB filter before enabling PMU interruptsAnshuman Khandual
Right now the config_bhrb() PMU specific call happens after write_mmcr0(), which actually enables the PMU for event counting and interrupts. So there is a small window of time where the PMU and BHRB runs without the required HW branch filter (if any) enabled in BHRB. This can cause some of the branch samples to be collected through BHRB without any filter applied and hence affects the correctness of the results. This patch moves the BHRB config function call before enabling interrupts. Here are some data points captured via trace prints which depicts how we could get PMU interrupts with BHRB filter NOT enabled with a standard perf record command line (asking for branch record information as well). $ perf record -j any_call ls Before the patch:- ls-1962 [003] d... 2065.299590: .perf_event_interrupt: MMCRA: 40000000000 ls-1962 [003] d... 2065.299603: .perf_event_interrupt: MMCRA: 40000000000 ... All the PMU interrupts before this point did not have the requested HW branch filter enabled in the MMCRA. ls-1962 [003] d... 2065.299647: .perf_event_interrupt: MMCRA: 40040000000 ls-1962 [003] d... 2065.299662: .perf_event_interrupt: MMCRA: 40040000000 After the patch:- ls-1850 [008] d... 190.311828: .perf_event_interrupt: MMCRA: 40040000000 ls-1850 [008] d... 190.311848: .perf_event_interrupt: MMCRA: 40040000000 All the PMU interrupts have the requested HW BHRB branch filter enabled in MMCRA. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Fixed up whitespace and cleaned up changelog] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc/pseries: Select ARCH_RANDOM on pseriesMichael Ellerman
We have a driver for the ARCH_RANDOM hook in rng.c, so we should select ARCH_RANDOM on pseries. Without this the build breaks if you turn ARCH_RANDOM off. This hasn't broken the build because pseries_defconfig doesn't specify a value for PPC_POWERNV, which is default y, and selects ARCH_RANDOM. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc/perf: Add Power8 cache & TLB eventsMichael Ellerman
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc/relocate fix relocate processing in LE modeLaurent Dufour
Relocation's code is not working in little endian mode because the r_info field, which is a 64 bits value, should be read from the right offset. The current code is optimized to read the r_info field as a 32 bits value starting at the middle of the double word (offset 12). When running in LE mode, the read value is not correct since only the MSB is read. This patch removes this optimization which consist to deal with a 32 bits value instead of a 64 bits one. This way it works in big and little endian mode. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc: Fix kdump hang issue on p8 with relocation on exception enabled.Mahesh Salgaonkar
On p8 systems, with relocation on exception feature enabled we are seeing kdump kernel hang at interrupt vector 0xc*4400. The reason is, with this feature enabled, exception are raised with MMU (IR=DR=1) ON with the default offset of 0xc*4000. Since exception is raised in virtual mode it requires the vector region to be executable without which it fails to fetch and execute instruction at 0xc*4xxx. For default kernel since kernel is loaded at real 0, the htab mappings sets the entire kernel text region executable. But for relocatable kernel (e.g. kdump case) we only copy interrupt vectors down to real 0 and never marked that region as executable because in p7 and below we always get exception in real mode. This patch fixes this issue by marking htab mapping range as executable that overlaps with the interrupt vector region for relocatable kernel. Thanks to Ben who helped me to debug this issue and find the root cause. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc/pseries: Disable relocation on exception while going down during crash.Mahesh Salgaonkar
Disable relocation on exception while going down even in kdump case. This is because we are about clear htab mappings while kexec-ing into kdump kernel and we may run into issues if we still have AIL ON. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc/eeh: Drop taken reference to driver on eeh_rmv_deviceThadeu Lima de Souza Cascardo
Commit f5c57710dd62dd06f176934a8b4b8accbf00f9f8 ("powerpc/eeh: Use partial hotplug for EEH unaware drivers") introduces eeh_rmv_device, which may grab a reference to a driver, but not release it. That prevents a driver from being removed after it has gone through EEH recovery. This patch drops the reference if it was taken. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-11powerpc: Fix build failure in sysdev/mpic.c for MPIC_WEIRD=yPaul Gortmaker
Commit 446f6d06fab0b49c61887ecbe8286d6aaa796637 ("powerpc/mpic: Properly set default triggers") breaks the mpc7447_hpc_defconfig as follows: CC arch/powerpc/sysdev/mpic.o arch/powerpc/sysdev/mpic.c: In function 'mpic_set_irq_type': arch/powerpc/sysdev/mpic.c:886:9: error: case label does not reduce to an integer constant arch/powerpc/sysdev/mpic.c:890:9: error: case label does not reduce to an integer constant arch/powerpc/sysdev/mpic.c:894:9: error: case label does not reduce to an integer constant arch/powerpc/sysdev/mpic.c:898:9: error: case label does not reduce to an integer constant Looking at the cpp output (gcc 4.7.3), I see: case mpic->hw_set[MPIC_IDX_VECPRI_SENSE_EDGE] | mpic->hw_set[MPIC_IDX_VECPRI_POLARITY_POSITIVE]: The pointer into an array appears because CONFIG_MPIC_WEIRD=y is set for this platform, thus enabling the following: ------------------- #ifdef CONFIG_MPIC_WEIRD static u32 mpic_infos[][MPIC_IDX_END] = { [0] = { /* Original OpenPIC compatible MPIC */ [...] #define MPIC_INFO(name) mpic->hw_set[MPIC_IDX_##name] #else /* CONFIG_MPIC_WEIRD */ #define MPIC_INFO(name) MPIC_##name #endif /* CONFIG_MPIC_WEIRD */ ------------------- Here we convert the case section to if/else if, and also add the equivalent of a default case to warn about unknown types. Boot tested on sbc8548, build tested on all defconfigs. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more KVM updates from Paolo Bonzini: "Second batch of KVM updates. Some minor x86 fixes, two s390 guest features that need some handling in the host, and all the PPC changes. The PPC changes include support for little-endian guests and enablement for new POWER8 features" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits) x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101 x86, kvm: cache the base of the KVM cpuid leaves kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef KVM: PPC: Book3S PR: Cope with doorbell interrupts KVM: PPC: Book3S HV: Add software abort codes for transactional memory KVM: PPC: Book3S HV: Add new state for transactional memory powerpc/Kconfig: Make TM select VSX and VMX KVM: PPC: Book3S HV: Basic little-endian guest support KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8 KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8 KVM: PPC: Book3S HV: Add handler for HV facility unavailable KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8 KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers KVM: PPC: Book3S HV: Don't set DABR on POWER8 kvm/ppc: IRQ disabling cleanup ...
2014-01-30Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull more powerpc bits from Ben Herrenschmidt: "Here are a few more powerpc bits for this merge window. The bulk is made of two pull requests from Scott and Anatolij that I had missed previously (they arrived while I was away). Since both their branches are in -next independently, and the content has been around for a little while, they can still go in. The rest is mostly bug and regression fixes, a small series of cleanups to our pseries cpuidle code (including moving it to the right place), and one new cpuidle bakend for the powernv platform. I also wired up the new sched_attr syscalls" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (37 commits) powerpc: Wire up sched_setattr and sched_getattr syscalls powerpc/hugetlb: Replace __get_cpu_var with get_cpu_var powerpc: Make sure "cache" directory is removed when offlining cpu powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the allowed address space powerpc/powernv/cpuidle: Back-end cpuidle driver for powernv platform. powerpc/pseries/cpuidle: smt-snooze-delay cleanup. powerpc/pseries/cpuidle: Remove MAX_IDLE_STATE macro. powerpc/pseries/cpuidle: Make cpuidle-pseries backend driver a non-module. powerpc/pseries/cpuidle: Use cpuidle_register() for initialisation. powerpc/pseries/cpuidle: Move processor_idle.c to drivers/cpuidle. powerpc: Fix 32-bit frames for signals delivered when transactional powerpc/iommu: Fix initialisation of DART iommu table powerpc/numa: Fix decimal permissions powerpc/mm: Fix compile error of pgtable-ppc64.h powerpc: Fix hw breakpoints on !HAVE_HW_BREAKPOINT configurations clk: corenet: Adds the clock binding powerpc/booke64: Guard e6500 tlb handler with CONFIG_PPC_FSL_BOOK3E powerpc/512x: dts: add MPC5125 clock specs powerpc/512x: clk: support MPC5121/5123/5125 SoC variants powerpc/512x: clk: enforce even SDHC divider values ...
2014-01-30Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block IO changes from Jens Axboe: "The major piece in here is the immutable bio_ve series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains: - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups. - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer. - Header export fix from CaiZhiyong. - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs: - dm-cache immutable fixup from Mike Snitzer. - btrfs immutable fixup from Muthu Kumar. - bio-integrity fix from Nic Bellinger, which is also going to stable" * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits) xtensa: fixup simdisk driver to work with immutable bio_vecs block/blk-mq-cpu.c: use hotcpu_notifier() blk-mq: for_each_* macro correctness block: Fix memory leak in rw_copy_check_uvector() handling bio-integrity: Fix bio_integrity_verify segment start bug block: remove unrelated header files and export symbol blk-mq: uses page->list incorrectly blk-mq: use __smp_call_function_single directly btrfs: fix missing increment of bi_remaining Revert "block: Warn and free bio if bi_end_io is not set" block: Warn and free bio if bi_end_io is not set blk-mq: fix initializing request's start time block: blk-mq: don't export blk_mq_free_queue() block: blk-mq: make blk_sync_queue support mq block: blk-mq: support draining mq queue dm cache: increment bi_remaining when bi_end_io is restored block: fixup for generic bio chaining block: Really silence spurious compiler warnings block: Silence spurious compiler warnings block: Kill bio_pair_split() ...
2014-01-29Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queuePaolo Bonzini
Conflicts: arch/powerpc/kvm/book3s_hv_rmhandlers.S arch/powerpc/kvm/booke.c
2014-01-29powerpc: Wire up sched_setattr and sched_getattr syscallsBenjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc/hugetlb: Replace __get_cpu_var with get_cpu_varTiejun Chen
Replace __get_cpu_var safely with get_cpu_var to avoid the following call trace: [ 7253.637591] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: hugemmap01/9048 [ 7253.637601] caller is free_hugepd_range.constprop.25+0x88/0x1a8 [ 7253.637605] CPU: 1 PID: 9048 Comm: hugemmap01 Not tainted 3.10.20-rt14+ #114 [ 7253.637606] Call Trace: [ 7253.637617] [cb049d80] [c0007ea4] show_stack+0x4c/0x168 (unreliable) [ 7253.637624] [cb049dc0] [c031c674] debug_smp_processor_id+0x114/0x134 [ 7253.637628] [cb049de0] [c0016d28] free_hugepd_range.constprop.25+0x88/0x1a8 [ 7253.637632] [cb049e00] [c001711c] hugetlb_free_pgd_range+0x6c/0x168 [ 7253.637639] [cb049e40] [c0117408] free_pgtables+0x12c/0x150 [ 7253.637646] [cb049e70] [c011ce38] unmap_region+0xa0/0x11c [ 7253.637671] [cb049ef0] [c011f03c] do_munmap+0x224/0x3bc [ 7253.637676] [cb049f20] [c011f2e0] vm_munmap+0x38/0x5c [ 7253.637682] [cb049f40] [c000ef88] ret_from_syscall+0x0/0x3c [ 7253.637686] --- Exception: c01 at 0xff16004 Signed-off-by: Tiejun Chen<tiejun.chen@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc: Make sure "cache" directory is removed when offlining cpuPaul Mackerras
The code in remove_cache_dir() is supposed to remove the "cache" subdirectory from the sysfs directory for a CPU when that CPU is being offlined. It tries to do this by calling kobject_put() on the kobject for the subdirectory. However, the subdirectory only gets removed once the last reference goes away, and the reference being put here may well not be the last reference. That means that the "cache" subdirectory may still exist when the offlining operation has finished. If the same CPU subsequently gets onlined, the code tries to add a new "cache" subdirectory. If the old subdirectory has not yet been removed, we get a WARN_ON in the sysfs code, with stack trace, and an error message printed on the console. Further, we ultimately end up with an online cpu with no "cache" subdirectory. This fixes it by doing an explicit kobject_del() at the point where we want the subdirectory to go away. kobject_del() removes the sysfs directory even though the object still exists in memory. The object will get freed at some point in the future. A subsequent onlining operation can create a new sysfs directory, even if the old object still exists in memory, without causing any problems. Cc: stable@vger.kernel.org # v3.0+ Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the ↵jmarchan@redhat.com
allowed address space According to Posix, if MAP_FIXED is specified mmap shall set ENOMEM if the requested mapping exceeds the allowed range for address space of the process. The generic code set it right, but the specific powerpc slice_get_unmapped_area() function currently returns -EINVAL in that case. This patch corrects it. Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc/powernv/cpuidle: Back-end cpuidle driver for powernv platform.Deepthi Dharwar
Following patch ports the cpuidle framework for powernv platform and also implements a cpuidle back-end powernv idle driver calling on to power7_nap and snooze idle states. Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc/pseries/cpuidle: smt-snooze-delay cleanup.Deepthi Dharwar
smt-snooze-delay was designed to disable NAP state or delay the entry to the NAP state prior to adoption of cpuidle framework. This is per-cpu variable. With the coming of CPUIDLE framework, states can be disabled on per-cpu basis using the cpuidle/enable sysfs entry. Also, with the coming of cpuidle driver each state's target residency is per-driver unlike earlier which was per-device. Therefore, the per-cpu sysfs smt-snooze-delay which decides the target residency of the idle state on a particular cpu causes more confusion to the user as we cannot have different smt-snooze-delay (target residency) values for each cpu. In the current code, smt-snooze-delay functionality is completely broken. It makes sense to remove smt-snooze-delay from idle driver with the coming of cpuidle framework. However, sysfs files are retained as ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean up the kernel code. Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc/pseries/cpuidle: Move processor_idle.c to drivers/cpuidle.Deepthi Dharwar
Move the file from arch specific pseries/processor_idle.c to drivers/cpuidle/cpuidle-pseries.c Make the relevant Makefile and Kconfig changes. Also, introduce Kconfig.powerpc in drivers/cpuidle for all powerpc cpuidle drivers. Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc: Fix 32-bit frames for signals delivered when transactionalPaul Mackerras
Commit d31626f70b61 ("powerpc: Don't corrupt transactional state when using FP/VMX in kernel") introduced a bug where the uc_link and uc_regs fields of the ucontext_t that is created to hold the transactional values of the registers in a 32-bit signal frame didn't get set correctly. The reason is that we now clear the MSR_TS bits in the MSR in save_tm_user_regs(), before the code that sets uc_link and uc_regs. To fix this, we move the setting of uc_link and uc_regs into the same if statement that selects whether to call save_tm_user_regs() or save_user_regs(). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc/iommu: Fix initialisation of DART iommu tableAlistair Popple
Commit d084775738b746648d4102337163a04534a02982 switched the generic powerpc iommu backend code to use the it_page_shift field to determine page size. Commit 3a553170d35d69bea3877bffa508489dfa6f133d should have initiliased this field for all platforms, however the DART iommu table code was not updated. This commit initialises the it_page_shift field to 4K for the DART iommu. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc/numa: Fix decimal permissionsJoe Perches
This should have been octal. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc/mm: Fix compile error of pgtable-ppc64.hLi Zhong
It seems that forward declaration couldn't work well with typedef, use struct spinlock directly to avoiding following build errors: In file included from include/linux/spinlock.h:81, from include/linux/seqlock.h:35, from include/linux/time.h:5, from include/uapi/linux/timex.h:56, from include/linux/timex.h:56, from include/linux/sched.h:17, from arch/powerpc/kernel/asm-offsets.c:17: include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t' /root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29powerpc: Fix hw breakpoints on !HAVE_HW_BREAKPOINT configurationsAndreas Schwab
This fixes a logic error that caused a failure to update the hw breakpoint registers when not using the hw-breakpoint interface. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-29Merge remote-tracking branch 'scott/next' into nextBenjamin Herrenschmidt
<< This contains a fix for a chroma_defconfig build break that was introduced by e6500 tablewalk support, and a device tree binding patch that missed the previous pull request due to some last-minute polishing. >>
2014-01-29Merge remote-tracking branch 'agust/next' into nextBenjamin Herrenschmidt
<< Switch mpc512x to the common clock framework and adapt mpc512x drivers to use the new clock driver. Old PPC_CLOCK code is removed entirely since there are no users any more. >>
2014-01-27Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "So here's my next branch for powerpc. A bit late as I was on vacation last week. It's mostly the same stuff that was in next already, I just added two patches today which are the wiring up of lockref for powerpc, which for some reason fell through the cracks last time and is trivial. The highlights are, in addition to a bunch of bug fixes: - Reworked Machine Check handling on kernels running without a hypervisor (or acting as a hypervisor). Provides hooks to handle some errors in real mode such as TLB errors, handle SLB errors, etc... - Support for retrieving memory error information from the service processor on IBM servers running without a hypervisor and routing them to the memory poison infrastructure. - _PAGE_NUMA support on server processors - 32-bit BookE relocatable kernel support - FSL e6500 hardware tablewalk support - A bunch of new/revived board support - FSL e6500 deeper idle states and altivec powerdown support You'll notice a generic mm change here, it has been acked by the relevant authorities and is a pre-req for our _PAGE_NUMA support" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits) powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked() powerpc: Add support for the optimised lockref implementation powerpc/powernv: Call OPAL sync before kexec'ing powerpc/eeh: Escalate error on non-existing PE powerpc/eeh: Handle multiple EEH errors powerpc: Fix transactional FP/VMX/VSX unavailable handlers powerpc: Don't corrupt transactional state when using FP/VMX in kernel powerpc: Reclaim two unused thread_info flag bits powerpc: Fix races with irq_work Move precessing of MCE queued event out from syscall exit path. pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines powerpc: Make add_system_ram_resources() __init powerpc: add SATA_MV to ppc64_defconfig powerpc/powernv: Increase candidate fw image size powerpc: Add debug checks to catch invalid cpu-to-node mappings powerpc: Fix the setup of CPU-to-Node mappings during CPU online powerpc/iommu: Don't detach device without IOMMU group powerpc/eeh: Hotplug improvement powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space powerpc/eeh: Add restore_config operation ...
2014-01-27Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc mremap fix from Ben Herrenschmidt: "This is the patch that I had sent after -rc8 and which we decided to wait before merging. It's based on a different tree than my -next branch (it needs some pre-reqs that were in -rc4 or so while my -next is based on -rc1) so I left it as a separate branch for your to pull. It's identical to the request I did 2 or 3 weeks back. This fixes crashes in mremap with THP on powerpc. The fix however requires a small change in the generic code. It moves a condition into a helper we can override from the arch which is harmless, but it *also* slightly changes the order of the set_pmd and the withdraw & deposit, which should be fine according to Kirill (who wrote that code) but I agree -rc8 is a bit late... It was acked by Kirill and Andrew told me to just merge it via powerpc" * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/thp: Fix crash on mremap
2014-01-28powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked()Michael Ellerman
At a glance these are just the inverse of each other. The one subtlety is that arch_spin_value_unlocked() takes the lock by value, rather than as a pointer, which is important for the lockref code. On the other hand arch_spin_is_locked() doesn't really care, so implement it in terms of arch_spin_value_unlocked(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-28powerpc: Add support for the optimised lockref implementationMichael Ellerman
This commit adds the architecture support required to enable the optimised implementation of lockrefs. That's as simple as defining arch_spin_value_unlocked() and selecting the Kconfig option. We also define cmpxchg64_relaxed(), because the lockref code does not need the cmpxchg to have barrier semantics. Using Linus' test case[1] on one system I see a 4x improvement for the basic enablement, and a further 1.3x for cmpxchg64_relaxed(), for a total of 5.3x vs the baseline. On another system I see more like 2x improvement. [1]: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>