Age | Commit message (Collapse) | Author |
|
Pull networking updates from David Miller:
1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.
2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
Benniston.
3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
Mork.
4) BPF now has a "random" opcode, from Chema Gonzalez.
5) Add more BPF documentation and improve test framework, from Daniel
Borkmann.
6) Support TCP fastopen over ipv6, from Daniel Lee.
7) Add software TSO helper functions and use them to support software
TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia.
8) Support software TSO in fec driver too, from Nimrod Andy.
9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.
10) Handle broadcasts more gracefully over macvlan when there are large
numbers of interfaces configured, from Herbert Xu.
11) Allow more control over fwmark used for non-socket based responses,
from Lorenzo Colitti.
12) Do TCP congestion window limiting based upon measurements, from Neal
Cardwell.
13) Support busy polling in SCTP, from Neal Horman.
14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.
15) Bridge promisc mode handling improvements from Vlad Yasevich.
16) Don't use inetpeer entries to implement ID generation any more, it
performs poorly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
tcp: fixing TLP's FIN recovery
net: fec: Add software TSO support
net: fec: Add Scatter/gather support
net: fec: Increase buffer descriptor entry number
net: fec: Factorize feature setting
net: fec: Enable IP header hardware checksum
net: fec: Factorize the .xmit transmit function
bridge: fix compile error when compiling without IPv6 support
bridge: fix smatch warning / potential null pointer dereference
via-rhine: fix full-duplex with autoneg disable
bnx2x: Enlarge the dorq threshold for VFs
bnx2x: Check for UNDI in uncommon branch
bnx2x: Fix 1G-baseT link
bnx2x: Fix link for KR with swapped polarity lane
sctp: Fix sk_ack_backlog wrap-around problem
net/core: Add VF link state control policy
net/fsl: xgmac_mdio is dependent on OF_MDIO
net/fsl: Make xgmac_mdio read error message useful
net_sched: drr: warn when qdisc is not work conserving
...
|
|
Pull watchdog updates from Wim Van Sebroeck:
"This contains:
- addition of the Intel MID watchdog
- removal of W83697HF and W83697UG drivers (code was merged into
w83627hf_wdt driver)
- addition of Armada 375/380 SoC support
- conversion of imx2_wdt to regmap API and to watchdog core API
- lots of other small improvements and fixes"
[ Wim was also tagged by gmail as a spammer, but not delayed by days
unlike Ben ]
* git://www.linux-watchdog.org/linux-watchdog: (25 commits)
x86: intel-mid: add watchdog platform code for Merrifield
watchdog: add Intel MID watchdog driver support
watchdog: sp805: Set watchdog_device->timeout from ->set_timeout()
booke/watchdog: refine and clean up the codes
watchdog: iop_wdt only builds for mach-iop13xx
watchdog: Remove drivers for W83697HF and W83697UG
watchdog: w83627hf_wdt: Add early_disable module parameter
ARM: mvebu: Add A375/A380 watchdog binding documentation
watchdog: orion: Add Armada 375/380 SoC support
watchdog: orion: Introduce per-SoC enabled() function
watchdog: orion: Introduce per-SoC stop() function
watchdog: orion: Remove unneeded atomic access
watchdog: orion: Introduce a SoC-specific RSTOUT mapping
watchdog: orion: Move the register ioremap'ing to its own function
watchdog: xilinx: Make of_device_id array const
watchdog: imx2_wdt: convert to watchdog core api
watchdog: imx2_wdt: convert to use regmap API.
watchdog: imx2_wdt: Sort the header files alphabetically
watchdog: ath79_wdt: switch to clk_prepare/clk_disable
watchdog: ath79_wdt: avoid spurious restarts on AR934x
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
"Here is the bulk of the powerpc changes for this merge window. It got
a bit delayed in part because I wasn't paying attention, and in part
because I discovered I had a core PCI change without a PCI maintainer
ack in it. Bjorn eventually agreed it was ok to merge it though we'll
probably improve it later and I didn't want to rebase to add his ack.
There is going to be a bit more next week, essentially fixes that I
still want to sort through and test.
The biggest item this time is the support to build the ppc64 LE kernel
with our new v2 ABI. We previously supported v2 userspace but the
kernel itself was a tougher nut to crack. This is now sorted mostly
thanks to Anton and Rusty.
We also have a fairly big series from Cedric that add support for
64-bit LE zImage boot wrapper. This was made harder by the fact that
traditionally our zImage wrapper was always 32-bit, but our new LE
toolchains don't really support 32-bit anymore (it's somewhat there
but not really "supported") so we didn't want to rely on it. This
meant more churn that just endian fixes.
This brings some more LE bits as well, such as the ability to run in
LE mode without a hypervisor (ie. under OPAL firmware) by doing the
right OPAL call to reinitialize the CPU to take HV interrupts in the
right mode and the usual pile of endian fixes.
There's another series from Gavin adding EEH improvements (one day we
*will* have a release with less than 20 EEH patches, I promise!).
Another highlight is the support for the "Split core" functionality on
P8 by Michael. This allows a P8 core to be split into "sub cores" of
4 threads which allows the subcores to run different guests under KVM
(the HW still doesn't support a partition per thread).
And then the usual misc bits and fixes ..."
[ Further delayed by gmail deciding that BenH is a dirty spammer.
Google knows. ]
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (155 commits)
powerpc/powernv: Add missing include to LPC code
selftests/powerpc: Test the THP bug we fixed in the previous commit
powerpc/mm: Check paca psize is up to date for huge mappings
powerpc/powernv: Pass buffer size to OPAL validate flash call
powerpc/pseries: hcall functions are exported to modules, need _GLOBAL_TOC()
powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()
powerpc/powernv: Set memory_block_size_bytes to 256MB
powerpc: Allow ppc_md platform hook to override memory_block_size_bytes
powerpc/powernv: Fix endian issues in memory error handling code
powerpc/eeh: Skip eeh sysfs when eeh is disabled
powerpc: 64bit sendfile is capped at 2GB
powerpc/powernv: Provide debugfs access to the LPC bus via OPAL
powerpc/serial: Use saner flags when creating legacy ports
powerpc: Add cpu family documentation
powerpc/xmon: Fix up xmon format strings
powerpc/powernv: Add calls to support little endian host
powerpc: Document sysfs DSCR interface
powerpc: Fix regression of per-CPU DSCR setting
powerpc: Split __SYSFS_SPRSETUP macro
arch: powerpc/fadump: Cleaning up inconsistent NULL checks
...
|
|
Basically, this patch does the following:
1. Move the codes of parsing boot parameters from setup-common.c
to driver. In this way, code reader can know directly that
there are boot parameters that can change the timeout.
2. Make boot parameter 'booke_wdt_period' effective.
currently, when driver is loaded, default timeout is always
being used in stead of booke_wdt_period.
3. Wrap up the watchdog timeout in device struct and clean up
unnecessary codes.
Signed-off-by: Tang Yuantian <yuantian.tang@freescale.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Li Yang <leoli@freescale.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
|
|
Pull slave-dmaengine updates from Vinod Koul:
- new Xilixn VDMA driver from Srikanth
- bunch of updates for edma driver by Thomas, Joel and Peter
- fixes and updates on dw, ste_dma, freescale, mpc512x, sudmac etc
* 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma: (45 commits)
dmaengine: sh: don't use dynamic static allocation
dmaengine: sh: fix print specifier warnings
dmaengine: sh: make shdma_prep_dma_cyclic static
dmaengine: Kconfig: Update MXS_DMA help text to include MX6Q/MX6DL
of: dma: Grammar s/requests/request/, s/used required/required/
dmaengine: shdma: Enable driver compilation with COMPILE_TEST
dmaengine: rcar-hpbdma: Include linux/err.h
dmaengine: sudmac: Include linux/err.h
dmaengine: sudmac: Keep #include sorted alphabetically
dmaengine: shdmac: Include linux/err.h
dmaengine: shdmac: Keep #include sorted alphabetically
dmaengine: s3c24xx-dma: Add cyclic transfer support
dmaengine: s3c24xx-dma: Process whole SG chain
dmaengine: imx: correct sdmac->status for cyclic dma tx
dmaengine: pch: fix compilation for alpha target
dmaengine: dw: check return code of dma_async_device_register()
dmaengine: dw: fix regression in dw_probe() function
dmaengine: dw: enable clock before access
dma: pch_dma: Fix Kconfig dependencies
dmaengine: mpc512x: add support for peripheral transfers
...
|
|
As of commit 799fef06123f ("powerpc: Use generic idle loop"), this
applies to arch_cpu_idle() instead of cpu_idle().
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
kbuild bot spotted that one:
arch/powerpc/platforms/powernv/opal-lpc.c: In function 'opal_lpc_init_debugfs':
>> arch/powerpc/platforms/powernv/opal-lpc.c:319:35: error: 'powerpc_debugfs_root' undeclared (first use in this function)
root = debugfs_create_dir("lpc", powerpc_debugfs_root);
^
We neet to include the definition explicitely.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
We have a bug in our hugepage handling which exhibits as an infinite
loop of hash faults. If the fault is being taken in the kernel it will
typically trigger the softlockup detector, or the RCU stall detector.
The bug is as follows:
1. mmap(0xa0000000, ..., MAP_FIXED | MAP_HUGE_TLB | MAP_ANONYMOUS ..)
2. Slice code converts the slice psize to 16M.
3. The code on lines 539-540 of slice.c in slice_get_unmapped_area()
synchronises the mm->context with the paca->context. So the paca slice
mask is updated to include the 16M slice.
3. Either:
* mmap() fails because there are no huge pages available.
* mmap() succeeds and the mapping is then munmapped.
In both cases the slice psize remains at 16M in both the paca & mm.
4. mmap(0xa0000000, ..., MAP_FIXED | MAP_ANONYMOUS ..)
5. The slice psize is converted back to 64K. Because of the check on line 539
of slice.c we DO NOT update the paca->context. The paca slice mask is now
out of sync with the mm slice mask.
6. User/kernel accesses 0xa0000000.
7. The SLB miss handler slb_allocate_realmode() **uses the paca slice mask**
to create an SLB entry and inserts it in the SLB.
18. With the 16M SLB entry in place the hardware does a hash lookup, no entry
is found so a data access exception is generated.
19. The data access handler calls do_page_fault() -> handle_mm_fault().
10. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page().
11. The hardware retries the access, there is still nothing in the hash table
so once again a data access exception is generated.
12. hash_page() calls into __hash_page_thp() and inserts a mapping in the
hash. Although the THP mapping maps 16M the hashing is done using 64K
as the segment page size.
13. hash_page() returns immediately after calling __hash_page_thp(), skipping
over the code at line 1125. Resulting in the mismatch between the
paca->context and mm->context not being detected.
14. The hardware retries the access, the hash it generates using the 16M
SLB entry does NOT match the hash we inserted.
15. We take another data access and go into __hash_page_thp().
16. We see a valid entry in the hpte_slot_array and so we call updatepp()
which succeeds.
17. Goto 14.
We could fix this in two ways. The first would be to remove or modify
the check on line 539 of slice.c.
The second option is to cause the check of paca psize in hash_page() on
line 1125 to also be done for THP pages.
We prefer the latter, because the check & update of the paca psize is
not done until we know it's necessary. It's also done only on the
current cpu, so we don't need to IPI all other cpus.
Without further rearranging the code, the simplest fix is to pull out
the code that checks paca psize and call it in two places. Firstly for
THP/hugetlb, and secondly for other mappings as before.
Thanks to Dave Jones for trinity, which originally found this bug.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org [v3.11+]
|
|
We pass actual buffer size to opal_validate_flash() OPAL API call
and in return it contains output buffer size.
Commit cc146d1d (Fix little endian issues) missed to set the size
param before making OPAL call. So firmware image validation fails.
This patch sets size variable before making OPAL call.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Tested-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The hcall macros may call out to c code for tracing, so we need
to set up a valid r2. This fixes an oops found when testing
ibmvscsi as a module.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
_GLOBAL_TOC()
__clear_user and copy_page load from the TOC and are also exported
to modules. This means we have to use _GLOBAL_TOC() so that we
create the global entry point that sets up the TOC.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
powerpc sets a low SECTION_SIZE_BITS to accomodate small pseries
boxes. We default to 16MB memory blocks, and boxes with a lot
of memory end up with enormous numbers of sysfs memory nodes.
Set a more reasonable default for powernv of 256MB.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The pseries platform code unconditionally overrides
memory_block_size_bytes regardless of the running platform.
Create a ppc_md hook that so each platform can choose to
do what it wants.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
struct OpalMemoryErrorData is passed to us from firmware, so we
have to byteswap it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
When eeh is not enabled, and hotplug two pci devices on the same bus, eeh
related sysfs would be added twice for the first added pci device. Since the
eeh_dev is not created when eeh is not enabled.
This patch adds the check, if eeh is not enabled, eeh sysfs will not be
created.
After applying this patch, following warnings are reduced:
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_mode'
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_config_addr'
sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:00.0/eeh_pe_config_addr'
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
commit 8f9c0119d7ba (compat: fs: Generic compat_sys_sendfile
implementation) changed the PowerPC 64bit sendfile call from
sys_sendile64 to sys_sendfile.
Unfortunately this broke sendfile of lengths greater than 2G because
sys_sendfile caps at MAX_NON_LFS. Restore what we had previously which
fixes the bug.
Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This provides debugfs files to access the LPC bus on Power8
non-virtualized using the appropriate OPAL firmware calls.
The usage is simple: one file per space (IO, MEM and FW),
lseek to the address and read/write the data. IO and MEM always
generate series of byte accesses. FW can generate word and dword
accesses if aligned properly.
Based on an original patch from Rob Lippert and reworked.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
We had a mix & match of flags used when creating legacy ports
depending on where we found them in the device-tree. Among others
we were missing UPF_SKIP_TEST for some kind of ISA ports which is
a problem as quite a few UARTs out there don't support the loopback
test (such as a lot of BMCs).
Let's pick the set of flags used by the SoC code and generalize it
which means autoconf, no loopback test, irq maybe shared and fixed
port.
Sending to stable as the lack of UPF_SKIP_TEST is breaking
serial on some machines so I want this back into distros
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org
|
|
There are a couple of places where xmon is using %x to print values that
are unsigned long.
I found this out the hard way recently:
0:mon> p c000000000d0e7c8 c00000033dc90000 00000000a0000089 c000000000000000
return value is 0x96300500
Which is calling find_linux_pte_or_hugepte(), the result should be a
kernel pointer. After decoding the page tables by hand I discovered the
correct value was c000000396300500.
So fix up that case and a few others.
We also use a mix of 0x%x, %x and %u to print cpu numbers. So
standardise on 0x%x.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
When running as a powernv "host" system on P8, we need to switch
the endianness of interrupt handlers. This does it via the appropriate
call to the OPAL firmware which may result in just switching HID0:HILE
but depending on the processor version might need to do a few more
things. This call must be done early before any other processor has
been brought out of firmware.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
sys_sgetmask and sys_ssetmask are obsolete system calls no longer
supported in libc.
This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert
mode configuration.That option is enabled by default for those
architectures.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When it was introduced, zone_reclaim_mode made sense as NUMA distances
punished and workloads were generally partitioned to fit into a NUMA
node. NUMA machines are now common but few of the workloads are
NUMA-aware and it's routine to see major performance degradation due to
zone_reclaim_mode being enabled but relatively few can identify the
problem.
Those that require zone_reclaim_mode are likely to be able to detect
when it needs to be enabled and tune appropriately so lets have a
sensible default for the bulk of users.
This patch (of 2):
zone_reclaim_mode causes processes to prefer reclaiming memory from
local node instead of spilling over to other nodes. This made sense
initially when NUMA machines were almost exclusively HPC and the
workload was partitioned into nodes. The NUMA penalties were
sufficiently high to justify reclaiming the memory. On current machines
and workloads it is often the case that zone_reclaim_mode destroys
performance but not all users know how to detect this. Favour the
common case and disable it by default. Users that are sophisticated
enough to know they need zone_reclaim_mode will detect it.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
_PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
faults on x86. Care is taken such that _PAGE_NUMA is used only in
situations where the VMA flags distinguish between NUMA hinting faults
and prot_none faults. This decision was x86-specific and conceptually
it is difficult requiring special casing to distinguish between PROTNONE
and NUMA ptes based on context.
Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
between an entry that is really unmapped and a page that is protected
for NUMA hinting faults as if the PTE is not present then a fault will
be trapped.
Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset.
This patch shrinks the maximum possible swap size and uses the bit to
uniquely distinguish between NUMA hinting ptes and swap ptes.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently hugepage migration is available for all archs which support
pmd-level hugepage, but testing is done only for x86_64 and there're
bugs for other archs. So to avoid breaking such archs, this patch
limits the availability strictly to x86_64 until developers of other
archs get interested in enabling this feature.
Simply disabling hugepage migration on non-x86_64 archs is not enough to
fix the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so let's fix this by checking if hugepage
migration is supported in vma_migratable().
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux into next
Pull DeviceTree updates from Rob Herring:
- Another round of clean-up of FDT related code in architecture code.
This removes knowledge of internal FDT details from most
architectures except powerpc.
- Conversion of kernel's custom FDT parsing code to use libfdt.
- DT based initialization for generic serial earlycon. The
introduction of generic serial earlycon support went in through the
tty tree.
- Improve the platform device naming for DT probed devices to ensure
unique naming and use parent names instead of a global index.
- Fix a race condition in of_update_property.
- Unify the various linker section OF match tables and fix several
function prototype errors.
- Update platform_get_irq_byname to work in deferred probe cases.
- 2 binding doc updates
* tag 'devicetree-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (58 commits)
of: handle NULL node in next_child iterators
of/irq: provide more wrappers for !CONFIG_OF
devicetree: bindings: Document micrel vendor prefix
dt: bindings: dwc2: fix required value for the phy-names property
of_pci_irq: kill useless variable in of_irq_parse_pci()
of/irq: do irq resolution in platform_get_irq_byname()
of: Add a testcase for of_find_node_by_path()
of: Make of_find_node_by_path() handle /aliases
of: Create unlocked version of for_each_child_of_node()
lib: add glibc style strchrnul() variant
of: Handle memory@0 node on PPC32 only
pci/of: Remove dead code
of: fix race between search and remove in of_update_property()
of: Use NULL for pointers
of: Stop naming platform_device using dcr address
of: Ensure unique names without sacrificing determinism
tty/serial: pl011: add DT based earlycon support
of/fdt: add FDT serial scanning for earlycon
of/fdt: add FDT address translation support
serial: earlycon: add DT support
...
|
|
Pull KVM updates from Paolo Bonzini:
"At over 200 commits, covering almost all supported architectures, this
was a pretty active cycle for KVM. Changes include:
- a lot of s390 changes: optimizations, support for migration, GDB
support and more
- ARM changes are pretty small: support for the PSCI 0.2 hypercall
interface on both the guest and the host (the latter acked by
Catalin)
- initial POWER8 and little-endian host support
- support for running u-boot on embedded POWER targets
- pretty large changes to MIPS too, completing the userspace
interface and improving the handling of virtualized timer hardware
- for x86, a larger set of changes is scheduled for 3.17. Still, we
have a few emulator bugfixes and support for running nested
fully-virtualized Xen guests (para-virtualized Xen guests have
always worked). And some optimizations too.
The only missing architecture here is ia64. It's not a coincidence
that support for KVM on ia64 is scheduled for removal in 3.17"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
KVM: add missing cleanup_srcu_struct
KVM: PPC: Book3S PR: Rework SLB switching code
KVM: PPC: Book3S PR: Use SLB entry 0
KVM: PPC: Book3S HV: Fix machine check delivery to guest
KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
KVM: PPC: Book3S HV: Fix dirty map for hugepages
KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
KVM: PPC: Book3S: Add ONE_REG register names that were missed
KVM: PPC: Add CAP to indicate hcall fixes
KVM: PPC: MPIC: Reset IRQ source private members
KVM: PPC: Graciously fail broken LE hypercalls
PPC: ePAPR: Fix hypercall on LE guest
KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
KVM: PPC: BOOK3S: Always use the saved DAR value
PPC: KVM: Make NX bit available with magic page
KVM: PPC: Disable NX for old magic page using guests
KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
...
|
|
Conflicts:
include/net/inetpeer.h
net/ipv6/output_core.c
Changes in net were fixing bugs in code removed in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull scheduler updates from Ingo Molnar:
"The main scheduling related changes in this cycle were:
- various sched/numa updates, for better performance
- tree wide cleanup of open coded nice levels
- nohz fix related to rq->nr_running use
- cpuidle changes and continued consolidation to improve the
kernel/sched/idle.c high level idle scheduling logic. As part of
this effort I pulled cpuidle driver changes from Rafael as well.
- standardized idle polling amongst architectures
- continued work on preparing better power/energy aware scheduling
- sched/rt updates
- misc fixlets and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
sched/numa: Decay ->wakee_flips instead of zeroing
sched/numa: Update migrate_improves/degrades_locality()
sched/numa: Allow task switch if load imbalance improves
sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code
sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice()
sched: Initialize rq->age_stamp on processor start
sched, nohz: Change rq->nr_running to always use wrappers
sched: Fix the rq->next_balance logic in rebalance_domains() and idle_balance()
sched: Use clamp() and clamp_val() to make sys_nice() more readable
sched: Do not zero sg->cpumask and sg->sgp->power in build_sched_groups()
sched/numa: Fix initialization of sched_domain_topology for NUMA
sched: Call select_idle_sibling() when not affine_sd
sched: Simplify return logic in sched_read_attr()
sched: Simplify return logic in sched_copy_attr()
sched: Fix exec_start/task_hot on migrated tasks
arm64: Remove TIF_POLLING_NRFLAG
metag: Remove TIF_POLLING_NRFLAG
sched/idle: Make cpuidle_idle_call() void
sched/idle: Reflow cpuidle_idle_call()
sched/idle: Delay clearing the polling bit
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core locking updates from Ingo Molnar:
"The main changes in this cycle were:
- reduced/streamlined smp_mb__*() interface that allows more usecases
and makes the existing ones less buggy, especially in rarer
architectures
- add rwsem implementation comments
- bump up lockdep limits"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
rwsem: Add comments to explain the meaning of the rwsem's count field
lockdep: Increase static allocations
arch: Mass conversion of smp_mb__*()
arch,doc: Convert smp_mb__*()
arch,xtensa: Convert smp_mb__*()
arch,x86: Convert smp_mb__*()
arch,tile: Convert smp_mb__*()
arch,sparc: Convert smp_mb__*()
arch,sh: Convert smp_mb__*()
arch,score: Convert smp_mb__*()
arch,s390: Convert smp_mb__*()
arch,powerpc: Convert smp_mb__*()
arch,parisc: Convert smp_mb__*()
arch,openrisc: Convert smp_mb__*()
arch,mn10300: Convert smp_mb__*()
arch,mips: Convert smp_mb__*()
arch,metag: Convert smp_mb__*()
arch,m68k: Convert smp_mb__*()
arch,m32r: Convert smp_mb__*()
arch,ia64: Convert smp_mb__*()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb into next
Pull USB driver updates from Greg KH:
"Here is the big USB driver pull request for 3.16-rc1.
Nothing huge here, but lots of little things in the USB core, and in
lots of drivers. Hopefully the USB power management will be work
better now that it has been reworked to do per-port power control
dynamically. There's also a raft of gadget driver updates and fixes,
CONFIG_USB_DEBUG is finally gone now that everything has been
converted over to the dynamic debug inteface, the last hold-out
drivers were cleaned up and the config option removed. There were
also other minor things all through the drivers/usb/ tree, the
shortlog shows this pretty well.
All have been in linux-next, including the very last patch, which came
from linux-next to fix a build issue on some platforms"
* tag 'usb-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (314 commits)
usb: hub_handle_remote_wakeup() only exists for CONFIG_PM=y
USB: orinoco_usb: remove CONFIG_USB_DEBUG support
USB: media: lirc: igorplugusb: remove CONFIG_USB_DEBUG support
USB: media: streamzap: remove CONFIG_USB_DEBUG
USB: media: redrat3: remove CONFIG_USB_DEBUG usage
USB: media: redrat3: remove unneeded tracing macro
usb: qcserial: add additional Sierra Wireless QMI devices
usb: host: max3421-hcd: Use module_spi_driver
usb: host: max3421-hcd: Allow platform-data to specify Vbus polarity
usb: host: max3421-hcd: fix "spi_rd8" uses dynamic stack allocation warning
usb: host: max3421-hcd: Fix missing unlock in max3421_urb_enqueue()
usb: qcserial: add Netgear AirCard 341U
Documentation: dt-bindings: update xhci-platform DT binding for R-Car H2 and M2
usb: host: xhci-plat: add xhci_plat_start()
usb: host: max3421-hcd: Fix potential NULL urb dereference
Revert "usb: gadget: net2280: Add support for PLX USB338X"
USB: usbip: remove CONFIG_USB_DEBUG reference
USB: remove CONFIG_USB_DEBUG from defconfig files
usb: resume child device when port is powered on
usb: hub_handle_remote_wakeup() depends on CONFIG_PM_RUNTIME=y
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into next
Pull PCI changes from Bjorn Helgaas:
"Enumeration
- Notify driver before and after device reset (Keith Busch)
- Use reset notification in NVMe (Keith Busch)
NUMA
- Warn if we have to guess host bridge node information (Myron Stowe)
- Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee
Suthikulpanit)
- Clean up and mark early_root_info_init() as deprecated (Suravee
Suthikulpanit)
Driver binding
- Add "driver_override" for force specific binding (Alex Williamson)
- Fail "new_id" addition for devices we already know about (Bandan
Das)
Resource management
- Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
- Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
- Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
- Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
- Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
- Don't convert BAR address to resource if dma_addr_t is too small
(Bjorn Helgaas)
- Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
- Don't print anything while decoding is disabled (Bjorn Helgaas)
- Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
- Add resource allocation comments (Bjorn Helgaas)
- Restrict 64-bit prefetchable bridge windows to 64-bit resources
(Yinghai Lu)
- Assign i82875p_edac PCI resources before adding device (Yinghai Lu)
PCI device hotplug
- Remove unnecessary "dev->bus" test (Bjorn Helgaas)
- Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
- Fix rphahp endianess issues (Laurent Dufour)
- Acknowledge spurious "cmd completed" event (Rajat Jain)
- Allow hotplug service drivers to operate in polling mode (Rajat Jain)
- Fix cpqphp possible NULL dereference (Rickard Strandqvist)
MSI
- Replace pci_enable_msi_block() by pci_enable_msi_exact()
(Alexander Gordeev)
- Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
- Simplify populate_msi_sysfs() (Jan Beulich)
Virtualization
- Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
- Mark RTL8110SC INTx masking as broken (Alex Williamson)
Generic host bridge driver
- Add generic PCI host controller driver (Will Deacon)
Freescale i.MX6
- Use new clock names (Lucas Stach)
- Drop old IRQ mapping (Lucas Stach)
- Remove optional (and unused) IRQs (Lucas Stach)
- Add support for MSI (Lucas Stach)
- Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)
Renesas R-Car
- Add gen2 device tree support (Ben Dooks)
- Use new OF interrupt mapping when possible (Lucas Stach)
- Add PCIe driver (Phil Edworthy)
- Add PCIe MSI support (Phil Edworthy)
- Add PCIe device tree bindings (Phil Edworthy)
Samsung Exynos
- Remove unnecessary OOM messages (Jingoo Han)
- Fix add_pcie_port() section mismatch warning (Sachin Kamat)
Synopsys DesignWare
- Make MSI ISR shared IRQ aware (Lucas Stach)
Miscellaneous
- Check for broken config space aliasing (Alex Williamson)
- Update email address (Ben Hutchings)
- Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
- Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
- Remove unnecessary __ref annotations (Bjorn Helgaas)
- Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns
(Bjorn Helgaas)
- Fix use of uninitialized MPS value (Bjorn Helgaas)
- Tidy x86/gart messages (Bjorn Helgaas)
- Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
- Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
- Remove unused serial device IDs (Jean Delvare)
- Use designated initialization in PCI_VDEVICE (Mark Rustad)
- Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
- Configure MPS on ARM (Murali Karicheri)
- Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
- Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
- Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
- Remove pcibios_add_platform_entries() (Sebastian Ott)
- Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
- Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
- Add and use new pci_is_bridge() interface (Yijing Wang)
- Make pci_bus_add_device() void (Yijing Wang)
DMA API
- Clarify physical/bus address distinction in docs (Bjorn Helgaas)
- Fix typos in docs (Emilio López)
- Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
- Change dma_declare_coherent_memory() CPU address to phys_addr_t
(Bjorn Helgaas)
- Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory()
(Bjorn Helgaas)"
* tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (92 commits)
MAINTAINERS: Add generic PCI host controller driver
PCI: generic: Add generic PCI host controller driver
PCI: imx6: Add support for MSI
PCI: designware: Make MSI ISR shared IRQ aware
PCI: imx6: Remove optional (and unused) IRQs
PCI: imx6: Drop old IRQ mapping
PCI: imx6: Use new clock names
i82875p_edac: Assign PCI resources before adding device
ARM/PCI: Call pcie_bus_configure_settings() to set MPS
PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning
PCI: Make pci_bus_add_device() void
PCI: exynos: Fix add_pcie_port() section mismatch warning
PCI: Introduce new device binding path using pci_dev.driver_override
PCI: rcar: Add gen2 device tree support
PCI: cpqphp: Fix possible null pointer dereference
PCI: rcar: Add R-Car PCIe device tree bindings
PCI: rcar: Add MSI support for PCIe
PCI: rcar: Add Renesas R-Car PCIe driver
PCI: Fix return value from pci_user_{read,write}_config_*()
PCI: exynos: Remove unnecessary OOM messages
...
|
|
This patch finally allows us to get rid of the BPF_S_* enum.
Currently, the code performs unnecessary encode and decode
workarounds in seccomp and filter migration itself when a filter
is being attached in order to overcome BPF_S_* encoding which
is not used anymore by the new interpreter resp. JIT compilers.
Keeping it around would mean that also in future we would need
to extend and maintain this enum and related encoders/decoders.
We can get rid of all that and save us these operations during
filter attaching. Naturally, also JIT compilers need to be updated
by this.
Before JIT conversion is being done, each compiler checks if A
is being loaded at startup to obtain information if it needs to
emit instructions to clear A first. Since BPF extensions are a
subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements
for extensions can be removed at that point. To ease and minimalize
code changes in the classic JITs, we have introduced bpf_anc_helper().
Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int),
arm (JIT, int), i368 (int), ppc64 (JIT, int); for sparc we
unfortunately didn't have access, but changes are analogous to
the rest.
Joint work with Alexei Starovoitov.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Chema Gonzalez <chemag@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fix from Ben Herrenschmidt:
"Here's just one trivial patch to wire up sys_renameat2 which I seem to
have completely missed so far.
(My test build scripts fwd me warnings but miss the ones generated for
missing syscalls)"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Wire renameat2() syscall
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
On LPAR guest systems Linux enables the shadow SLB to indicate to the
hypervisor a number of SLB entries that always have to be available.
Today we go through this shadow SLB and disable all ESID's valid bits.
However, pHyp doesn't like this approach very much and honors us with
fancy machine checks.
Fortunately the shadow SLB descriptor also has an entry that indicates
the number of valid entries following. During the lifetime of a guest
we can just swap that value to 0 and don't have to worry about the
SLB restoration magic.
While we're touching the code, let's also make it more readable (get
rid of rldicl), allow it to deal with a dynamic number of bolted
SLB entries and only do shadow SLB swizzling on LPAR systems.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We didn't make use of SLB entry 0 because ... of no good reason. SLB entry 0
will always be used by the Linux linear SLB entry, so the fact that slbia
does not invalidate it doesn't matter as we overwrite SLB 0 on exit anyway.
Just enable use of SLB entry 0 for our shadow SLB code.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The code that delivered a machine check to the guest after handling
it in real mode failed to load up r11 before calling kvmppc_msr_interrupt,
which needs the old MSR value in r11 so it can see the transactional
state there. This adds the missing load.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
This adds workarounds for two hardware bugs in the POWER8 performance
monitor unit (PMU), both related to interrupt generation. The effect
of these bugs is that PMU interrupts can get lost, leading to tools
such as perf reporting fewer counts and samples than they should.
The first bug relates to the PMAO (perf. mon. alert occurred) bit in
MMCR0; setting it should cause an interrupt, but doesn't. The other
bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0.
Setting PMAE when a counter is negative and counter negative
conditions are enabled to cause alerts should cause an alert, but
doesn't.
The workaround for the first bug is to create conditions where a
counter will overflow, whenever we are about to restore a MMCR0
value that has PMAO set (and PMAO_SYNC clear). The workaround for
the second bug is to freeze all counters using MMCR2 before reading
MMCR0.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Current, when testing whether a page is dirty (when constructing the
bitmap for the KVM_GET_DIRTY_LOG ioctl), we test the C (changed) bit
in the HPT entries mapping the page, and if it is 0, we consider the
page to be clean. However, the Power ISA doesn't require processors
to set the C bit to 1 immediately when writing to a page, and in fact
allows them to delay the writeback of the C bit until they receive a
TLB invalidation for the page. Thus it is possible that the page
could be dirty and we miss it.
Now, if there are vcpus running, this is not serious since the
collection of the dirty log is racy already - some vcpu could dirty
the page just after we check it. But if there are no vcpus running we
should return definitive results, in case we are in the final phase of
migrating the guest.
Also, if the permission bits in the HPTE don't allow writing, then we
know that no CPU can set C. If the HPTE was previously writable and
the page was modified, any C bit writeback would have been flushed out
by the tlbie that we did when changing the HPTE to read-only.
Otherwise we need to do a TLB invalidation even if the C bit is 0, and
then check the C bit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The dirty map that we construct for the KVM_GET_DIRTY_LOG ioctl has
one bit per system page (4K/64K). Currently, we only set one bit in
the map for each HPT entry with the Change bit set, even if the HPT is
for a large page (e.g., 16MB). Userspace then considers only the
first system page dirty, though in fact the guest may have modified
anywhere in the large page.
To fix this, we make kvm_test_clear_dirty() return the actual number
of pages that are dirty (and rename it to kvm_test_clear_dirty_npages()
to emphasize that that's what it returns). In kvmppc_hv_get_dirty_log()
we then set that many bits in the dirty map.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Currently, when a huge page is faulted in for a guest, we select the
rmap chain to insert the HPTE into based on the guest physical address
that the guest tried to access. Since there is an rmap chain for each
system page, there are many rmap chains for the area covered by a huge
page (e.g. 256 for 16MB pages when PAGE_SIZE = 64kB), and the huge-page
HPTE could end up in any one of them.
For consistency, and to make the huge-page HPTEs easier to find, we now
put huge-page HPTEs in the rmap chain corresponding to the base address
of the huge page.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
The global_invalidates() function contains a check that is intended
to tell whether we are currently executing in the context of a hypercall
issued by the guest. The reason is that the optimization of using a
local TLB invalidate instruction is only valid in that context. The
check was testing local_paca->kvm_hstate.kvm_vcore, which gets set
when entering the guest but no longer gets cleared when exiting the
guest. To fix this, we use the kvm_vcpu field instead, which does
get cleared when exiting the guest, by the kvmppc_release_hwthread()
calls inside kvmppc_run_core().
The effect of having the check wrong was that when kvmppc_do_h_remove()
got called from htab_write() on the destination machine during a
migration, it cleared the current cpu's bit in kvm->arch.need_tlb_flush.
This meant that when the guest started running in the destination VM,
it may miss out on doing a complete TLB flush, and therefore may end
up using stale TLB entries from a previous guest that used the same
LPID value.
This should make migration more reliable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Commit b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8
SPRs") added a definition of KVM_REG_PPC_WORT with the same register
number as the existing KVM_REG_PPC_VRSAVE (though in fact the
definitions are not identical because of the different register sizes.)
For clarity, this moves KVM_REG_PPC_WORT to the next unused number,
and also adds it to api.txt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We worked around some nasty KVM magic page hcall breakages:
1) NX bit not honored, so ignore NX when we detect it
2) LE guests swizzle hypercall instruction
Without these fixes in place, there's no way it would make sense to expose kvm
hypercalls to a guest. Chances are immensely high it would trip over and break.
So add a new CAP that gives user space a hint that we have workarounds for the
bugs above in place. It can use those as hint to disable PV hypercalls when
the guest CPU is anything POWER7 or higher and the host does not have fixes
in place.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
When we reset the in-kernel MPIC controller, we forget to reset some hidden
state such as destmask and output. This state is usually set when the guest
writes to the IDR register for a specific IRQ line.
To make sure we stay in sync and don't forget hidden state, treat reset of
the IDR register as a simple write of the IDR register. That automatically
updates all the hidden state as well.
Reported-by: Paul Janzen <pcj@pauljanzen.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
There are LE Linux guests out there that don't handle hypercalls correctly.
Instead of interpreting the instruction stream from device tree as big endian
they assume it's a little endian instruction stream and fail.
When we see an illegal instruction from such a byte reversed instruction stream,
bail out graciously and just declare every hcall as error.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
We get an array of instructions from the hypervisor via device tree that
we write into a buffer that gets executed whenever we want to make an
ePAPR compliant hypercall.
However, the hypervisor passes us these instructions in BE order which
we have to manually convert to LE when we want to run them in LE mode.
With this fixup in place, I can successfully run LE kernels with KVM
PV enabled on PR KVM.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Use make_dsisr instead of open coding it. This also have
the added benefit of handling alignment interrupt on additional
instructions.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Although it's optional, IBM POWER cpus always had DAR value set on
alignment interrupt. So don't try to compute these values.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Because old kernels enable the magic page and then choke on NXed trampoline
code we have to disable NX by default in KVM when we use the magic page.
However, since commit b18db0b8 we have successfully fixed that and can now
leave NX enabled, so tell the hypervisor about this.
Signed-off-by: Alexander Graf <agraf@suse.de>
|