asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2015-02-10	mm: drop support of non-linear mapping from fault codepath	Kirill A. Shutemov
	We don't create non-linear mappings anymore. Let's drop code which handles them on page fault. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10	mm: drop support of non-linear mapping from unmap/zap codepath	Kirill A. Shutemov
	We have remap_file_pages(2) emulation in -mm tree for few release cycles and we plan to have it mainline in v3.20. This patchset removes rest of VM_NONLINEAR infrastructure. Patches 1-8 take care about generic code. They are pretty straight-forward and can be applied without other of patches. Rest patches removes pte_file()-related stuff from architecture-specific code. It usually frees up one bit in non-present pte. I've tried to reuse that bit for swap offset, where I was able to figure out how to do that. For obvious reason I cannot test all that arch-specific code and would like to see acks from maintainers. In total, remap_file_pages(2) required about 1.4K lines of not-so-trivial kernel code. That's too much for functionality nobody uses. Tested-by: Felipe Balbi <balbi@ti.com> This patch (of 38): We don't create non-linear mappings anymore. Let's drop code which handles them on unmap/zap. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10	mm: replace remap_file_pages() syscall with emulation	Kirill A. Shutemov
	remap_file_pages(2) was invented to be able efficiently map parts of huge file into limited 32-bit virtual address space such as in database workloads. Nonlinear mappings are pain to support and it seems there's no legitimate use-cases nowadays since 64-bit systems are widely available. Let's drop it and get rid of all these special-cased code. The patch replaces the syscall with emulation which creates new VMA on each remap_file_pages(), unless they it can be merged with an adjacent one. I didn't find any real code that uses remap_file_pages(2) to test emulation impact on. I've checked Debian code search and source of all packages in ALT Linux. No real users: libc wrappers, mentions in strace, gdb, valgrind and this kind of stuff. There are few basic tests in LTP for the syscall. They work just fine with emulation. To test performance impact, I've written small test case which demonstrate pretty much worst case scenario: map 4G shmfs file, write to begin of every page pgoff of the page, remap pages in reverse order, read every page. The test creates 1 million of VMAs if emulation is in use, so I had to set vm.max_map_count to 1100000 to avoid -ENOMEM. Before: 23.3 ( +- 4.31% ) seconds After: 43.9 ( +- 0.85% ) seconds Slowdown: 1.88x I believe we can live with that. Test case: #define _GNU_SOURCE #include <assert.h> #include <stdlib.h> #include <stdio.h> #include <sys/mman.h> #define MB (1024UL * 1024) #define SIZE (4096 * MB) int main(int argc, char *argv) { unsigned long p; long i, pass; for (pass = 0; pass < 10; pass++) { p = mmap(NULL, SIZE, PROT_READ\|PROT_WRITE, MAP_SHARED \| MAP_ANONYMOUS, -1, 0); if (p == MAP_FAILED) { perror("mmap"); return -1; } for (i = 0; i < SIZE / 4096; i++) p[i * 4096 / sizeof(p)] = i; for (i = 0; i < SIZE / 4096; i++) { if (remap_file_pages(p + i 4096 / sizeof(p), 4096, 0, (SIZE - 4096 (i + 1)) >> 12, 0)) { perror("remap_file_pages"); return -1; } } for (i = SIZE / 4096 - 1; i >= 0; i--) assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1); munmap(p, SIZE); } return 0; } [akpm@linux-foundation.org: fix spello] [sasha.levin@oracle.com: initialize populate before usage] [sasha.levin@oracle.com: grab file ref to prevent race while mmaping] Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Armin Rigo <arigo@tunes.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10	mm: don't use compound_head() in virt_to_head_page()	Joonsoo Kim
	compound_head() is implemented with assumption that there would be race condition when checking tail flag. This assumption is only true when we try to access arbitrary positioned struct page. The situation that virt_to_head_page() is called is different case. We call virt_to_head_page() only in the range of allocated pages, so there is no race condition on tail flag. In this case, we don't need to handle race condition and we can reduce overhead slightly. This patch implements compound_head_fast() which is similar with compound_head() except tail flag race handling. And then, virt_to_head_page() uses this optimized function to improve performance. I saw 1.8% win in a fast-path loop over kmem_cache_alloc/free, (14.063 ns -> 13.810 ns) if target object is on tail page. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10	fsnotify: fix handling of renames in audit	Jan Kara
	Commit e9fd702a58c4 ("audit: convert audit watches to use fsnotify instead of inotify") broke handling of renames in audit. Audit code wants to update inode number of an inode corresponding to watched name in a directory. When something gets renamed into a directory to a watched name, inotify previously passed moved inode to audit code however new fsnotify code passes directory inode where the change happened. That confuses audit and it starts watching parent directory instead of a file in a directory. This can be observed for example by doing: cd /tmp touch foo bar auditctl -w /tmp/foo touch foo mv bar foo touch foo In audit log we see events like: type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1 ... type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE ... and that's it - we see event for the first touch after creating the audit rule, we see events for rename but we don't see any event for the last touch. However we start seeing events for unrelated stuff happening in /tmp. Fix the problem by passing moved inode as data in the FS_MOVED_FROM and FS_MOVED_TO events instead of the directory where the change happens. This doesn't introduce any new problems because noone besides audit_watch.c cares about the passed value: fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events. fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all. fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH. kernel/audit_tree.c doesn't care about passed 'data' at all. kernel/audit_watch.c expects moved inode as 'data'. Fixes: e9fd702a58c49db ("audit: convert audit watches to use fsnotify instead of inotify") Signed-off-by: Jan Kara <jack@suse.cz> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Paris <eparis@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10	Merge tag 'stable/for-linus-3.20-rc0-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen features and fixes from David Vrabel: - Reworked handling for foreign (grant mapped) pages to simplify the code, enable a number of additional use cases and fix a number of long-standing bugs. - Prefer the TSC over the Xen PV clock when dom0 (and the TSC is stable). - Assorted other cleanup and minor bug fixes. * tag 'stable/for-linus-3.20-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (25 commits) xen/manage: Fix USB interaction issues when resuming xenbus: Add proper handling of XS_ERROR from Xenbus for transactions. xen/gntdev: provide find_special_page VMA operation xen/gntdev: mark userspace PTEs as special on x86 PV guests xen-blkback: safely unmap grants in case they are still in use xen/gntdev: safely unmap grants in case they are still in use xen/gntdev: convert priv->lock to a mutex xen/grant-table: add a mechanism to safely unmap pages that are in use xen-netback: use foreign page information from the pages themselves xen: mark grant mapped pages as foreign xen/grant-table: add helpers for allocating pages x86/xen: require ballooned pages for grant maps xen: remove scratch frames for ballooned pages and m2p override xen/grant-table: pre-populate kernel unmap ops for xen_gnttab_unmap_refs() mm: add 'foreign' alias for the 'pinned' page flag mm: provide a find_special_page vma operation x86/xen: cleanup arch/x86/xen/mmu.c x86/xen: add some __init annotations in arch/x86/xen/mmu.c x86/xen: add some __init and static annotations in arch/x86/xen/setup.c x86/xen: use correct types for addresses in arch/x86/xen/setup.c ...
2015-02-10	blk-mq: make blk_mq_run_queues() static	Jens Axboe
	We no longer use it outside of blk-mq.c, so we can make it static and stop exporting it. Additionally, kill the 'async' argument, as there's only one used of it. Signed-off-by: Jens Axboe <axboe@fb.com>
2015-02-10	vfio: Add and use device request op for vfio bus drivers	Alex Williamson
	When a request is made to unbind a device from a vfio bus driver, we need to wait for the device to become unused, ie. for userspace to release the device. However, we have a long standing TODO in the code to do something proactive to make that happen. To enable this, we add a request callback on the vfio bus driver struct, which is intended to signal the user through the vfio device interface to release the device. Instead of passively waiting for the device to become unused, we can now pester the user to give it up. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-02-10	Merge branch 'pm-cpufreq'	Rafael J. Wysocki
	* pm-cpufreq: (46 commits) intel_pstate: provide option to only use intel_pstate with HWP cpufreq-dt: Drop unnecessary check before cpufreq_cooling_unregister() invocation cpufreq: Create for_each_governor() cpufreq: Create for_each_policy() cpufreq: Drop cpufreq_disabled() check from cpufreq_cpu_{get\|put}() cpufreq: Set cpufreq_cpu_data to NULL before putting kobject intel_pstate: honor user space min_perf_pct override on resume intel_pstate: respect cpufreq policy request intel_pstate: Add num_pstates to sysfs intel_pstate: expose turbo range to sysfs intel_pstate: Add support for SkyLake cpufreq: stats: drop unnecessary locking cpufreq: stats: don't update stats on false notifiers cpufreq: stats: don't update stats from show_trans_table() cpufreq: stats: time_in_state can't be NULL in cpufreq_stats_update() cpufreq: stats: create sysfs group once we are ready cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications cpufreq: stats: drop 'cpu' field of struct cpufreq_stats cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy cpufreq: stats: rename 'struct cpufreq_stats' objects as 'stats' ...
2015-02-10	Merge branch 'pm-domains'	Rafael J. Wysocki
	* pm-domains: PM: Convert dev_pm_put_subsys_data() into a void function PM: Update function header for dev_pm_get_subsys_data() PM / Domains: Handle errors from genpd's ->attach_dev() callback PM / Domains: Re-order initialization of generic_pm_domain_data PM / Domains: Free pm_subsys_data in error path in __pm_genpd_add_device() PM / Domains: Eliminate the mutex for the generic_pm_domain_data PM / Domains: Don't check for an existing device when adding a new PM / Domains: Don't allow an existing generic_pm_domain_data PM / Domains: Remove reference counting for the generic_pm_domain_data PM / Domains: Rename __pm_genpd_alloc\|free_dev_data() PM / Domains: Remove pm_genpd_dev_need_restore() API
2015-02-10	Merge branches 'pm-qos', 'pm-opp' and 'pm-devfreq'	Rafael J. Wysocki
	* pm-qos: PM / QoS: Use lockdep asserts to find missing hold of power.lock PM / QoS: Add debugfs support to view the list of constraints * pm-opp: PM / OPP: Assert RCU lock in exported functions PM / OPP: Update kernel documentation PM / OPP: Ensure consistent naming of static functions PM / OPP: export dev_pm_opp_get_notifier * pm-devfreq: PM / devfreq: event: Add documentation for exynos-ppmu devfreq-event driver devfreq: Fix build break of devfreq-event class PM / devfreq: event: Add devfreq_event class PM / devfreq: tegra: add devfreq driver for Tegra Activity Monitor
2015-02-10	Merge branch 'acpi-resources'	Rafael J. Wysocki
	* acpi-resources: (23 commits) Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources x86/irq, ACPI: Implement ACPI driver to support IOAPIC hotplug ACPI: Add interfaces to parse IOAPIC ID for IOAPIC hotplug x86/PCI: Refine the way to release PCI IRQ resources x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation x86/PCI: Fix the range check for IO resources PCI: Use common resource list management code instead of private implementation resources: Move struct resource_list_entry from ACPI into resource core ACPI: Introduce helper function acpi_dev_filter_resource_type() ACPI: Add field offset to struct resource_list_entry ACPI: Translate resource into master side address for bridge window resources ACPI: Return translation offset when parsing ACPI address space resources ACPI: Enforce stricter checks for address space descriptors ACPI: Set flag IORESOURCE_UNSET for unassigned resources ACPI: Normalize return value of resource parser functions ACPI: Fix a bug in parsing ACPI Memory24 resource ACPI: Add prefetch decoding to the address space parser ACPI: Move the window flag logic to the combined parser ACPI: Unify the parsing of address_space and ext_address_space ACPI: Let the parser return false for disabled resources ...
2015-02-10	Merge branch 'devel-stable' into for-next	Russell King

2015-02-10	ARM: 8256/1: driver coamba: add device binding path 'driver_override'	Antonios Motakis
	As already demonstrated with PCI [1] and the platform bus [2], a driver_override property in sysfs can be used to bypass the id matching of a device to a AMBA driver. This can be used by VFIO to bind to any AMBA device requested by the user. [1] http://lists-archives.com/linux-kernel/28030441-pci-introduce-new-device-binding-path-using-pci_dev-driver_override.html [2] https://www.redhat.com/archives/libvir-list/2014-April/msg00382.html Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Reviewed-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-02-09	Merge branch 'for-3.20' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata changes from Tejun Heo: "Mostly driver-specific changes. Nothing too noteworthy. This pull request contains three merges from for-3.19-fixes. The first two are to pull ahci_xgene and sata_dwc_460ex fix commits which are depended upon by later changes. The last one is to pull in a fix patch which missed the v3.19-rc window" * 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (24 commits) ahci_xgene: Fix the dma state machine lockup for the ATA_CMD_SMART PIO mode command. ata: libahci: Use of_platform_device_create only if supported sata_mv: Delete unnecessary checks before the function call "phy_power_off" ata: Delete unnecessary checks before the function call "pci_dev_put" ata: pata_platform: fix owner module reference mismatch for scsi host ata: ahci_platform: fix owner module reference mismatch for scsi host pata_pdc2027x: Use 64-bit timekeeping ata: libahci: Fix devres cleanup on failure ata: libahci: Allow using multiple regulators Documentation: bindings: Add the regulator property to the sub-nodes AHCI bindings ata: libahci: Clean-up the ahci_platform_en/disable_phys functions sata_rcar: extend PM methods sata_dwc_460ex: disable compilation on ARM and ARM64 ata: libata-core: Remove unused function sata_dwc_460ex: convert to devm_kzalloc in ->probe() sata_dwc_460ex: remove extra message sata_dwc_460ex: use np local variable in ->probe() sata_dwc_460ex: fix most of the sparse warnings sata_dwc_460ex: enable COMPILE_TEST for the driver sata_dwc_460ex: remove redundant dev_set_drvdata ...
2015-02-09	Merge branch 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	Linus Torvalds
	Pull workqueue changes from Tejun Heo: "One cosmetic cleanup patch" * 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue.h: remove loops of single statement macros
2015-02-09	Merge branch 'for-3.20' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "Three mostly trivial patches. The biggest change is that blkio is now initialized before memcg which will be needed to make memcg and blkcg cooperate on writeback IOs" * 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: add dummy css_put() for !CONFIG_CGROUPS cgroup: reorder SUBSYS(blkio) in cgroup_subsys.h Update of Documentation/cgroups/00-INDEX
2015-02-09	Merge branch 'for-3.20' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu changes from Tejun Heo: "Nothing too interesting. One cleanup patch and another to add a trivial state check function" * 'for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu_ref: implement percpu_ref_is_dying() percpu_ref: remove unnecessary ACCESS_ONCE() in percpu_ref_tryget_live()
2015-02-09	Merge branch 'x86-efi-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "Main changes: - Move efivarfs from the misc filesystem section to pseudo filesystem - Expose firmware platform size in sysfs - Improve robustness of get_memory_map() by removing assumptions on the size of efi_memory_desc_t. - various cleanups and fixes The biggest risk is the get_memory_map() change, which changes the way that both the arm64 and x86 EFI boot stub build the early memory map. There are no known regressions with it at the moment, BYMMV" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: Don't look for chosen@0 node on DT platforms firmware: efi: Remove unneeded guid unparse efi/libstub: Call get_memory_map() to obtain map and desc sizes efi: Small leak on error in runtime map code efi: rtc-efi: Mark UIE as unsupported arm64/efi: efistub: Apply __init annotation efi: Expose underlying UEFI firmware platform size to userland efi: Rename efi_guid_unparse to efi_guid_to_str efi: Update the URLs for efibootmgr fs: Make efivarfs a pseudo filesystem, built by default with EFI
2015-02-09	Merge branch 'x86-apic-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 APIC updates from Ingo Molnar: "Continued fallout of the conversion of the x86 IRQ code to the hierarchical irqdomain framework: more cleanups, simplifications, memory allocation behavior enhancements, mainly in the interrupt remapping and APIC code" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) x86, init: Fix UP boot regression on x86_64 iommu/amd: Fix irq remapping detection logic x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC x86: Consolidate boot cpu timer setup x86/apic: Reuse apic_bsp_setup() for UP APIC setup x86/smpboot: Sanitize uniprocessor init x86/smpboot: Move apic init code to apic.c init: Get rid of x86isms x86/apic: Move apic_init_uniprocessor code x86/smpboot: Cleanup ioapic handling x86/apic: Sanitize ioapic handling x86/ioapic: Add proper checks to setp/enable_IO_APIC() x86/ioapic: Provide stub functions for IOAPIC%3Dn x86/smpboot: Move smpboot inlines to code x86/x2apic: Use state information for disable x86/x2apic: Split enable and setup function x86/x2apic: Disable x2apic from nox2apic setup x86/x2apic: Add proper state tracking x86/x2apic: Clarify remapping mode for x2apic enablement x86/x2apic: Move code in conditional region ...
2015-02-09	Merge branch 'timers-core-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "The main changes in this cycle were: - rework hrtimer expiry calculation in hrtimer_interrupt(): the previous code had a subtle bug where expiry caching would miss an expiry, resulting in occasional bogus (late) expiry of hrtimers. - continuing Y2038 fixes - ktime division optimization - misc smaller fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Make __hrtimer_get_next_event() static rtc: Convert rtc_set_ntp_time() to use timespec64 rtc: Remove redundant rtc_valid_tm() from rtc_hctosys() rtc: Modify rtc_hctosys() to address y2038 issues rtc: Update rtc-dev to use y2038-safe time interfaces rtc: Update interface.c to use y2038-safe time interfaces time: Expose get_monotonic_boottime64 for in-kernel use time: Expose getboottime64 for in-kernel uses ktime: Optimize ktime_divns for constant divisors hrtimer: Prevent stale expiry time in hrtimer_interrupt() ktime.h: Introduce ktime_ms_delta
2015-02-09	Merge branch 'sched-core-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main scheduler changes in this cycle were: - various sched/deadline fixes and enhancements - rescheduling latency fixes/cleanups - rework the rq->clock code to be more consistent and more robust. - minor micro-optimizations - ->avg.decay_count fixes - add a stack overflow check to might_sleep() - idle-poll handler fix, possibly resulting in power savings - misc smaller updates and fixes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/Documentation: Remove unneeded word sched/wait: Introduce wait_on_bit_timeout() sched: Pull resched loop to __schedule() callers sched/deadline: Remove cpu_active_mask from cpudl_find() sched: Fix hrtick_start() on UP sched/deadline: Avoid pointless __setscheduler() sched/deadline: Fix stale yield state sched/deadline: Fix hrtick for a non-leftmost task sched/deadline: Modify cpudl::free_cpus to reflect rd->online sched/idle: Add missing checks to the exit condition of cpu_idle_poll() sched: Fix missing preemption opportunity sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target sched/debug: Print rq->clock_task sched/core: Rework rq->clock update skips sched/core: Validate rq_clock*() serialization sched/core: Remove check of p->sched_class sched/fair: Fix sched_entity::avg::decay_count initialization sched/debug: Fix potential call to __ffs(0) in sched_show_task() sched/debug: Check for stack overflow in ___might_sleep() sched/fair: Fix the dealing with decay_count in __synchronize_entity_decay()
2015-02-09	Merge branch 'perf-core-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - AMD range breakpoints support: Extend breakpoint tools and core to support address range through perf event with initial backend support for AMD extended breakpoints. The syntax is: perf record -e mem:addr/len:type For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512) perf record -e mem:0x1000/512:w - event throttling/rotating fixes - various event group handling fixes, cleanups and general paranoia code to be more robust against bugs in the future. - kernel stack overhead fixes User-visible tooling side changes: - Show precise number of samples in at the end of a 'record' session, if processing build ids, since we will then traverse the whole perf.data file and see all the PERF_RECORD_SAMPLE records, otherwise stop showing the previous off-base heuristicly counted number of "samples" (Namhyung Kim). - Support to read compressed module from build-id cache (Namhyung Kim) - Enable sampling loads and stores simultaneously in 'perf mem' (Stephane Eranian) - 'perf diff' output improvements (Namhyung Kim) - Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho de Melo) Tooling side infrastructure changes: - Cache eh/debug frame offset for dwarf unwind (Namhyung Kim) - Support parsing parameterized events (Cody P Schafer) - Add support for IP address formats in libtraceevent (David Ahern) Plus other misc fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) perf: Decouple unthrottling and rotating perf: Drop module reference on event init failure perf: Use POLLIN instead of POLL_IN for perf poll data in flag perf: Fix put_event() ctx lock perf: Fix move_group() order perf: Fix event->ctx locking perf: Add a bit of paranoia perf symbols: Convert lseek + read to pread perf tools: Use perf_data_file__fd() consistently perf symbols: Support to read compressed module from build-id cache perf evsel: Set attr.task bit for a tracking event perf header: Set header version correctly perf record: Show precise number of samples perf tools: Do not use __perf_session__process_events() directly perf callchain: Cache eh/debug frame offset for dwarf unwind perf tools: Provide stub for missing pthread_attr_setaffinity_np perf evsel: Don't rely on malloc working for sz 0 tools lib traceevent: Add support for IP address formats perf ui/tui: Show fatal error message only if exists perf tests: Fix typo in sample-parsing.c ...
2015-02-09	Merge branch 'locking-core-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The main changes are: - mutex, completions and rtmutex micro-optimizations - lock debugging fix - various cleanups in the MCS and the futex code" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Optimize setting task running after being blocked locking/rwsem: Use task->state helpers sched/completion: Add lock-free checking of the blocking case sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state locking/mutex: Explicitly mark task as running after wakeup futex: Fix argument handling in futex_lock_pi() calls doc: Fix misnamed FUTEX_CMP_REQUEUE_PI op constants locking/Documentation: Update code path softirq/preempt: Add missing current->preempt_disable_ip update locking/osq: No need for load/acquire when acquire-polling locking/mcs: Better differentiate between MCS variants locking/mutex: Introduce ww_mutex_set_context_slowpath() locking/mutex: Move MCS related comments to proper location locking/mutex: Checking the stamp is WW only
2015-02-09	Merge branch 'core-rcu-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU changes in this cycle are: - Documentation updates. - Miscellaneous fixes. - Preemptible-RCU fixes, including fixing an old bug in the interaction of RCU priority boosting and CPU hotplug. - SRCU updates. - RCU CPU stall-warning updates. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) rcu: Initialize tiny RCU stall-warning timeouts at boot rcu: Fix RCU CPU stall detection in tiny implementation rcu: Add GP-kthread-starvation checks to CPU stall warnings rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors rcu: Optionally run grace-period kthreads at real-time priority ksoftirqd: Use new cond_resched_rcu_qs() function ksoftirqd: Enable IRQs and call cond_resched() before poking RCU rcutorture: Add more diagnostics in rcu_barrier() test failure case torture: Flag console.log file to prevent holdovers from earlier runs torture: Add "-enable-kvm -soundhw pcspk" to qemu command line rcutorture: Handle different mpstat versions rcutorture: Check from beginning to end of grace period rcu: Remove redundant rcu_batches_completed() declaration rcutorture: Drop rcu_torture_completed() and friends rcu: Provide rcu_batches_completed_sched() for TINY_RCU rcutorture: Use unsigned for Reader Batch computations rcutorture: Make build-output parsing correctly flag RCU's warnings rcu: Make _batches_completed() functions return unsigned long rcutorture: Issue warnings on close calls due to Reader Batch blows documentation: Fix smp typo in memory-barriers.txt ...
2015-02-09	IB/mlx4: Reset flow support for IB kernel ULPs	Yishai Hadas
	The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work-request could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset. It includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled we return simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one. The above change requires an extra check in the data path to make sure that when device is in error state, the simulated CQEs will be returned and no further WQEs will be posted. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09	Merge tag 'regulator-v3.20' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "This has not been a busy release for the regulator framework, though we do have the first parts of some ongoing work from Bjorn Andersson to allow us to support more complex modern systems with dynamic configuration of regulators in suspend and idle states. - Support for device-specific properties on regulator nodes when using simplified DT parsing in the core from Krzysztof Kozlowski. - Restructuring of the load tracking code, intended to support future improvements in this area for more complex system designs. - New drivers for Maxim MAX77843 and Mediatek MT6397. - Lots of smaller fixes and improvements" * tag 'regulator-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (29 commits) regulator: max77843: Add max77843 regulator driver regulator: Fix build breakage on !REGULATOR regulator: Build sysfs entries with static attribute groups regulator: qcom-rpm: Make it possible to specify supply regulator: core: Consolidate drms update handling regulator: qcom-rpm: signedness bug in probe() regulator: da9211: Add gpio control for enable/disable of buck regulator: qcom_rpm: Don't update vreg->uV/mV if rpm_reg_write fails regulator: lp872x: Remove **regulators from struct lp872x regulator: da9211: fix unmatched of_node regulator: Update documentation after renaming function argument regulator: axp20x: Migrate to regulator core's simplified DT parsing code regulator: axp20x: Fill regulators_node and of_match descriptor fields regulator: pfuze100-regulator: add pfuze3000 support regulator: max77686: Document gpio properties regulator: Allow parsing custom properties when using simplified DT parsing regulator: max77686: Add GPIO control regulator: Copy config passed during registration regulator: tps65023: Constify struct regmap_config and regulator_ops regulator: max8649: Constify struct regmap_config and regulator_ops ...
2015-02-09	Merge tag 'spi-v3.20' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "The major highlight this release is a refactoring of the core to allow us to run synchronous transfers in the context of the caller when there is no contention for the bus. This improves performance in the very common case by eliminating context switches and reducing the number of hardware setup and teardown operations we need to perform. Other changes: - New drivers for DLN-2 USB-SPI adapter and ST SPI controllers. - A big round of cleanups, performance and feature improvements for the xilinx driver from Ricardo Ribalda Delgado. - A wide range of smaller cleanups, fixes and feature improvements throughout the subsystem" * tag 'spi-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (68 commits) spi: mxs: cleanup wait_for_completion return handling spi: ti-qspi: cleanup wait_for_completion return handling spi: spi-imx: cleanup wait_for_completion handling spi: sh-msiof: cleanup wait_for_completion return handling spi: match var type to return type of wait_for_completion spi: spi-pxa2xx: only include mach/dma.h for legacy DMA spi: atmel: cleanup wait_for_completion return handling spi: fsl-dspi: Remove possible memory leak of 'chip' spi: sh-msiof: Update calculation of frequency dividing spi: spidev: Convert buf pointers for 32-bit compat SPI_IOC_MESSAGE(n) spi/xilinx: Fix access invalid memory on xilinx_spi_tx spi: Revert "spi/xilinx: Remove iowrite/ioread wrappers" spi/xilinx: Check number of slaves range spi/xilinx: Use polling mode on small transfers spi/xilinx: Remove remaining_words driver data variable spi/xilinx: Remove iowrite/ioread wrappers spi/xilinx: Convert bits_per_word in bytes_per_word spi/xilinx: Convert remainding_bytes in remaining words spi/xilinx: Make spi_tx and spi_rx simmetric spi/xilinx: Remove rx_fn and tx_fn pointer ...
2015-02-09	Merge tag 'regmap-v3.20' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "A very quiet release for regmap this time around: - Fix an endianness issue for I2C devices connected via SMBus where we were getting two layers both trying to do endianness handling. - Use a union to reduce the size of the regmap struct. - A couple of smaller fixes" * tag 'regmap-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: Fix i2c word access when using SMBus access functions regmap: Export regmap_get_val_endian regmap: ac97: Clean up indentation regmap: correct the description of structure element in reg_field regmap: Move spinlock_flags into the union
2015-02-09	Merge tag 'wireless-drivers-next-for-davem-2015-02-07' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Major changes: iwlwifi: * more work for new devices (4165 / 8260) * cleanups / improvemnts in rate control * fixes for TDLS * major statistics work from Johannes - more to come * improvements for the fw error dump infrastructure * usual amount of small fixes here and there (scan, D0i3 etc...) * add support for beamforming * enable stuck queue detection for iwlmvm * a few fixes for EBS scan * fixes for various failure paths * improvements for TDLS Offchannel wil6210: * performance tuning * some AP features brcm80211: * rework some code in SDIO part of the brcmfmac driver related to suspend/resume that were found doing stress testing * in PCIe part scheduling of worker thread needed to be relaxed * minor fixes and exposing firmware revision information to user-space, ie. ethtool. mwifiex: * enhancements for change virtual interface handling * remove coupling between netdev and FW supported interface combination, now conversion from any type of supported interface types to any other type is possible * DFS support in AP mode ath9k: * fix calibration issues on some boards * Wake-on-WLAN improvements ath10k: * add support for qca6174 hardware * enable RX batching to reduce CPU load Conflicts: drivers/net/wireless/rtlwifi/pci.c Conflict resolution is to get rid of the 'end' label and keep the rest. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09	dm: allocate requests in target when stacking on blk-mq devices	Mike Snitzer
	For blk-mq request-based DM the responsibility of allocating a cloned request is transfered from DM core to the target type. Doing so enables the cloned request to be allocated from the appropriate blk-mq request_queue's pool (only the DM target, e.g. multipath, can know which block device to send a given cloned request to). Care was taken to preserve compatibility with old-style block request completion that requires request-based DM _not_ acquire the clone request's queue lock in the completion path. As such, there are now 2 different request-based DM target_type interfaces: 1) the original .map_rq() interface will continue to be used for non-blk-mq devices -- the preallocated clone request is passed in from DM core. 2) a new .clone_and_map_rq() and .release_clone_rq() will be used for blk-mq devices -- blk_get_request() and blk_put_request() are used respectively from these hooks. dm_table_set_type() was updated to detect if the request-based target is being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set. DM core disallows switching the DM table's type after it is set. This means that there is no mixing of non-blk-mq and blk-mq devices within the same request-based DM table. [This patch was started by Keith and later heavily modified by Mike] Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-09	dm: remove exports for request-based interfaces without external callers	Mike Snitzer
	Remove exports for dm_dispatch_request, dm_requeue_unmapped_request, and dm_kill_unmapped_request. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-09	SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09	SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-09	Merge branch 'for-3.19-fixes' of ↵	Tejun Heo
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata into for-3.20 09c32aaa3683 ("ahci_xgene: Fix the dma state machine lockup for the ATA_CMD_SMART PIO mode command.") missed 3.19 release. Fold it into for-3.20. Signed-off-by: Tejun Heo <tj@kernel.org>
2015-02-08	net:rfs: adjust table size checking	Eric Dumazet
	Make sure root user does not try something stupid. Also make sure mask field in struct rps_sock_flow_table does not share a cache line with the potentially often dirtied flow table. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 567e4b79731c ("net: rfs: add hash collision detection") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08	SUNRPC: Remove TCP client connection reset hack	Trond Myklebust
	Instead we rely on SO_REUSEPORT to provide the reconnection semantics that we need for NFSv2/v3. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08	SUNRPC: Add helpers to prevent socket create from racing	Trond Myklebust
	The socket lock is currently held by the task that is requesting the connection be established. While that is efficient in the case where the connection happens quickly, it is racy in the case where it doesn't. What we really want is for the connect helper to be able to block access to the socket while it is being set up. This patch does so by arranging to transfer the socket lock from the task that is requesting the connect attempt, and then releasing that lock once everything is done. This scheme also gives us automatic protection against collisions with the RPC close code, so we can kill the cancel_delayed_work_sync() call in xs_close(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08	Merge tag 'trace-fixes-v3.19-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fixes from Steven Rostedt: "During testing Sedat Dilek hit a "suspicious RCU usage" splat that pointed out a real bug. During suspend and resume the tlb_flush tracepoint is called when the CPU is going offline. As the CPU has been noted as offline, RCU is ignoring that CPU, which means that it can not use RCU protected locks. When tracepoints are activated, they require RCU locking, and if RCU is ignoring a CPU that runs a tracepoint, there is a chance that the tracepoint could cause corruption. The solution was to change the tracepoint into a TRACE_EVENT_CONDITION() which allows us to check a condition to determine if the tracepoint should be called or not. If the condition is not met, the rcu protected code will not be executed. By adding the condition "cpu_online(smp_processor_id())", this will prevent the RCU protected code from being executed if the CPU is marked offline. After adding this, another bug was discovered. As RCU checks rcu callers, if a rcu call is not done, there is no check (obviously). We found that tracepoints could be added in RCU ignored locations and not have lockdep complain until the tracepoint is activated. This missed places where tracepoints were added in places they should not have been. To fix this, code was added in 3.18 that if lockdep is enabled, any tracepoint will still call the rcu checks even if the tracepoint is not enabled. The bug here, is that the check does not take the CONDITION into account. As the condition may prevent tracepoints from being activated in RCU ignored areas (as the one patch does), we get false positives when we enable lockdep and hit a tracepoint that the condition prevents it from being called in a RCU ignored location. The fix for this is to add the CONDITION to the rcu checks, even if the tracepoint is not enabled" * tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: x86/tlb/trace: Do not trace on CPU that is offline tracing: Add condition check to RCU lockdep checks
2015-02-08	net: rfs: add hash collision detection	Eric Dumazet
	Receive Flow Steering is a nice solution but suffers from hash collisions when a mix of connected and unconnected traffic is received on the host, when flow hash table is populated. Also, clearing flow in inet_release() makes RFS not very good for short lived flows, as many packets can follow close(). (FIN , ACK packets, ...) This patch extends the information stored into global hash table to not only include cpu number, but upper part of the hash value. I use a 32bit value, and dynamically split it in two parts. For host with less than 64 possible cpus, this gives 6 bits for the cpu number, and 26 (32-6) bits for the upper part of the hash. Since hash bucket selection use low order bits of the hash, we have a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big enough. If the hash found in flow table does not match, we fallback to RPS (if it is enabled for the rxqueue). This means that a packet for an non connected flow can avoid the IPI through a unrelated/victim CPU. This also means we no longer have to clear the table at socket close time, and this helps short lived flows performance. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08	net: fix a typo in skb_checksum_validate_zero_check	Sabrina Dubroca
	Remove trailing underscore. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08	tcp: mitigate ACK loops for connections as tcp_timewait_sock	Neal Cardwell
	Ensure that in state FIN_WAIT2 or TIME_WAIT, where the connection is represented by a tcp_timewait_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08	tcp: mitigate ACK loops for connections as tcp_sock	Neal Cardwell
	Ensure that in state ESTABLISHED, where the connection is represented by a tcp_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers or ACK numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. There is already a similar (although global) rate-limiting mechanism for "challenge ACKs". When deciding whether to send a challence ACK, we first consult the new per-connection rate limit, and then the global rate limit. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08	tcp: mitigate ACK loops for connections as tcp_request_sock	Neal Cardwell
	In the SYN_RECV state, where the TCP connection is represented by tcp_request_sock, we now rate-limit SYNACKs in response to a client's retransmitted SYNs: we do not send a SYNACK in response to client SYN if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a SYNACK in response to a client's retransmitted SYN. This allows the vast majority of legitimate client connections to proceed unimpeded, even for the most aggressive platforms, iOS and MacOS, which actually retransmit SYNs 1-second intervals for several times in a row. They use SYN RTO timeouts following the progression: 1,1,1,1,1,2,4,8,16,32. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08	Merge remote-tracking branches 'spi/topic/orion', 'spi/topic/pxa2xx', ↵	Mark Brown
	'spi/topic/qup', 'spi/topic/rockchip' and 'spi/topic/samsung' into spi-next
2015-02-08	Merge remote-tracking branches 'spi/topic/img-spfi', 'spi/topic/imx', ↵	Mark Brown
	'spi/topic/inline', 'spi/topic/meson' and 'spi/topic/mxs' into spi-next
2015-02-08	Merge remote-tracking branches 'spi/topic/falcon', 'spi/topic/fsf', ↵	Mark Brown
	'spi/topic/fsl', 'spi/topic/fsl-dspi' and 'spi/topic/gpio' into spi-next
2015-02-08	Merge remote-tracking branch 'spi/topic/sh-msiof' into spi-next	Mark Brown

2015-02-08	Merge remote-tracking branches 'regulator/topic/max8649', ↵	Mark Brown
	'regulator/topic/mode', 'regulator/topic/mt6397', 'regulator/topic/pfuze100' and 'regulator/topic/qcom-rpm' into regulator-next
2015-02-08	Merge remote-tracking branches 'regulator/topic/axp20x', ↵	Mark Brown
	'regulator/topic/da9211' and 'regulator/topic/fan53555' into regulator-next