summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2009-02-05Merge branch 'x86/urgent' into x86/apicIngo Molnar
Conflicts: arch/x86/mach-default/setup.c Semantic merge: arch/x86/kernel/irqinit_32.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-05smp, generic: introduce arch_disable_smp_support(), build fixIngo Molnar
This function should be provided on UP too. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-05x86: move default_ipi_xx back to ipi.cYinghai Lu
Impact: cleanup only leave _default_ipi_xx etc in .h Beyond the cleanup factor, this saves a bit of code size as well: text data bss dec hex filename 7281931 1630144 1463304 10375379 9e50d3 vmlinux.before 7281753 1630144 1463304 10375201 9e5021 vmlinux.after Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-05x86, apic: explain the purpose of max_physical_apicidIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-05smp, generic: introduce arch_disable_smp_support() instead of ↵Ingo Molnar
disable_ioapic_setup() Impact: cleanup disable_ioapic_setup() in init/main.c is ugly as the function is x86-specific. The #ifdef inline prototype there is ugly too. Replace it with a generic arch_disable_smp_support() function - which has a weak alias for non-x86 architectures and for non-ioapic x86 builds. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-05x86: disable intel_iommu support by defaultKyle McMartin
Due to recurring issues with DMAR support on certain platforms. There's a number of filesystem corruption incidents reported: https://bugzilla.redhat.com/show_bug.cgi?id=479996 http://bugzilla.kernel.org/show_bug.cgi?id=12578 Provide a Kconfig option to change whether it is enabled by default. If disabled, it can still be reenabled by passing intel_iommu=on to the kernel. Keep the .config option off by default. Signed-off-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-By: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-04x86: don't apply __supported_pte_mask to non-present ptesJeremy Fitzhardinge
On an x86 system which doesn't support global mappings, __supported_pte_mask has _PAGE_GLOBAL clear, to make sure it never appears in the PTE. pfn_pte() and so on will enforce it with: static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot) { return __pte((((phys_addr_t)page_nr << PAGE_SHIFT) | pgprot_val(pgprot)) & __supported_pte_mask); } However, we overload _PAGE_GLOBAL with _PAGE_PROTNONE on non-present ptes to distinguish them from swap entries. However, applying __supported_pte_mask indiscriminately will clear the bit and corrupt the pte. I guess the best fix is to only apply __supported_pte_mask to present ptes. This seems like the right solution to me, as it means we can completely ignore the issue of overlaps between the present pte bits and the non-present pte-as-swap entry use of the bits. __supported_pte_mask contains the set of flags we support on the current hardware. We also use bits in the pte for things like logically present ptes with no permissions, and swap entries for swapped out pages. We should only apply __supported_pte_mask to present ptes, because otherwise we may destroy other information being stored in the ptes. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-05x86: fix grammar in user-visible BIOS warningAlex Chiang
Fix user-visible grammo. Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-04x86/Kconfig.cpu: make Kconfig help readable in the consoleBorislav Petkov
Impact: cleanup Some lines exceed the 80 char width making them unreadable. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-04x86, 64-bit: print DMI info in the oops traceKyle McMartin
This patch echoes what we already do on 32-bit since 90f7d25c6b672137344f447a30a9159945ffea72, and prints the DMI product name in show_regs, so that system specific problems can be easily identified. Signed-off-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-04Merge branch 'core/xen' into x86/urgentIngo Molnar
2009-02-03x86: APIC: enable workaround on AMD Fam10h CPUsBorislav Petkov
Impact: fix to enable APIC for AMD Fam10h on chipsets with a missing/b0rked ACPI MP table (MADT) Booting a 32bit kernel on an AMD Fam10h CPU running on chipsets with missing/b0rked MP table leads to a hang pretty early in the boot process due to the APIC not being initialized. Fix that by falling back to the default APIC base address in 32bit code, as it is done in the 64bit codepath. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-03xen: disable interrupts before saving in percpuJeremy Fitzhardinge
Impact: Fix race condition xen_mc_batch has a small preempt race where it takes the address of a percpu variable immediately before disabling interrupts, thereby leaving a small window in which we may migrate to another cpu and save the flags in the wrong percpu variable. Disable interrupts before saving the old flags in a percpu. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-03x86: add x86@kernel.org to MAINTAINERSH. Peter Anvin
Impact: Documentation only There is an email alias as well to reach the x86 maintainers: x86@kernel.org. Document it. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-02x86: push old stack address on irqstack for unwinderMartin Hicks
Impact: Fixes dumpstack and KDB on 64 bits This re-adds the old stack pointer to the top of the irqstack to help with unwinding. It was removed in commit d99015b1abbad743aa049b439c1e1dede6d0fa49 as part of the save_args out-of-line work. Both dumpstack and KDB require this information. Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI hotplug: Change link order of pciehp & acpiphp PCI hotplug: fakephp: Allocate PCI resources before adding the device PCI MSI: Fix undefined shift by 32 PCI PM: Do not wait for buses in B2 or B3 during resume PCI PM: Power up devices before restoring their state PCI PM: Fix hibernation breakage on EeePC 701 PCI: irq and pci_ids patch for Intel Tigerpoint DeviceIDs PCI PM: Fix suspend error paths and testing facility breakage
2009-02-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: fix per cpu kmem_cache_cpu array memory leak kmalloc: return NULL instead of link failure
2009-02-02Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: fbdev/atyfb: Fix DSP config on some PowerMacs & PowerBooks powerpc: Fix oops on some machines due to incorrect pr_debug() powerpc/ps3: Printing fixups for l64 to ll64 convserion drivers/net powerpc/5200: update device tree binding documentation powerpc/5200: Bugfix for PCI mapping of memory and IMMR powerpc/5200: update defconfigs
2009-02-02Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched_rt: don't use first_cpu on cpumask created with cpumask_and sched: fix buddie group latency sched: clear buddies more aggressively sched: symmetric sync vs avg_overlap sched: fix sync wakeups cpuset: fix possible deadlock in async_rebuild_sched_domains
2009-02-02Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (45 commits) V4L/DVB (10411): s5h1409: Perform s5h1409 soft reset after tuning V4L/DVB (10403): saa7134-alsa: saa7130 doesn't support digital audio V4L/DVB (10229): ivtv: fix memory leak V4L/DVB (10385): gspca - main: Fix memory leak when USB disconnection while streaming. V4L/DVB (10325): em28xx: Fix for fail to submit URB with IRQs and Pre-emption Disabled V4L/DVB (10317): radio-mr800: fix radio->muted and radio->stereo V4L/DVB (10314): cx25840: ignore TUNER_SET_CONFIG in the command callback. V4L/DVB (10288): af9015: bug fix: stick does not work always when plugged V4L/DVB (10287): af9015: fix second FE V4L/DVB (10270): saa7146: fix unbalanced mutex_lock/unlock V4L/DVB (10265): budget.c driver: Kernel oops: "BUG: unable to handle kernel paging request at ffffffff V4L/DVB (10261): em28xx: fix kernel panic on audio shutdown V4L/DVB (10257): em28xx: Fix for KWorld 330U Board V4L/DVB (10256): em28xx: Fix for KWorld 330U AC97 V4L/DVB (10254): em28xx: Fix audio URB transfer buffer race condition V4L/DVB (10250): cx25840: fix regression: fw not loaded on first use V4L/DVB (10248): v4l-dvb: fix a bunch of compile warnings. V4L/DVB (10243): em28xx: fix compile warning V4L/DVB (10240): Fix obvious swapped names in v4l2_subdev logic V4L/DVB (10233): [PATCH] Terratec Cinergy DT XS Diversity new USB ID (0ccd:0081) ...
2009-02-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: pxamci: enable DMA for write ops after CMD/RESP pxamci: replace #ifdef CONFIG_PXA27x with if (cpu_is_pxa27x()) ricoh_mmc: Use suspend_late/resume_early mmci: Add support for ST Micro derivate mmc: Add a MX2/MX3 specific SDHC driver
2009-02-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: icside: fix PCB version 6 support (v2) tx4939ide: typo fix and minor cleanup ide: add CS5536 host driver (v3) ide: Force VIA IDE legacy interrupts for AmigaOne boards IDE: Unregister and disable devices if initialization fails. ide: fix ide_register_port() failure handling ide: struct device - replace bus_id with dev_name(), dev_set_name() ide-cd: fix DMA for non bio-backed requests
2009-02-02Merge branch 'for-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb * 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb: uwb: lock rc->rsvs_lock with spin_lock_bh() wusb: timeout when waiting for ASL/PZL updates in whci-hcd uwb: remove unused #include <version.h>'s wusb: return -ENOTCONN when resetting a port with no connected device uwb: safely remove all reservations
2009-02-02Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: add text file detailing queue/ sysfs files bio.h: If they MUST be inlined, then use __always_inline Fix misleading comment in bio.h block: fix inconsistent parenthesisation of QUEUE_FLAG_DEFAULT block: fix oops in blk_queue_io_stat()
2009-02-02virtio-pci: do not oops on config change if driver not loadedMark McLoughlin
The host really shouldn't be notifying us of config changes before the device status is VIRTIO_CONFIG_S_DRIVER or VIRTIO_CONFIG_S_DRIVER_OK. However, if we do happen to be interrupted while we're not attached to a driver, we really shouldn't oops. Prevent this simply by checking that device->driver is non-NULL before trying to notify the driver of config changes. Problem observed by doing a "set_link virtio.0 down" with QEMU before the net driver had been loaded. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-02modules: Use a better scheme for refcountingEric Dumazet
Current refcounting for modules (done if CONFIG_MODULE_UNLOAD=y) is using a lot of memory. Each 'struct module' contains an [NR_CPUS] array of full cache lines. This patch uses existing infrastructure (percpu_modalloc() & percpu_modfree()) to allocate percpu space for the refcount storage. Instead of wasting NR_CPUS*128 bytes (on i386), we now use nr_cpu_ids*sizeof(local_t) bytes. On a typical distro, where NR_CPUS=8, shiping 2000 modules, we reduce size of module files by about 2 Mbytes. (1Kb per module) Instead of having all refcounters in the same memory node - with TLB misses because of vmalloc() - this new implementation permits to have better NUMA properties, since each CPU will use storage on its preferred node, thanks to percpu storage. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-02pxamci: enable DMA for write ops after CMD/RESPCliff Brake
With the PXA270 MMC hardware, there seems to be an issue of data corruption on writes where a 4KB data block is offset by one byte. If we delay enabling the DMA for writes until after the CMD/RESP has finished, the problem seems to be fixed. related to PXA270 Erratum #91 Tested-by: Vernon Sauder <VernonInHand@gmail.com> Signed-off-by: Cliff Brake <cbrake@bec-systems.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-02pxamci: replace #ifdef CONFIG_PXA27x with if (cpu_is_pxa27x())Cliff Brake
Signed-off-by: Cliff Brake <cbrake@bec-systems.com> Acked-by: Eric Miao <eric.miao@marvell.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-02ricoh_mmc: Use suspend_late/resume_earlyphilipl@overt.org
If ricoh_mmc suspends before sdhci_pci, it will pull the card out from under the controller, which could leave the system in a very confused state. Using suspend_late/resume_early ensures that sdhci_pci suspends first and resumes second. Signed-off-by: Philip Langdale <philipl@overt.org> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-02mmci: Add support for ST Micro derivateLinus Walleij
This patch adds support for the ST Microelectronics version of the PL180 PrimeCell. They use designer ID 0x80 and have a few alterations/bugfixes related to open drain and HW flow control. They also add some SDIO registers, I am unsure if these are in ST HW only or if this is things also added in later ARM revisions, but they are included in the mmci.h file for completeness. Signed-off-by: Linus Walleij <linus.walleij@ericsson.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-02mmc: Add a MX2/MX3 specific SDHC driverSascha Hauer
This patch adds a MX2/MX3 specific SDHC driver. The hardware is basically the same as in the MX1, but unlike the MX1 controller the MX2 controller just works as expected. Since the MX1 driver has more workarounds for bugs than anything else I had no success with supporting MX1 and MX2 in a sane way in one driver. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2009-02-02icside: fix PCB version 6 support (v2)Bartlomiej Zolnierkiewicz
We need to pass struct ide_port_info also to ide_host_register(). v2: Fix v5/v6 mismatch noticed by Russell. Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-02-02tx4939ide: typo fix and minor cleanupAtsushi Nemoto
The bcount is greater than 0 and less than or equal to 0x10000. Thus '(bcount & 0xffff) == 0x0000' can be simplified as 'bcount == 0x10000'. Suggested-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-02-02ide: add CS5536 host driver (v3)Bartlomiej Zolnierkiewicz
This is a port of libata's pata_cs5536.c (written by Martin K. Petersen) to IDE subsystem. Changes done while at it: * Reprogram PIO/MWDMA timings if needed before and after DMA transfer (chipset uses shared PIO/MWDMA timings). * Fix cable detection to report 80-wires cable if BIOS set it for any device on a port (IDE core will do drive-side cable detection later). * Don't disable UDMA while programming PIO timings. * Simplify PCI/MSR support. Pros of having IDE host driver in addition to libata's one: * IDE is much lighter than SCSI+libata, the host driver itself is also a bit smaller: text data bss dec hex filename 1261 496 4 1761 6e1 drivers/ata/pata_cs5536.o 1242 128 4 1374 55e drivers/ide/cs5536.o * This allows use of IDE features which are unavailable under libata. v2: * Fixes per review from Sergei: - simplify dependency check in Kconfig - use IDE_DRV_MASK also for ->drive_data - disable UDMA when programming MWDMA - program new DTC timings only when necessary - fix printk() level in cs5536_init_one() * Fix patch description according to comments from Alan and Sergei. v3: * Smarter masking of UDMA bits per Sergei's suggestion. Cc: Martin K. Petersen <mkp@mkp.net> Cc: Karl Auerbach <karl@iwl.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-02-02ide: Force VIA IDE legacy interrupts for AmigaOne boardsGerhard Pircher
The AmigaOne uses the onboard VIA IDE controller in legacy mode (like the Pegasos). Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net> Cc: "Grant Likely" <grant.likely@secretlab.ca> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-02-02IDE: Unregister and disable devices if initialization fails.Ian Campbell
On reboot the loop in device_shutdown gets confused by these partially initialized devices and goes into an infinite loop. Therefore unregister and disable these devices. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> [bart: remove leftover hwif->present clearing + update patch description] Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-02-02ide: fix ide_register_port() failure handlingBartlomiej Zolnierkiewicz
* Factor out port freeing from ide_host_free() to ide_free_port(). * Add ide_disable_port() and use it on ide_register_port() failure. Cc: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-02-02ide: struct device - replace bus_id with dev_name(), dev_set_name()Kay Sievers
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: linux-ide@vger.kernel.org Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-02-02ide-cd: fix DMA for non bio-backed requestsBorislav Petkov
This one fixes http://bugzilla.kernel.org/show_bug.cgi?id=12320. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-02-02Merge branch 'master' of ↵David Vrabel
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-upstream
2009-02-02block: add text file detailing queue/ sysfs filesJens Axboe
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-02bio.h: If they MUST be inlined, then use __always_inlineAlberto Bertogli
bvec_kmap_irq() and bvec_kunmap_irq() comments say they MUST be inlined, so mark them as __always_inline. Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-02Fix misleading comment in bio.hAlberto Bertogli
The comment says "remember to add offset!", but the function already adds it. Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-02Merge branches 'topic/slab/fixes' and 'topic/slub/fixes' into for-linusPekka Enberg
2009-02-02block: fix inconsistent parenthesisation of QUEUE_FLAG_DEFAULTJens Axboe
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-02block: fix oops in blk_queue_io_stat()Jens Axboe
Some initial probe requests don't have disk->queue mapped yet, so we can't rely on a non-NULL queue in blk_queue_io_stat(). Wrap it in blk_do_io_stat(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-02-02fbdev/atyfb: Fix DSP config on some PowerMacs & PowerBooksRisto Suominen
Since the complete re-write in 2.6.10, some PowerMacs (At least PowerMac 5500 and PowerMac G3 Beige rev A) with ATI Mach64 chip have suffered from unstable columns in their framebuffer image. This seems to depend on a value (4) read from PLL_EXT_CNTL register, which leads to incorrect DSP config parameters to be written to the chip. This patch uses a value calculated by aty_init_pll_ct instead, as a starting point. There are questions as to whether this should be extended to other platforms or maybe made dependent on specific chip types, but in the meantime, this has been tested on various powermacs and works for them so let's commit it. Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com> Tested-by: Michael Pettersson <mike@it.uu.se> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-02powerpc: Fix oops on some machines due to incorrect pr_debug()Benjamin Herrenschmidt
Recently, a patch left DEBUG enabled in the powerpc common PCI code, resulting in an old bug in a pr_debug() statement to show up and cause a NULL dereference on some machines. This fixes the pr_debug() statement and reverts to DEBUG not being force-enabled in that file. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-02powerpc/ps3: Printing fixups for l64 to ll64 convserion drivers/netStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Geoff Levand <geoffrey.levand@am.sony.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-01Manually revert "mlock: downgrade mmap sem while populating mlocked regions"Linus Torvalds
This essentially reverts commit 8edb08caf68184fb170f4f69c7445929e199eaea. It downgraded our mmap semaphore to a read-lock while mlocking pages, in order to allow other threads (and external accesses like "ps" et al) to walk the vma lists and take page faults etc. Which is a nice idea, but the implementation does not work. Because we cannot upgrade the lock back to a write lock without releasing the mmap semaphore, the code had to release the lock entirely and then re-take it as a writelock. However, that meant that the caller possibly lost the vma chain that it was following, since now another thread could come in and mmap/munmap the range. The code tried to work around that by just looking up the vma again and erroring out if that happened, but quite frankly, that was just a buggy hack that doesn't actually protect against anything (the other thread could just have replaced the vma with another one instead of totally unmapping it). The only way to downgrade to a read map _reliably_ is to do it at the end, which is likely the right thing to do: do all the 'vma' operations with the write-lock held, then downgrade to a read after completing them all, and then do the "populate the newly mlocked regions" while holding just the read lock. And then just drop the read-lock and return to user space. The (perhaps somewhat simpler) alternative is to just make all the callers of mlock_vma_pages_range() know that the mmap lock got dropped, and just re-grab the mmap semaphore if it needs to mlock more than one vma region. So we can do this "downgrade mmap sem while populating mlocked regions" thing right, but the way it was done here was absolutely not correct. Thus the revert, in the expectation that we will do it all correctly some day. Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>