summaryrefslogtreecommitdiffstats
path: root/include/linux
AgeCommit message (Collapse)Author
2010-05-28Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits) ACPI: Don't let acpi_pad needlessly mark TSC unstable drivers/acpi/sleep.h: Checkpatch cleanup ACPI: Minor cleanup eliminating redundant PMTIMER_TICKS to NS conversion ACPI: delete unused c-state promotion/demotion data strucutures ACPI: video: fix acpi_backlight=video ACPI: EC: Use kmemdup drivers/acpi: use kasprintf ACPI, APEI, EINJ injection parameters support Add x64 support to debugfs ACPI, APEI, Use ERST for persistent storage of MCE ACPI, APEI, Error Record Serialization Table (ERST) support ACPI, APEI, Generic Hardware Error Source memory error support ACPI, APEI, UEFI Common Platform Error Record (CPER) header Unified UUID/GUID definition ACPI Hardware Error Device (PNP0C33) support ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first setup ACPI, APEI, Document for APEI ACPI, APEI, EINJ support ACPI, APEI, HEST table parsing ACPI, APEI, APEI supporting infrastructure ...
2010-05-28Merge branch 'acpi_enable' into releaseLen Brown
2010-05-28Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: remove detritus left by "mm: make read_cache_page synchronous" fix fs/sysv s_dirt handling fat: convert to use the new truncate convention. ext2: convert to use the new truncate convention. tmpfs: convert to use the new truncate convention fs: convert simple fs to new truncate kill spurious reference to vmtruncate fs: introduce new truncate sequence fs/super: fix kernel-doc warning fs/minix: bugfix, number of indirect block ptrs per block depends on block size rename the generic fsync implementations drop unused dentry argument to ->fsync fs: Add missing mutex_unlock Fix racy use of anon_inode_getfd() in perf_event.c get rid of the magic around f_count in aio VFS: fix recent breakage of FS_REVAL_DOT Revert "anon_inode: set S_IFREG on the anon_inode"
2010-05-27fs: introduce new truncate sequencenpiggin@suse.de
Introduce a new truncate calling sequence into fs/mm subsystems. Rather than setattr > vmtruncate > truncate, have filesystems call their truncate sequence from ->setattr if filesystem specific operations are required. vmtruncate is deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced previously should be used. simple_setattr is introduced for simple in-ram filesystems to implement the new truncate sequence. Eventually all filesystems should be converted to implement a setattr, and the default code in notify_change should go away. simple_setsize is also introduced to perform just the ATTR_SIZE portion of simple_setattr (ie. changing i_size and trimming pagecache). To implement the new truncate sequence: - filesystem specific manipulations (eg freeing blocks) must be done in the setattr method rather than ->truncate. - vmtruncate can not be used by core code to trim blocks past i_size in the event of write failure after allocation, so this must be performed in the fs code. - convert usage of helpers block_write_begin, nobh_write_begin, cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed variants. These avoid calling vmtruncate to trim blocks (see previous). - inode_setattr should not be used. generic_setattr is a new function to be used to copy simple attributes into the generic inode. - make use of the better opportunity to handle errors with the new sequence. Big problem with the previous calling sequence: the filesystem is not called until i_size has already changed. This means it is not allowed to fail the call, and also it does not know what the previous i_size was. Also, generic code calling vmtruncate to truncate allocated blocks in case of error had no good way to return a meaningful error (or, for example, atomically handle block deallocation). Cc: Christoph Hellwig <hch@lst.de> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27rename the generic fsync implementationsChristoph Hellwig
We don't name our generic fsync implementations very well currently. The no-op implementation for in-memory filesystems currently is called simple_sync_file which doesn't make too much sense to start with, the the generic one for simple filesystems is called simple_fsync which can lead to some confusion. This patch renames the generic file fsync method to generic_file_fsync to match the other generic_file_* routines it is supposed to be used with, and the no-op implementation to noop_fsync to make it obvious what to expect. In addition add some documentation for both methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27drop unused dentry argument to ->fsyncChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27get rid of the magic around f_count in aioAl Viro
__aio_put_req() plays sick games with file refcount. What it wants is fput() from atomic context; it's almost always done with f_count > 1, so they only have to deal with delayed work in rare cases when their reference happens to be the last one. Current code decrements f_count and if it hasn't hit 0, everything is fine. Otherwise it keeps a pointer to struct file (with zero f_count!) around and has delayed work do __fput() on it. Better way to do it: use atomic_long_add_unless( , -1, 1) instead of !atomic_long_dec_and_test(). IOW, decrement it only if it's not the last reference, leave refcount alone if it was. And use normal fput() in delayed work. I've made that atomic_long_add_unless call a new helper - fput_atomic(). Drops a reference to file if it's safe to do in atomic (i.e. if that's not the last one), tells if it had been able to do that. aio.c converted to it, __fput() use is gone. req->ki_file *always* contributes to refcount now. And __fput() became static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: libata: implement dump_id force param libata: disable ATAPI AN by default libata-sff: make BMDMA optional libata-sff: kill dummy BMDMA ops from sata_qstor and pata_octeon_cf libata-sff: separate out BMDMA init libata-sff: separate out BMDMA irq handler libata-sff: ata_sff_irq_clear() is BMDMA specific sata_mv: drop unncessary EH callback resetting
2010-05-27Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits) tracing: Add __used annotation to event variable perf, trace: Fix !x86 build bug perf report: Support multiple events on the TUI perf annotate: Fix up usage of the build id cache x86/mmiotrace: Remove redundant instruction prefix checks perf annotate: Add TUI interface perf tui: Remove annotate from popup menu after failure perf report: Don't start the TUI if -D is used perf: Fix getline undeclared perf: Optimize perf_tp_event_match() perf: Remove more code from the fastpath perf: Optimize the !vmalloc backed buffer perf: Optimize perf_output_copy() perf: Fix wakeup storm for RO mmap()s perf-record: Share per-cpu buffers perf-record: Remove -M perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig ...
2010-05-27Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlightLinus Torvalds
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight: gta02: Use pcf50633 backlight driver instead of platform backlight driver. backlight: pcf50633: Register a pcf50633-backlight device in pcf50633 core driver. backlight: Add pcf50633 backlight driver backlight: 88pm860x_bl: fix error handling in pm860x_backlight_probe backlight: max8925_bl: Fix error handling path backlight: l4f00242t03: fix error handling in l4f00242t03_probe backlight: add S6E63M0 AMOLED LCD Panel driver backlight: adp8860: add support for ADP8861 & ADP8863 backlight: mbp_nvidia_bl - Fix DMI_SYS_VENDOR for MacBook1,1 backlight: Add Cirrus EP93xx backlight driver backlight: l4f00242t03: Fix regulators handling code in remove function backlight: fix adp8860_bl build errors backlight: new driver for the ADP8860 backlight parts backlight: 88pm860x_bl - potential memory leak backlight: mbp_nvidia_bl - add support for older MacBookPro and MacBook 6,1. backlight: Kconfig cleanup backlight: backlight_device_register() return ERR_PTR()
2010-05-27Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-ledsLinus Torvalds
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds: leds: Add mx31moboard MC13783 led support leds: Add mc13783 LED support leds: leds-ss4200: fix led_classdev_unregister twice in error handling leds: leds-lp3944: properly handle lp3944_configure fail in lp3944_probe leds: led-class: set permissions on max_brightness file to 0444 leds: leds-gpio: Change blink_set callback to be able to turn off blinking leds: Add LED driver for the Soekris net5501 board leds: 88pm860x - fix checking in probe function
2010-05-27Merge branch 'hwmon-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: (23 commits) hwmon: (lm75) Add support for the Texas Instruments TMP105 hwmon: (ltc4245) Read only one GPIO pin hwmon: (dme1737) Add SCH5127 support hwmon: (tmp102) Don't always stop chip at exit hwmon: (tmp102) Fix suspend and resume functions hwmon: (tmp102) Various fixes hwmon: Driver for TI TMP102 temperature sensor hwmon: EMC1403 thermal sensor support hwmon: (applesmc) Add temperature sensor labels to sysfs interface hwmon: (applesmc) Add generic support for MacBook Pro 7 hwmon: (applesmc) Add generic support for MacBook Pro 6 hwmon: (applesmc) Add support for MacBook Pro 5,3 and 5,4 hwmon: (tmp401) Reorganize code to get rid of static forward declarations hwmon: (tmp401) Use constants for sysfs file permissions hwmon: (adm1031) Allow setting update rate hwmon: Add description of the update_rate sysfs attribute hwmon: (lm90) Use programmed update rate hwmon: (f71882fg) Acquire I/O regions while we're working with them hwmon: (f71882fg) Code cleanup hwmon: (f71882fg) Use strict_stro(l|ul) instead of simple_strto$1 ...
2010-05-27hwmon: (asus_atk0110) Don't load if ACPI resources aren't enforcedJean Delvare
When the user passes the kernel parameter acpi_enforce_resources=lax, the ACPI resources are no longer protected, so a native driver can make use of them. In that case, we do not want the asus_atk0110 to be loaded. Unfortunately, this driver loads automatically due to its MODULE_DEVICE_TABLE, so the user ends up with two drivers loaded for the same device - this is bad. So I suggest that we prevent the asus_atk0110 driver from loading if acpi_enforce_resources=lax. Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Luca Tettamanti <kronos.it@gmail.com> Cc: Len Brown <lenb@kernel.org>
2010-05-27Merge branch 'sfi-release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6 * 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6: SFI: add sysfs interface for SFI tables. SFI: add support for v0.81 spec
2010-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (27 commits) Btrfs: add more error checking to btrfs_dirty_inode Btrfs: allow unaligned DIO Btrfs: drop verbose enospc printk Btrfs: Fix block generation verification race Btrfs: fix preallocation and nodatacow checks in O_DIRECT Btrfs: avoid ENOSPC errors in btrfs_dirty_inode Btrfs: move O_DIRECT space reservation to btrfs_direct_IO Btrfs: rework O_DIRECT enospc handling Btrfs: use async helpers for DIO write checksumming Btrfs: don't walk around with task->state != TASK_RUNNING Btrfs: do aio_write instead of write Btrfs: add basic DIO read/write support direct-io: do not merge logically non-contiguous requests direct-io: add a hook for the fs to provide its own submit_bio function fs: allow short direct-io reads to be completed via buffered IO Btrfs: Metadata ENOSPC handling for balance Btrfs: Pre-allocate space for data relocation Btrfs: Metadata ENOSPC handling for tree log Btrfs: Metadata reservation for orphan inodes Btrfs: Introduce global metadata reservation ...
2010-05-27Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: Make fsync sync new parent directories in no-journal mode ext4: Drop whitespace at end of lines ext4: Fix compat EXT4_IOC_ADD_GROUP ext4: Conditionally define compat ioctl numbers tracing: Convert more ext4 events to DEFINE_EVENT ext4: Add new tracepoints to track mballoc's buddy bitmap loads ext4: Add a missing trace hook ext4: restart ext4_ext_remove_space() after transaction restart ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted ext4: Avoid crashing on NULL ptr dereference on a filesystem error ext4: Use bitops to read/modify i_flags in struct ext4_inode_info ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE() ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks() ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks() ext4: Use our own write_cache_pages() ext4: Show journal_checksum option ext4: Fix for ext4_mb_collect_stats() ext4: check for a good block group before loading buddy pages ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate ext4: Remove extraneous newlines in ext4_msg() calls ... Fixed up trivial conflict in fs/ext4/fsync.c
2010-05-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: ieee1394: schedule for removal firewire: core: use separate timeout for each transaction firewire: core: Fix tlabel exhaustion problem firewire: core: make transaction label allocation more robust firewire: core: clean up config ROM related defined constants ieee1394: mark char device files as not seekable firewire: cdev: mark char device files as not seekable firewire: ohci: cleanups and fix for nonstandard build without debug facility firewire: ohci: wait for PHY register accesses to complete firewire: ohci: fix up configuration of TI chips firewire: ohci: enable 1394a enhancements firewire: ohci: do not clear PHY interrupt status inadvertently firewire: ohci: add a function for reading PHY registers Trivial conflicts in Documentation/feature-removal-schedule.txt
2010-05-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: usbtouchscreen - support bigger iNexio touchscreens Input: ads7846 - return error on regulator_get() failure Input: twl4030-vibra - correct the power down sequence Input: enable onkey driver of max8925 Input: use ABS_CNT rather than (ABS_MAX + 1)
2010-05-27numa: introduce numa_mem_id()- effective local memory node idLee Schermerhorn
Introduce numa_mem_id(), based on generic percpu variable infrastructure to track "nearest node with memory" for archs that support memoryless nodes. Define API in <linux/topology.h> when CONFIG_HAVE_MEMORYLESS_NODES defined, else stubs. Architectures will define HAVE_MEMORYLESS_NODES if/when they support them. Archs can override definitions of: numa_mem_id() - returns node number of "local memory" node set_numa_mem() - initialize [this cpus'] per cpu variable 'numa_mem' cpu_to_mem() - return numa_mem for specified cpu; may be used as lvalue Generic initialization of 'numa_mem' occurs in __build_all_zonelists(). This will initialize the boot cpu at boot time, and all cpus on change of numa_zonelist_order, or when node or memory hot-plug requires zonelist rebuild. Archs that support memoryless nodes will need to initialize 'numa_mem' for secondary cpus as they're brought on-line. [akpm@linux-foundation.org: fix build] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27numa: add generic percpu var numa_node_id() implementationLee Schermerhorn
Rework the generic version of the numa_node_id() function to use the new generic percpu variable infrastructure. Guard the new implementation with a new config option: CONFIG_USE_PERCPU_NUMA_NODE_ID. Archs which support this new implemention will default this option to 'y' when NUMA is configured. This config option could be removed if/when all archs switch over to the generic percpu implementation of numa_node_id(). Arch support involves: 1) converting any existing per cpu variable implementations to use this implementation. x86_64 is an instance of such an arch. 2) archs that don't use a per cpu variable for numa_node_id() will need to initialize the new per cpu variable "numa_node" as cpus are brought on-line. ia64 is an example. 3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g., when NUMA is configured. This is required because I have retained the old implementation by default to allow archs to be modified incrementally, as desired. Subsequent patches will convert x86_64 and ia64 to use this implemenation. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27vfs: introduce noop_llseek()jan Blunck
This is an implementation of ->llseek useable for the rare special case when userspace expects the seek to succeed but the (device) file is actually not able to perform the seek. In this case you use noop_llseek() instead of falling back to the default implementation of ->llseek. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27aio: fix the compat vectored operationsJeff Moyer
The aio compat code was not converting the struct iovecs from 32bit to 64bit pointers, causing either EINVAL to be returned from io_getevents, or EFAULT as the result of the I/O. This patch passes a compat flag to io_submit to signal that pointer conversion is necessary for a given iocb array. A variant of this was tested by Michael Tokarev. I have also updated the libaio test harness to exercise this code path with good success. Further, I grabbed a copy of ltp and ran the testcases/kernel/syscall/readv and writev tests there (compiled with -m32 on my 64bit system). All seems happy, but extra eyes on this would be welcome. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix CONFIG_COMPAT=n build] Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Reported-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> [2.6.35.1] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writevJeff Moyer
It was reported in http://lkml.org/lkml/2010/3/8/309 that 32 bit readv and writev AIO operations were not functioning properly. It turns out that the code to convert the 32bit io vectors to 64 bits was never written. The results of that can be pretty bad, but in my testing, it mostly ended up in generating EFAULT as we walked off the list of I/O vectors provided. This patch set fixes the problem in my environment. are greatly appreciated. This patch: Factor out code that will be used by both compat_do_readv_writev and the compat aio submission code paths. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Reported-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> [2.6.35.1] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27dma-mapping: remove deprecated dma_sync_single and dma_sync_sg APIFUJITA Tomonori
Since 2.6.5, it had been commented, 'for backwards compatibility, removed in 2.7.x'. Since 2.6.31, it have been marked as __deprecated. I think that we can remove the API safely now. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27dma-mapping: remove unnecessary sync_single_range_* in dma_map_opsFUJITA Tomonori
sync_single_range_for_cpu and sync_single_range_for_device hooks are unnecessary because sync_single_for_cpu and sync_single_for_device can be used instead. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27swiotlb: remove unnecessary swiotlb_sync_single_range_*FUJITA Tomonori
swiotlb_sync_single_range_for_cpu and swiotlb_sync_single_range_for_device are unnecessary because swiotlb_sync_single_for_cpu and swiotlb_sync_single_for_device can be used instead. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27lib/random32: export pseudo-random number generator for modulesJoe Eykholt
This patch moves the definition of struct rnd_state and the inline __seed() function to linux/random.h. It renames the static __random32() function to prandom32() and exports it for use in modules. prandom32() is useful as a privately-seeded pseudo random number generator that can give the same result every time it is initialized. For FCoE FC-BB-6 VN2VN mode self-selected unique FC address generation, we need an pseudo-random number generator seeded with the 64-bit world-wide port name. A truly random generator or one seeded with randomness won't do because the same sequence of numbers should be generated each time we boot or the link comes up. A prandom32_seed() inline function is added to the header file. It is inlined not for speed, but so the function won't be expanded in the base kernel, but only in the module that uses it. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Acked-by: Matt Mackall <mpm@selenic.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27INIT_SIGHAND: use SIG_DFL instead of NULLOleg Nesterov
Cosmetic, no changes in the compiled code. Just s/NULL/SIG_DFL/ to make it more readable and grep-friendly. Note: probably SIG_IGN makes more sense, we could kill ignore_signals(). But then kernel_init() should do flush_signal_handlers() before exec(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Mathias Krause <Mathias.Krause@secunet.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27pids: init_struct_pid.tasks should never see the swapper processOleg Nesterov
"statically initialize struct pid for swapper" commit 820e45db says: Statically initialize a struct pid for the swapper process (pid_t == 0) and attach it to init_task. This is needed so task_pid(), task_pgrp() and task_session() interfaces work on the swapper process also. OK, but: - it doesn't make sense to add init_task.pids[].node into init_struct_pid.tasks[], and in fact this just wrong. idle threads are special, they shouldn't be visible on any global list. In particular do_each_pid_task(init_struct_pid) shouldn't see swapper. This is the actual reason why kill(0, SIGKILL) from /sbin/init (which starts with 0,0 special pids) crashes the kernel. The signal sent to pgid/sid == 0 must never see idle threads, even if the previous patch fixed the crash itself. - we have other idle threads running on the non-boot CPUs, see the next patch. Change INIT_STRUCT_PID/INIT_PID_LINK to create the empty/unhashed hlist_head/hlist_node. Like any other idle thread swapper can never exit, so detach_pid()->__hlist_del() is not possible, but we could change INIT_PID_LINK() to set pprev = &next if needed. All we need is the valid swapper->pids[].pid == &init_struct_pid. Reported-by: Mathias Krause <mathias.krause@secunet.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Mathias Krause <Mathias.Krause@secunet.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27INIT_TASK() should initialize ->thread_group listOleg Nesterov
The trivial /sbin/init doing int main(void) { kill(0, SIGKILL) } crashes the kernel. This happens because __kill_pgrp_info(init_struct_pid) also sends SIGKILL to the swapper process which runs with the uninitialized ->thread_group. Change INIT_TASK() to initialize ->thread_group properly. Note: the real problem is that the swapper process must not be visible to signals, see the next patch. But this change is right anyway and fixes the crash. Reported-and-tested-by: Mathias Krause <mathias.krause@secunet.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Mathias Krause <Mathias.Krause@secunet.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27pids: increase pid_max based on num_possible_cpusHedi Berriche
On a system with a substantial number of processors, the early default pid_max of 32k will not be enough. A system with 1664 CPU's, there are 25163 processes started before the login prompt. It's estimated that with 2048 CPU's we will pass the 32k limit. With 4096, we'll reach that limit very early during the boot cycle, and processes would stall waiting for an available pid. This patch increases the early maximum number of pids available, and increases the minimum number of pids that can be set during runtime. [akpm@linux-foundation.org: fix warnings] Signed-off-by: Hedi Berriche <hedi@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Machek <pavel@ucw.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Greg KH <gregkh@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: John Stoffel <john@stoffel.org> Cc: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27rapidio: add switch domain routinesAlexandre Bounine
Add switch specific domain routines required for 16-bit routing support in switches with hierarchical implementation of routing tables. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27rapidio: modify initialization of switch operationsAlexandre Bounine
Modify the way how RapidIO switch operations are declared. Multiple assignments through the linker script replaced by single initialization call. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27rapidio: add enabling SRIO port RX and TXThomas Moll
Add the functionality to enable Input receiver and Output transmitter of every port, to allow non-maintenance traffic. Signed-off-by: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Alexandre Bounine <abounine@tundra.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27rapidio: add Port-Write handling for EMAlexandre Bounine
Add RapidIO Port-Write message handling in the context of Error Management Extensions Specification Rev.1.3. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27rapidio: add IDT CPS/TSI switchesAlexandre Bounine
Extentions to RapidIO switch support: 1. modify switch route operation declarations to allow using single switch-specific file for family of switches that share the same route table operations. 2. add standard route table operations for switches that that support route table manipulation registers as defined in the Rev.1.3 of RapidIO specification. 3. add clear-route-table operation for switches 4. add CPSxx and TSIxxx families of RapidIO switches Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27ipc/sem.c: cacheline align the ipc spinlock for semaphoresManfred Spraul
Cacheline align the spinlock for sysv semaphores. Without the patch, the spinlock and sem_otime [written by every semop that modified the array] and sem_base [read in the hot path of try_atomic_semop()] can be in the same cacheline. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27notifier: change notifier_from_errno(0) to return NOTIFY_OKAkinobu Mita
This changes notifier_from_errno(0) to be NOTIFY_OK instead of NOTIFY_STOP_MASK | NOTIFY_OK. Currently, the notifiers which return encapsulated errno value have to do something like this: err = do_something(); // returns -errno if (err) return notifier_from_errno(err); else return NOTIFY_OK; This change makes the above code simple: err = do_something(); // returns -errno return return notifier_from_errno(err); Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27proc: turn signal_struct->count into "int nr_threads"Oleg Nesterov
No functional changes, just s/atomic_t count/int nr_threads/. With the recent changes this counter has a single user, get_nr_threads() And, none of its callers need the really accurate number of threads, not to mention each caller obviously races with fork/exit. It is only used to report this value to the user-space, except first_tid() uses it to avoid the unnecessary while_each_thread() loop in the unlikely case. It is a bit sad we need a word in struct signal_struct for this, perhaps we can change get_nr_threads() to approximate the number of threads using signal->live and kill ->nr_threads later. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27proc: get_nr_threads() doesn't need ->siglock any longerOleg Nesterov
Now that task->signal can't go away get_nr_threads() doesn't need ->siglock to read signal->count. Also, make it inline, move into sched.h, and convert 2 other proc users of signal->count to use this (now trivial) helper. Henceforth get_nr_threads() is the only valid user of signal->count, we are ready to turn it into "int nr_threads" or, perhaps, kill it. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27kill the obsolete thread_group_cputime_free() helperOleg Nesterov
Kill the empty thread_group_cputime_free() helper. It was needed to free the per-cpu data which we no longer have. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Roland McGrath <roland@redhat.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27signals: kill the awful task_rq_unlock_wait() hackOleg Nesterov
Now that task->signal can't go away we can revert the horrible hack added by ad474caca3e2a0550b7ce0706527ad5ab389a4d4 ("fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock"). And we can do more cleanups sched_stats.h/posix-cpu-timers.c later. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27signals: make task_struct->signal immutable/refcountableOleg Nesterov
We have a lot of problems with accessing task_struct->signal, it can "disappear" at any moment. Even current can't use its ->signal safely after exit_notify(). ->siglock helps, but it is not convenient, not always possible, and sometimes it makes sense to use task->signal even after this task has already dead. This patch adds the reference counter, sigcnt, into signal_struct. This reference is owned by task_struct and it is dropped in __put_task_struct(). Perhaps it makes sense to export get/put_signal_struct() later, but currently I don't see the immediate reason. Rename __cleanup_signal() to free_signal_struct() and unexport it. With the previous changes it does nothing except kmem_cache_free(). Change __exit_signal() to not clear/free ->signal, it will be freed when the last reference to any thread in the thread group goes away. Note: - when the last thead exits signal->tty can point to nowhere, see the next patch. - with or without this patch signal_struct->count should go away, or at least it should be "int nr_threads" for fs/proc. This will be addressed later. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27exit: change zap_other_threads() to count sub-threadsOleg Nesterov
Change zap_other_threads() to return the number of other sub-threads found on ->thread_group list. Other changes are cosmetic: - change the code to use while_each_thread() helper - remove the obsolete comment about SIGKILL/SIGSTOP Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27umh: creds: kill subprocess_info->cred logicOleg Nesterov
Now that nobody ever changes subprocess_info->cred we can kill this member and related code. ____call_usermodehelper() always runs in the context of freshly forked kernel thread, it has the proper ->cred copied from its parent kthread, keventd. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27umh: creds: convert call_usermodehelper_keys() to use subprocess_info->init()Oleg Nesterov
call_usermodehelper_keys() uses call_usermodehelper_setkeys() to change subprocess_info->cred in advance. Now that we have info->init() we can change this code to set tgcred->session_keyring in context of execing kernel thread. Note: since currently call_usermodehelper_keys() is never called with UMH_NO_WAIT, call_usermodehelper_keys()->key_get() and umh_keys_cleanup() are not really needed, we could rely on install_session_keyring_to_cred() which does key_get() on success. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27exec: replace call_usermodehelper_pipe with use of umh init function and ↵Neil Horman
resolve limit The first patch in this series introduced an init function to the call_usermodehelper api so that processes could be customized by caller. This patch takes advantage of that fact, by customizing the helper in do_coredump to create the pipe and set its core limit to one (for our recusrsion check). This lets us clean up the previous uglyness in the usermodehelper internals and factor call_usermodehelper out entirely. While I'm at it, we can also modify the helper setup to look for a core limit value of 1 rather than zero for our recursion check Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27kmod: add init function to usermodehelperNeil Horman
About 6 months ago, I made a set of changes to how the core-dump-to-a-pipe feature in the kernel works. We had reports of several races, including some reports of apps bypassing our recursion check so that a process that was forked as part of a core_pattern setup could infinitely crash and refork until the system crashed. We fixed those by improving our recursion checks. The new check basically refuses to fork a process if its core limit is zero, which works well. Unfortunately, I've been getting grief from maintainer of user space programs that are inserted as the forked process of core_pattern. They contend that in order for their programs (such as abrt and apport) to work, all the running processes in a system must have their core limits set to a non-zero value, to which I say 'yes'. I did this by design, and think thats the right way to do things. But I've been asked to ease this burden on user space enough times that I thought I would take a look at it. The first suggestion was to make the recursion check fail on a non-zero 'special' number, like one. That way the core collector process could set its core size ulimit to 1, and enable the kernel's recursion detection. This isn't a bad idea on the surface, but I don't like it since its opt-in, in that if a program like abrt or apport has a bug and fails to set such a core limit, we're left with a recursively crashing system again. So I've come up with this. What I've done is modify the call_usermodehelper api such that an extra parameter is added, a function pointer which will be called by the user helper task, after it forks, but before it exec's the required process. This will give the caller the opportunity to get a call back in the processes context, allowing it to do whatever it needs to to the process in the kernel prior to exec-ing the user space code. In the case of do_coredump, this callback is ues to set the core ulimit of the helper process to 1. This elimnates the opt-in problem that I had above, as it allows the ulimit for core sizes to be set to the value of 1, which is what the recursion check looks for in do_coredump. This patch: Create new function call_usermodehelper_fns() and allow it to assign both an init and cleanup function, as we'll as arbitrary data. The init function is called from the context of the forked process and allows for customization of the helper process prior to calling exec. Its return code gates the continuation of the process, or causes its exit. Also add an arbitrary data pointer to the subprocess_info struct allowing for data to be passed from the caller to the new process, and the subsequent cleanup process Also, use this patch to cleanup the cleanup function. It currently takes an argp and envp pointer for freeing, which is ugly. Lets instead just make the subprocess_info structure public, and pass that to the cleanup and init routines Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27cpusets: randomize node rotor used in cpuset_mem_spread_node()Jack Steiner
Some workloads that create a large number of small files tend to assign too many pages to node 0 (multi-node systems). Part of the reason is that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at node 0 for newly created tasks. This patch changes the rotor to be initialized to a random node number of the cpuset. [akpm@linux-foundation.org: fix layout] [Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration] Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Paul Menage <menage@google.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27cpusets: new round-robin rotor for SLAB allocationsJack Steiner
We have observed several workloads running on multi-node systems where memory is assigned unevenly across the nodes in the system. There are numerous reasons for this but one is the round-robin rotor in cpuset_mem_spread_node(). For example, a simple test that writes a multi-page file will allocate pages on nodes 0 2 4 6 ... Odd nodes are skipped. (Sometimes it allocates on odd nodes & skips even nodes). An example is shown below. The program "lfile" writes a file consisting of 10 pages. The program then mmaps the file & uses get_mempolicy(..., MPOL_F_NODE) to determine the nodes where the file pages were allocated. The output is shown below: # ./lfile allocated on nodes: 2 4 6 0 1 2 6 0 2 There is a single rotor that is used for allocating both file pages & slab pages. Writing the file allocates both a data page & a slab page (buffer_head). This advances the RR rotor 2 nodes for each page allocated. A quick confirmation seems to confirm this is the cause of the uneven allocation: # echo 0 >/dev/cpuset/memory_spread_slab # ./lfile allocated on nodes: 6 7 8 9 0 1 2 3 4 5 This patch introduces a second rotor that is used for slab allocations. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Paul Menage <menage@google.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>