asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2011-05-10	ipv4: Pass explicit daddr arg to ip_send_reply().	David S. Miller
	This eliminates an access to rt->rt_src. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10	Merge branch 2.6.39-rc7 into usb-linus	Greg Kroah-Hartman
	This was needed to resolve a conflict in: drivers/usb/host/isp1760-hcd.c Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-10	tipc: Abort excessive send requests as early as possible	Allan Stephens
	Adds checks to TIPC's socket send routines to promptly detect and abort attempts to send more than 66,000 bytes in a single TIPC message or more than 2**31-1 bytes in a single TIPC byte stream request. In addition, this ensures that the number of iovecs in a send request does not exceed the limits of a standard integer variable. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-05-10	bcma: add Broadcom specific AMBA bus driver	Rafał Miłecki
	Broadcom has released cards based on a new AMBA-based bus type. From a programming point of view, this new bus type differs from AMBA and does not use AMBA common registers. It also differs enough from SSB. We decided that a new bus driver is needed to keep the code clean. In its current form, the driver detects devices present on the bus and registers them in the system. It allows registering BCMA drivers for specified bus devices and provides them basic operations. The bus driver itself includes two important bus managing drivers: ChipCommon core driver and PCI(c) core driver. They are early used to allow correct initialization. Currently code is limited to supporting buses on PCI(e) devices, however the driver is designed to be used also on other hosts. The host abstraction layer is implemented and already used for PCI(e). Support for PCI(e) hosts is working and seems to be stable (access to 80211 core was tested successfully on a few devices). We can still optimize it by using some fixed windows, but this can be done later without affecting any external code. Windows are just ranges in MMIO used for accessing cores on the bus. Cc: Greg KH <greg@kroah.com> Cc: Michael Büsch <mb@bu3sch.de> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: George Kashperko <george@znau.edu.ua> Cc: Arend van Spriel <arend@broadcom.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Russell King <rmk@arm.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Botting <andy@andybotting.com> Cc: linuxdriverproject <devel@linuxdriverproject.org> Cc: linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org> Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-05-10	Merge commit 'v2.6.39-rc7' into perf/core	Ingo Molnar
	Merge reason: pull in the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-10	nilfs2: add ioctl which limits range of segment to be allocated	Ryusuke Konishi
	This adds a new ioctl command which limits range of segment to be allocated. This is intended to gather data whithin a range of the partition before shrinking the filesystem, or to control new log location for some purpose. If a range is specified by the ioctl, segment allocator of nilfs tries to allocate new segments from the range unless no free segments are available there. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-05-10	nilfs2: super root size should change depending on inode size	Ryusuke Konishi
	The size of super root structure depends on inode size, so NILFS_SR_BYTES macro should be a function of the inode size. This fixes the issue. Even though a different size value will be written for a possible future filesystem with extended inode, but fortunately this does not break disk format compatibility. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-05-10	Merge branches 'dma-debug/next', 'amd-iommu/command-cleanups', ↵	Joerg Roedel
	'amd-iommu/ats' and 'amd-iommu/extended-features' into iommu/2.6.40 Conflicts: arch/x86/include/asm/amd_iommu_types.h arch/x86/kernel/amd_iommu.c arch/x86/kernel/amd_iommu_init.c
2011-05-10	treewide: fix a few typos in comments	Justin P. Mattock
	- kenrel -> kernel - whetehr -> whether - ttt -> tt - sss -> ss Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-05-10	IPVS: init and cleanup restructuring	Hans Schillstrom
	DESCRIPTION This patch tries to restore the initial init and cleanup sequences that was before namspace patch. Netns also requires action when net devices unregister which has never been implemented. I.e this patch also covers when a device moves into a network namespace, and has to be released. IMPLEMENTATION The number of calls to register_pernet_device have been reduced to one for the ip_vs.ko Schedulers still have their own calls. This patch adds a function __ip_vs_service_cleanup() and an enable flag for the netfilter hooks. The nf hooks will be enabled when the first service is loaded and never disabled again, except when a namespace exit starts. Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> [horms@verge.net.au: minor edit to changelog] Signed-off-by: Simon Horman <horms@verge.net.au>
2011-05-10	ALSA: tea575x: unify read/write functions	Ondrej Zary
	Implement generic read/write functions to access TEA575x tuners. They're now implemented 4 times (once in es1968 and 3 times in fm801). This also allows mute to work on all cards. Also improve tuner detection/initialization. Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2011-05-09	RDMA/iwcm: Get rid of enum iw_cm_event_status	Roland Dreier
	The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places; cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4 drivers -- only nes was using the enum values (with the mild consequence that all nes connection failures were treated as generic errors rather than reported as timeouts or rejections). We can fix this confusion by getting rid of enum iw_cm_event_status and using a plain int for struct iw_cm_event.status, and converting nes to use -Exxx as the other iWARP drivers do. This also gets rid of the warning drivers/infiniband/core/cma.c: In function 'cma_iw_handler': drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status' drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status' drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status' Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Faisal Latif <faisal.latif@intel.com>
2011-05-09	RDMA/cma: Add an ID_REUSEADDR option	Hefty, Sean
	Lustre requires that clients bind to a privileged port number before connecting to a remote server. On larger clusters (typically more than about 1000 nodes), the number of privileged ports is exhausted, resulting in lustre being unusable. To handle this, we add support for reusable addresses to the rdma_cm. This mimics the behavior of the socket option SO_REUSEADDR. A user may set an rdma_cm_id to reuse an address before calling rdma_bind_addr() (explicitly or implicitly). If set, other rdma_cm_id's may be bound to the same address, provided that they all have reuse enabled, and there are no active listens. If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it will only succeed if there are no other id's bound to that same address. The reuse option is exported to user space. The behavior of the kernel reuse implementation was verified against that given by sockets. This patch is derived from a path by Ira Weiny <weiny2@llnl.gov> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-09	ACPICA: Update to version 20110413	Bob Moore
	Version 20110413 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-09	ACPICA: Execute an orphan _REG method under the EC device	Bob Moore
	This change will force the execution of a _REG method underneath the EC device even if there is no corresponding operation region of type EmbeddedControl. Fixes a problem seen on some machines and apparently is compatible with Windows behavior. http://www.acpica.org/bugzilla/show_bug.cgi?id=875 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-09	ACPICA: Move ACPI_NUM_PREDEFINED_REGIONS to a more appropriate place	Bob Moore
	Moved to where the predefined regions are actually defined. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-09	ACPICA: Update internal address SpaceID for DataTable regions	Bob Moore
	Moved this internal space id in preparation for ACPI 5.0 changes that will include some new space IDs. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2011-05-09	Don't lock guardpage if the stack is growing up	Mikulas Patocka
	Linux kernel excludes guard page when performing mlock on a VMA with down-growing stack. However, some architectures have up-growing stack and locking the guard page should be excluded in this case too. This patch fixes lvm2 on PA-RISC (and possibly other architectures with up-growing stack). lvm2 calculates number of used pages when locking and when unlocking and reports an internal error if the numbers mismatch. [ Patch changed fairly extensively to also fix /proc/<pid>/maps for the grows-up case, and to move things around a bit to clean it all up and share the infrstructure with the /proc bits. Tested on ia64 that has both grow-up and grow-down segments - Linus ] Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Tested-by: Tony Luck <tony.luck@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-09	net: add mac_pton() for parsing MAC address	Alexey Dobriyan
	mac_pton() parses MAC address in form XX:XX:XX:XX:XX:XX and only in that form. mac_pton() doesn't dirty result until it's sure string representation is valid. mac_pton() doesn't care about characters _after_ last octet, it's up to caller to deal with it. mac_pton() diverges from 0/-E return value convention. Target usage: if (!mac_pton(str, whatever->mac)) return -EINVAL; /* ->mac being u8 [ETH_ALEN] is filled at this point. / / optionally check str[3 * ETH_ALEN - 1] for termination */ Use mac_pton() in pktgen and netconsole for start. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-09	vlan: remove one synchronize_net() call	Eric Dumazet
	At VLAN dismantle phase, unregister_vlan_dev() makes one synchronize_net() call after vlan_group_set_device(grp, vlan_id, NULL). This call can be safely removed because we are calling unregister_netdevice_queue() to queue device for deletion, and this process needs at least one rcu grace period to complete. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ben Greear <greearb@candelatech.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jesse Gross <jesse@nicira.com> Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-09	garp: remove one synchronize_rcu() call	Eric Dumazet
	Speedup vlan dismantling in CONFIG_VLAN_8021Q_GVRP=y cases, by using a call_rcu() to free the memory instead of waiting with expensive synchronize_rcu() [ while RTNL is held ] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ben Greear <greearb@candelatech.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-09	Merge branch 'drm-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/radeon/kms: add pci id to acer travelmate quirk for 5730 drm/radeon: fix order of doing things in radeon_crtc_cursor_set drm: mm: fix debug output drm/radeon/kms: ATPX switcheroo fixes drm/nouveau: Fix a crash at card takedown for NV40 and older cards
2011-05-08	ipv4: Pass flow key down into ip_append_*().	David S. Miller
	This way rt->rt_dst accesses are unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	ipv4: Pass flow keys down into datagram packet building engine.	David S. Miller
	This way ip_output.c no longer needs rt->rt_{src,dst}. We already have these keys sitting, ready and waiting, on the stack or in a socket structure. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-09	mxm/wmi: add MXMX interface entry point.	Dave Airlie
	The MXMX method appears to be a mutex of some sort. Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-05-08	ide: Use linux/mutex.h	Anton Blanchard
	The IDE code is still including asm/mutex.h instead of linux/mutex.h Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-09	drm: mm: fix debug output	Daniel Vetter
	The looping helper didn't do anything due to a superficial semicolon. Furthermore one of the two dump functions suffered from copy&paste fail. While staring at the code I've also noticed that the replace helper (currently unused) is a bit broken. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-05-08	net: Allow ethtool to set interface in loopback mode.	Mahesh Bandewar
	This patch enables ethtool to set the loopback mode on a given interface. By configuring the interface in loopback mode in conjunction with a policy route / rule, a userland application can stress the egress / ingress path exposing the flows of the change in progress and potentially help developer(s) understand the impact of those changes without even sending a packet out on the network. Following set of commands illustrates one such example - a) ip -4 addr add 192.168.1.1/24 dev eth1 b) ip -4 rule add from all iif eth1 lookup 250 c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250 d) arp -Ds 192.168.1.100 eth1 e) arp -Ds 192.168.1.200 eth1 f) sysctl -w net.ipv4.ip_nonlocal_bind=1 g) sysctl -w net.ipv4.conf.all.accept_local=1 # Assuming that the machine has 8 cores h) taskset 000f netserver -L 192.168.1.200 i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30 Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	ethtool: Add 20G bit definitions	Yaniv Rosner
	Add 20G supported and advertising bit definitions. 20G will be supported with the 57840 chips. Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> ------ include/linux/ethtool.h \| 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	inet: Pass flowi to ->queue_xmit().	David S. Miller
	This allows us to acquire the exact route keying information from the protocol, however that might be managed. It handles all of the possibilities, from the simplest case of storing the key in inet->cork.fl to the more complex setup SCTP has where individual transports determine the flow. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	ipv4: Create inet_csk_route_child_sock().	David S. Miller
	This is just like inet_csk_route_req() except that it operates after we've created the new child socket. In this way we can use the new socket's cork flow for proper route key storage. This will be used by DCCP and TCP child socket creation handling. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	sctp: Store a flowi in transports to provide persistent keying.	David S. Miller
	Several future simplifications are possible now because of this. For example, the sctp_addr unions can simply refer directly to the flowi information. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08	ethtool: remove phys_id from ethtool_ops	Stephen Hemminger
	After that all the upstream kernel drivers now use phys_id, and the old ethtool_ops interface (phys_id) can be removed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-07	net,rcu: convert call_rcu(sctp_local_addr_free) to kfree_rcu()	Lai Jiangshan
	The rcu callback sctp_local_addr_free() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(sctp_local_addr_free). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07	Merge branch 'perf-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf tools: Makefile: Use gcc to determine ARCH perf events, x86: Fix Intel Nehalem and Westmere last level cache event definitions hw_breakpoints, powerpc: Fix CONFIG_HAVE_HW_BREAKPOINT off-case in ptrace_set_debugreg() sh, hw_breakpoints: Fix racy access to ptrace breakpoints arm, hw_breakpoints: Fix racy access to ptrace breakpoints powerpc, hw_breakpoints: Fix racy access to ptrace breakpoints x86, hw_breakpoints: Fix racy access to ptrace breakpoints ptrace: Prepare to fix racy accesses on task breakpoints
2011-05-07	slub: Remove CONFIG_CMPXCHG_LOCAL ifdeffery	Christoph Lameter
	Remove the #ifdefs. This means that the irqsafe_cpu_cmpxchg_double() is used everywhere. There may be performance implications since: A. We now have to manage a transaction ID for all arches B. The interrupt holdoff for arches not supporting CONFIG_CMPXCHG_LOCAL is reduced to a very short irqoff section. There are no multiple irqoff/irqon sequences as a result of this change. Even in the fallback case we only have to do one disable and enable like before. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2011-05-06	block: Remove extra discard_alignment from hd_struct.	Tao Ma
	Currently, hd_struct.discard_alignment is only used when we show /sys/block/sdx/sdx/discard_alignment. So remove it and calculate when it is asked to show. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06	USB: OTG: msm: Add PHY suspend support for MSM8960	Pavankumar Kondeti
	Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-06	USB: OTG: msm: Implement charger detection	Pavankumar Kondeti
	Implement good battery algorithm defined in the battery charging V1.2 spec for detecting different charging ports. USB hardware is put into low power mode when connected to a dedicated charging port. vbus_draw and set_power methods are implemented for determining the allowed current from Host in different states (un-configured/suspend/configured). The charger block is implemented using vendor specific registers and the PHY used in MSM8960(28nm PHY) different from older targets like MSM8x60 and MSM7x30(45nm PHY). The PHY vendor and product id registers are not implemented in the above chipsets. Hence PHY type is passed via platform data. Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-06	USB: OTG: msm: vote for dayatona fabric clock	Anji jonnala
	HSUSB core clock is derived from daytona fabric clock and for HSUSB operational require minimum core clock at 55MHz. Since, HSUSB cannot tolerate daytona fabric clock change in the middle of HSUSB operational, vote for maximum Daytona fabric clock while usb is operational Signed-off-by: Anji jonnala <anjir@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-06	usb: fix building musb drivers	Anatolij Gustschin
	Commit 3dacdf11 "usb: factor out state_string() on otg drivers" broke building musb drivers since there is already another otg_state_string() function in musb drivers, but with different prototype. Fix musb drivers to use common otg_state_string(), too. Also provide a nop for otg_state_string() if CONFIG_USB_OTG_UTILS is not defined. Signed-off-by: Anatolij Gustschin <agust@denx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-06	driver core: Add the device driver-model structures to kerneldoc	Wanlong Gao
	Add the comments to the structure bus_type, device_driver, device, class to device.h for generating the driver-model kerneldoc. With another patch these all removed from the files in Documentation/driver-model/ since they are out of date. That will keep things up to date and provide a better way to document this stuff. Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com> Acked-by: Harry Wei <harryxiyou@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-06	inet: Decrease overhead of on-stack inet_cork.	David S. Miller
	When we fast path datagram sends to avoid locking by putting the inet_cork on the stack we use up lots of space that isn't necessary. This is because inet_cork contains a "struct flowi" which isn't used in these code paths. Split inet_cork to two parts, "inet_cork" and "inet_cork_full". Only the latter of which has the "struct flowi" and is what is stored in inet_sock. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2011-05-06	Regression: partial revert "tracing: Remove lock_depth from event entry"	Arjan van de Ven
	This partially reverts commit e6e1e2593592a8f6f6380496655d8c6f67431266. That commit changed the structure layout of the trace structure, which in turn broke PowerTOP (1.9x generation) quite badly. I appreciate not wanting to expose the variable in question, and PowerTOP was not using it, so I've replaced the variable with just a padding field - that way if in the future a new field is needed it can just use this padding field. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-06	Merge branch 'perf/stat' into perf/core	Ingo Molnar
	Merge reason: the perf stat improvements are tested and ready now. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-06	block: hold queue if flush is running for non-queueable flush drive	shaohua.li@intel.com
	In some drives, flush requests are non-queueable. When flush request is running, normal read/write requests can't run. If block layer dispatches such request, driver can't handle it and requeue it. Tejun suggested we can hold the queue when flush is running. This can avoid unnecessary requeue. Also this can improve performance. For example, we have request flush1, write1, flush 2. flush1 is dispatched, then queue is hold, write1 isn't inserted to queue. After flush1 is finished, flush2 will be dispatched. Since disk cache is already clean, flush2 will be finished very soon, so looks like flush2 is folded to flush1. In my test, the queue holding completely solves a regression introduced by commit 53d63e6b0dfb95882ec0219ba6bbd50cde423794: block: make the flush insertion use the tail of the dispatch list It's not a preempt type request, in fact we have to insert it behind requests that do specify INSERT_FRONT. which causes about 20% regression running a sysbench fileio workload. Stable: 2.6.39 only Cc: stable@kernel.org Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-06	block: add a non-queueable flush flag	shaohua.li@intel.com
	flush request isn't queueable in some drives. Add a flag to let driver notify block layer about this. We can optimize flush performance with the knowledge. Stable: 2.6.39 only Cc: stable@kernel.org Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-05	rcu: provide rcu_virt_note_context_switch() function.	Gleb Natapov
	Provide rcu_virt_note_context_switch() for vitalization use to note quiescent state during guest entry. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-05-05	rcu: introduce kfree_rcu()	Lai Jiangshan
	Many rcu callbacks functions just call kfree() on the base structure. These functions are trivial, but their size adds up, and furthermore when they are used in a kernel module, that module must invoke the high-latency rcu_barrier() function at module-unload time. The kfree_rcu() function introduced by this commit addresses this issue. Rather than encoding a function address in the embedded rcu_head structure, kfree_rcu() instead encodes the offset of the rcu_head structure within the base structure. Because the functions are not allowed in the low-order 4096 bytes of kernel virtual memory, offsets up to 4095 bytes can be accommodated. If the offset is larger than 4095 bytes, a compile-time error will be generated in __kfree_rcu(). If this error is triggered, you can either fall back to use of call_rcu() or rearrange the structure to position the rcu_head structure into the first 4096 bytes. Note that the allowable offset might decrease in the future, for example, to allow something like kmem_cache_free_rcu(). The new kfree_rcu() function can replace code as follows: call_rcu(&p->rcu, simple_kfree_callback); where "simple_kfree_callback()" might be defined as follows: void simple_kfree_callback(struct rcu_head p) { struct foo q = container_of(p, struct foo, rcu); kfree(q); } with the following: kfree_rcu(&p->rcu, rcu); Note that the "rcu" is the name of a field in the structure being freed. The reason for using this rather than passing in a pointer to the base structure is that the above approach allows better type checking. This commit is based on earlier work by Lai Jiangshan and Manfred Spraul: Lai's V1 patch: http://lkml.org/lkml/2008/9/18/1 Manfred's patch: http://lkml.org/lkml/2009/1/2/115 Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05	rcu: add DEBUG_OBJECTS_RCU_HEAD check for alignment	Paul E. McKenney
	Verify that rcu_head structures are aligned to a four-byte boundary. This check is enabled by CONFIG_DEBUG_OBJECTS_RCU_HEAD. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>