summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2006-09-02[SCSI] SCSI and FC Transport: add netlink support for posting of transport ↵James Smart
events This patch formally adds support for the posting of FC events via netlink. It is a followup to the original RFC at: http://marc.theaimsgroup.com/?l=linux-scsi&m=114530667923464&w=2 and the initial posting at: http://marc.theaimsgroup.com/?l=linux-scsi&m=115507374832500&w=2 The patch has been updated to optimize the send path, per the discussions in the initial posting. Per discussions at the Storage Summit and at OLS, we are to use netlink for async events from transports. Also per discussions, to avoid a netlink protocol per transport, I've create a single NETLINK_SCSITRANSPORT protocol, which can then be used by all transports. This patch: - Creates new files scsi_netlink.c and scsi_netlink.h, which contains the single and shared definitions for the SCSI Transport. It is tied into the base SCSI subsystem intialization. Contains a single interface routine, scsi_send_transport_event(), for a transport to send an event (via multicast to a protocol specific group). - Creates a new scsi_netlink_fc.h file, which contains the FC netlink event messages - Adds 3 new routines to the fc transport: fc_get_event_number() - to get a FC event # fc_host_post_event() - to send a simple FC event (32 bits of data) fc_host_post_vendor_event() - to send a Vendor unique event, with arbitrary amounts of data. Note: the separation of event number allows for a LLD to send a standard event, followed by vendor-specific data for the event. Note: This patch assumes 2 prior fc transport patches have been installed: http://marc.theaimsgroup.com/?l=linux-scsi&m=115555807316329&w=2 http://marc.theaimsgroup.com/?l=linux-scsi&m=115581614930261&w=2 Sorry - next time I'll do something like making these individual patches of the same posting when I know they'll be posted closely together. Signed-off-by: James Smart <James.Smart@emulex.com> Tidy up configuration not to make SCSI always select NET Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] add failure return to scsi_init_shared_tag_map()James Bottomley
And use it in the stex driver. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] stex: add shared tags from blockEd Lin
Use block shared tags entirely within the driver. In the case of shutdown, assume that there are no other outstanding commands, so tag 0 is fine. Signed-off-by: Ed Lin <ed.lin@promise.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] Add Promise SuperTrak driverJeff Garzik
Add Promise SuperTrak 'stex' driver, supporting SuperTrak EX8350/8300/16350/16300 controllers. The controller's firmware accepts SCSI commands, handing them to the underlying RAID or JBOD disks. The driver consisted of the following cleanups and fixes, beyond its initial submission: Ed Lin: stex: cleanup and minor fixes stex: add new device ids stex: update internal copy code path stex: add hard reset function stex: adjust command timeout in slave_config routine stex: use more efficient method for unload/shutdown flush Jeff Garzik: [SCSI] Add Promise SuperTrak 'shasta' driver. Rename drivers/scsi/shasta.c to stex.c ("SuperTrak EX"). [SCSI] stex: update with community comments from 'Promise SuperTrak' thread [SCSI] stex: Fix warning, trim trailing whitespace. [SCSI] stex: remove last remnants of "shasta" project code name [SCSI] stex: removed 6-byte command emulation [SCSI] stex: minor cleanups [SCSI] stex: minor fixes: irq flag, error return value [SCSI] stex: use dma_alloc_coherent() Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] Wrong size information for devices with disabled read accessHannes Reinecke
When accessing a device with disabled read access the capacity is set randomly to 1GB. This makes it impossible to userspace tools to detect invalid device capacities. Signed-off-by: Mike Anderson <andmike@us.ibm.com> Acked-by: Chris Mason <mason@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] iscsi class: update versionMike Christie
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] libiscsi: don't call into lld to cleanup taskMike Christie
In the normal IO path we should not be calling back into the LLD since the LLD will have cleaned up the task before or after calling complete pdu. For the fail_command path we still need to do this to force the cleanup. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] libiscsi: check that command ptr is set before accessing itMike Christie
If the scsi eh sends a TUR and the session is down we could return SCSI_ML_HOST_BUSY. scsi eh will ignore this and send ask us to abort the command and we blindly accesst the command ptr. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] iscsi_tcp: fix partial digest recvMike Christie
When a digest is spread across two network buffers, we currently ignore this and try to check the digest with the partial buffer. Or course this fails. This patch has use iscsi_tcp_copy to copy the whole digest before testing it. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] libiscsi: only check burst lengths when sending unsol dataMike Christie
The first burst length is only relevant if immedate data = Yes or if Initial R2T is No Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] iscsi_tcp: update header size during reloginMike Christie
When we relogin to a target, we have not yet negotiated digests so we must reset the hdr_size var. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] iscsi_tcp: fix header resendMike Christie
This patch built over the last ones fixes a bug in the partial header resend code, where we add on another 4 bytes to the send length on the resend. We want just the header plus digest. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] scsi_tcp: rm data rx and tx tfmsMike Christie
We currently allocated seperate tfms for data and header digests. There is no reason for this since we can never calculate a rx header and digest at the same time. Same for sends. So this patch removes the data tfms and has the send and recv sides use the rx_tfm or tx_tfm. I also made the connection creation code preallocate the tfms because I thought I hit a bug where I changed the digests settings during a relogin but could not allocate the tfm and then we just failed. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] iscsi_tcp: fix padding, data digests, and IO at weird offsetsMike Christie
iscsi_tcp calculates padding by using the expected transfer length. This has the problem where if we have immediate data = no and initial R2T = yes, and the transfer length ended up needing padding then we send: 1. header 2. padding which should have gone after data 3. data Besides this bug, we also assume the target will always ask for nice transfer lengths and the first burst length will always be a nice value. As far as I can tell form the RFC this is not a requirement. It would be silly to do this, but if someone did it we will end doing bad things. Finally the last bug in that bit of code is in our handling of the recalculation of data digests when we do not send a whole iscsi_buf in one try. The bug here is that we call crypto_digest_final on a iscsi_sendpage error, then when we send the rest of the iscsi_buf, we doiscsi_data_digest_init and this causes the previous data digest to be lost. And to make matters worse, some of these bugs are replicated over and over and over again for immediate data, solicited data and unsolicited data. So the attached patch made over the iscsi git tree (see kernel.org/git for details) which I updated today to include the patches I said I merged, consolidates the sending of data, padding and digests and calculation of data digests and fixes the above bugs. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] attempt to complete r2t with data len greater than max burstMike Christie
A couple targets like string bean and MDS, send r2ts with a data len greater than the max burst we agreed to. We were being strict in our enforcing of the iscsi rfc in that code path, but there is no driver limitation that prevents us from fullfilling the request. To allow those targets to work we will ignore the max_burst length and send as much data as the target asks for assuming it has consciously decided to override its max burst length. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] add refcouting around ctask usage in main IO patchMike Christie
It is possible that a ctask could be completing and getting cleaned up at the same time, we are finishing up the last data transfer. This could then result in the data transfer code using stale or invalid values. This patch adds a refcount to the ctask. When the count goes to zero then we know the transmit thread and recv thread or softirq are not touching it and we can safely release it. The eh should not need to grab a reference because it only cleans up a task if it has both the xmit mutex and recv lock (or recv side suspended). Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] libiscsi, iscsi_tcp, iscsi_iser: check that burst lengths are valid.Mike Christie
iSCSI RFC states that the first burst length must be smaller than the max burst length. We currently assume targets will be good, but that may not be the case, so this patch adds a check. This patch also moves the unsol data out offset to the lib so the LLDs do not have to track it. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-02[SCSI] SCSI: sanitize INQUIRY stringsAlan Stern
Sanitize the Vendor, Product, and Revision strings contained in an INQUIRY result by setting all non-graphic or non-ASCII characters to ' '. Since the standard disallows such characters, this will affect only non-compliant devices. To help maintain backward compatibility, NUL characters are treated specially. They are taken as string terminators; they and all the following characters are set to ' '. If some valid characters get erased as a result... well, we weren't seeing them before so we haven't lost anything. The primary purpose of this change is to allow blacklist entries to match devices with illegal Vendor or Product strings. In addition, the patch updates a couple of function prototypes, giving inq_result its correct type (unsigned char *). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-09-01[SCSI] sd: fix cache flushing on module removal (and individual device removal)James Bottomley
The fix isn't actually in sd: it's in scsi_device_get(). I modified it to allow devices to be returned in SDEV_CANCEL, but not SDEV_DEL. This means that the device_remove_driver, which occurs in device_del() in scsi_remove_device() after the device has gone into SDEV_CANCEL is now effective at flushing the cache. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-31[SCSI] add shared tag map helpersJames Bottomley
This patch adds support for sharing tag maps at the host level (i.e. either every queue [LUN] has its own tag map or there's a single one for the entire host). This formulation is primarily intended to help single issue queue hardware, like the aic7xxx Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-31[SCSI] block: add support for shared tag mapsJames Bottomley
The current block queue implementation already contains most of the machinery for shared tag maps. The only remaining pieces are a way to allocate and destroy a tag map independently of the queues (so that the maps can be managed on the life cycle of the overseeing entity) Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-30[SCSI] aic94xx: Increase can_queue for better performanceDarrick J. Wong
This patch sets can_queue in the aic94xx driver's scsi_host to better performing values than what's there currently. It seems that asd_ha->seq.can_queue reflects the number of requests that can be queued per controller; so long as there's one scsi_host per controller, it seems logical that the scsi_host ought to have the same can_queue value. To the best of my (still limited) knowledge, this method provides the correct value. The effect of leaving this value set to 1 is terrible performance in the case of either (a) certain Maxtor SAS drives flying solo or (b) flooding several disks with I/O simultaneously (md-raid). There may be more scenarios where we see similar problems that I haven't uncovered. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-30[SCSI] aic94xx: add MODULE_FIRMWARE tagJames Bottomley
Add a tag which shows what the firmware file we're requesting is. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-30[SCSI] MODULE_FIRMWARE for binary firmware(s)Jon Masters
Right now, various kernel modules are being migrated over to use request_firmware in order to pull in binary firmware blobs from userland when the module is loaded. This makes sense. However, there is right now little mechanism in place to automatically determine which binary firmware blobs must be included with a kernel in order to satisfy the prerequisites of these drivers. This affects vendors, but also regular users to a certain extent too. The attached patch introduces MODULE_FIRMWARE as a mechanism for advertising that a particular firmware file is to be loaded - it will then show up via modinfo and could be used e.g. when packaging a kernel. Signed-off-by: Jon Masters <jcm@redhat.com> Comments added in line with all the other MODULE_ tag Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-08-29[SCSI] aic94xx: new driverJames Bottomley
This is the end point of the separate aic94xx driver based on the original driver and transport class from Luben Tuikov <ltuikov@yahoo.com> The log of the separate development is: Alexis Bruemmer: o aic94xx: fix hotplug/unplug for expanderless systems o aic94xx: disable split completion timer/setting by default o aic94xx: wide port off expander support o aic94xx: remove various inline functions o aic94xx: use bitops o aic94xx: remove queue comment o aic94xx: remove sas_common.c o aic94xx: sas remove depot's o aic94xx: use available list_for_each_entry_safe_reverse() o aic94xx: sas header file merge James Bottomley: o aic94xx: fix TF_TMF_NO_CTX processing o aic94xx: convert to request_firmware interface o aic94xx: fix hotplug/unplug o aic94xx: add link error counts to the expander phys o aic94xx: add transport class phy reset capability o aic94xx: remove local_attached flag o Remove README o Fixup Makefile variable for libsas rename o Rename sas->libsas o aic94xx: correct return code for sas_discover_event o aic94xx: use parent backlink port o aic94xx: remove channel abstraction o aic94xx: fix routing algorithms o aic94xx: add backlink port o aic94xx: fix cascaded expander properties o aic94xx: fix sleep under lock o aic94xx: fix panic on module removal in complex topology o aic94xx: make use of the new sas_port o rename sas_port to asd_sas_port o Fix for eh_strategy_handler move o aic94xx: move entirely over to correct transport class formulation o remove last vestages of sas_rphy_alloc() o update for eh_timed_out move o Preliminary expander support for aic94xx o sas: remove event thread o minor warning cleanups o remove last vestiges of id mapping arrays o Further updates o Convert aic94xx over entirely to the transport class end device and o update aic94xx/sas to use the new sas transport class end device o [PATCH] aic94xx: attaching to the sas transport class o Add missing completion removal from prior patch o [PATCH] aic94xx: attaching to the sas transport class o Build fixes from akpm Jeff Garzik: o [scsi aic94xx] Remove ->owner from PCI info table Luben Tuikov: o initial aic94xx driver Mike Anderson: o aic94xx: fix panic on module insertion o aic94xx: stub out SATA_DEV case o aic94xx: compile warning cleanups o aic94xx: sas_alloc_task o aic94xx: ref count update o aic94xx nexus loss time value o [PATCH] aic94xx: driver assertion in non-x86 BIOS env Randy Dunlap: o libsas: externs not needed Robert Tarte: o aic94xx: sequence patch - fixes SATA support Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-27[SCSI] scsi_transport_sas: remove local_attached flagJames Bottomley
This flag denotes local attachment of the phy. There are two problems with it: 1) It's actually redundant ... you can get the same information simply by seeing whether a host is the phys parent 2) we condition a lot of phy parameters on it on the false assumption that we can only control local phys. I'm wiring up phy resets in the aic94xx now, and it will be able to reset non-local phys as well. I fixed 2) by moving the local check into the reset and stats function of the mptsas, since that seems to be the only HBA that can't (currently) control non-local phys. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-27Merge ../linux-2.6James Bottomley
2006-08-27[PATCH] Fix tty layer DoS and comment relevant codeAlan Cox
Unlike the other tty comment patch this one has code changes. Specifically it limits the queue size for a tty to 64K characters (128Kbytes) worst case even if the tty is ignoring tty->throttle. This is because certain drivers don't honour the throttle value correctly, although it is a useful safeguard anyway. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] tty layer comment the locking assumptions and functions somewhatAlan Cox
Doesn't fix them but does show up some interesting areas that need review and fixing. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] cdrom/gdsc: fix printk format warningRandy Dunlap
Fix printk format warning: drivers/cdrom/gscd.c:269: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘unsigned int’ Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] x86: NUMAQ Kconfig fixKAMEZAWA Hiroyuki
When we select NUMA with i386, the system is only X86_NUMAQ or using ACPI. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] /proc/meminfo: don't put spaces in namesAndrew Morton
None of the other /proc/meminfo lines have a space in the identifier. This post-2.6.17 addition has the potential to break existing parsers, so use an underscore instead (like Committed_AS). Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] fix up lockdep trace in fs/exec.cDave Jones
This fixes the locking error noticed by lockdep: ============================================= [ INFO: possible recursive locking detected ] --------------------------------------------- init/1 is trying to acquire lock: (&sighand->siglock){....}, at: [<c047a78a>] flush_old_exec+0x3ae/0x859 but task is already holding lock: (&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859 other info that might help us debug this: 2 locks held by init/1: #0: (tasklist_lock){..--}, at: [<c047a76a>] flush_old_exec+0x38e/0x859 #1: (&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859 stack backtrace: [<c04051e1>] show_trace_log_lvl+0x54/0xfd [<c040579d>] show_trace+0xd/0x10 [<c04058b6>] dump_stack+0x19/0x1b [<c043b33a>] __lock_acquire+0x773/0x997 [<c043bacf>] lock_acquire+0x4b/0x6c [<c060630b>] _spin_lock+0x19/0x28 [<c047a78a>] flush_old_exec+0x3ae/0x859 [<c0498053>] load_elf_binary+0x4aa/0x1628 [<c0479cab>] search_binary_handler+0xa7/0x24e [<c047b577>] do_execve+0x15b/0x1f9 [<c04022b4>] sys_execve+0x29/0x4d [<c0403faf>] syscall_call+0x7/0xb Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] lockdep: annotate reiserfsIngo Molnar
reiserfs seems to have another locking level layer for the i_mutex due to the xattrs-are-a-directory thing. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] lockdep: annotate idescsi_pc_intr()Ingo Molnar
idescsi_pc_intr() uses local_irq_enable() in IRQ context: annotate it. (this has no effect on kernels with lockdep disabled. On kernels with lockdep enabled this means that we wont actually disable interrupts, and the warning message will go away as well.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] CONFIG_ACPI_SRAT NUMA build fixKAMEZAWA Hiroyuki
In file included from include/asm/mmzone.h:18, from include/linux/mmzone.h:439, <snip> include/asm/srat.h:31:2: error: #error CONFIG_ACPI_SRAT not defined, and srat.h header has been included make[1]: *** [arch/i386/kernel/asm-offsets.s] Error 1 This can happen with CONFIG_NUMA && !CONFIG_ACPI && !CONFIG_X86_NUMAQ Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] cpuset: oom panic fixNick Piggin
cpuset_excl_nodes_overlap always returns 0 if current is exiting. This caused customer's systems to panic in the OOM killer when processes were having trouble getting memory for the final put_user in mm_release. Even though there were lots of processes to kill. Change to returning 1 in this case. This achieves parity with !CONFIG_CPUSETS case, and was observed to fix the problem. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] register_one_node() compile fixKAMEZAWA Hiroyuki
register_one_node()'s should be defined under CONFIG_NUMA=n. fixes following bug. CC init/version.o LD init/built-in.o LD .tmp_vmlinux1 mm/built-in.o: In function `add_memory': undefined reference to `register_one_node' Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] Manage jbd allocations from its own slabsBadari Pulavarty
JBD currently allocates commit and frozen buffers from slabs. With CONFIG_SLAB_DEBUG, its possible for an allocation to cross the page boundary causing IO problems. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=200127 So, instead of allocating these from regular slabs - manage allocation from its own slabs and disable slab debug for these slabs. [akpm@osdl.org: cleanups] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] cpuset: top_cpuset tracks hotplug changes to cpu_online_mapPaul Jackson
Change the list of cpus allowed to tasks in the top (root) cpuset to dynamically track what cpus are online, using a CPU hotplug notifier. Make this top cpus file read-only. On systems that have cpusets configured in their kernel, but that aren't actively using cpusets (for some distros, this covers the majority of systems) all tasks end up in the top cpuset. If that system does support CPU hotplug, then these tasks cannot make use of CPUs that are added after system boot, because the CPUs are not allowed in the top cpuset. This is a surprising regression over earlier kernels that didn't have cpusets enabled. In order to keep the behaviour of cpusets consistent between systems actively making use of them and systems not using them, this patch changes the behaviour of the 'cpus' file in the top (root) cpuset, making it read only, and making it automatically track the value of cpu_online_map. Thus tasks in the top cpuset will have automatic use of hot plugged CPUs allowed by their cpuset. Thanks to Anton Blanchard and Nathan Lynch for reporting this problem, driving the fix, and earlier versions of this patch. Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Nathan Lynch <ntl@pobox.com> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] md: fix recent breakage of md/raid1 array checkingNeilBrown
A recent patch broke the ability to do a user-request check of a raid1. This patch fixes the breakage and also moves a comment that was dislocated by the same patch. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] md: avoid backward event updates in md superblock when degraded.NeilBrown
If we - shut down a clean array, - restart with one (or more) drive(s) missing - make some changes - pause, so that they array gets marked 'clean', the event count on the superblock of included drives will be the same as that of the removed drives. So adding the removed drive back in will cause it to be included with no resync. To avoid this, we only update the eventcount backwards when the array is not degraded. In this case there can (should) be no non-connected drives that we can get confused with, and this is the particular case where updating-backwards is valuable. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] eventpoll.c compile fixMasoud Asgharifard Sharbiani
Fix two compile failures in eventpoll.c code which would happen if DEBUG_EPOLL is bigger than zero. Signed-off-by: Masoud Sharbiani <masouds@google.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] Documentation update for relay interfaceTom Zanussi
Here's updated documentation for the relay interface, rewritten to match the relayfs->relay changes. It also moves relayfs.txt to relay.txt in the process. It includes the changes to relayfs.txt previously posted by Randy Dunlap, thanks for those. The relay-apps examples have also been updated to match, and can be found on the sourceforge relayfs website. Signed-off-by: Tom Zanussi <zanussi@us.ibm.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] Remove redundant up() in stop_machine()Yingchao Zhou
An up() is called in kernel/stop_machine.c on failure, and also in the caller (unconditionally). Signed-off-by: Zhou Yingchao <yingchao.zhou@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] ufs: truncate correctionEvgeniy Dushistov
1) When we allocated last fragment in ufs_truncate, we read page, check if block mapped to address, and if not trying to allocate it. This is wrong behaviour, fragment may be NOT allocated, but mapped, this happened because of "block map" function not checked allocated fragment or not, it just take address of the first fragment in the block, add offset of fragment and return result, this is correct behaviour in almost all situation except call from ufs_truncate. 2) Almost all implementation of UFS, which I can investigate have such "defect": if you have full disk, and try truncate file, for example 3GB to 2MB, and have hole in this region, truncate return -ENOSPC. I tried evade from this problem, but "block allocation" algorithm is tied to right value of i_lastfrag, and fix of this corner case may slow down of ordinaries scenarios, so this patch makes behavior of "truncate" operations similar to what other UFS implementations do. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] ufs: write to hole in big fileEvgeniy Dushistov
On UFS, this scenario: open(O_TRUNC) lseek(1024 * 1024 * 80) write("A") lseek(1024 * 2) write("A") may cause access to invalid address. This happened because of "goal" is calculated in wrong way in block allocation path, as I see this problem exists also in 2.4. We use construction like this i_data[lastfrag], i_data array of pointers to direct blocks, indirect and so on, it has ceratain size ~20 elements, and lastfrag may have value for example 40000. Also this patch fixes related to handling such scenario issues, wrong zeroing metadata, in case of block(not fragment) allocation, and wrong goal calculation, when we allocate block Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] ext3 filesystem bogus ENOSPC with reservation fixMingming Cao
To handle the earlier bogus ENOSPC error caused by filesystem full of block reservation, current code falls back to non block reservation, starts to allocate block(s) from the goal allocation block group as if there is no block reservation. Current code needs to re-load the corresponding block group descriptor for the initial goal block group in this case. The patch fixes this. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] ext2: prevent div-by-zero on corrupted fsAndries Brouwer
Mounting an ext2 filesystem with zero s_inodes_per_group will cause a divide error. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27[PATCH] Fix for minix crashAndries Brouwer
Mounting a (corrupt) minix filesystem with zero s_zmap_blocks gives a spectacular crash on my 2.6.17.8 system, no doubt because minix/inode.c does an unconditional minix_set_bit(0,sbi->s_zmap[0]->b_data); [akpm@osdl.org: make labels conistent while we're there] Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>