summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2009-09-06Merge branch 'tracing/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into tracing/core
2009-09-06Merge commit 'v2.6.31-rc9' into tracing/coreIngo Molnar
Merge reason: move from -rc5 to -rc9. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-05Linux 2.6.31-rc9v2.6.31-rc9Linus Torvalds
2009-09-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: firewire: sbp2: fix freeing of unallocated memory firewire: ohci: fix Ricoh R5C832, video reception firewire: ohci: fix Agere FW643 and multiple cameras firewire: core: fix crash in iso resource management
2009-09-05powerpc: Fix i8259 interrupt driver kernel crash on ML510Roderick Colenbrander
This patch fixes a null pointer exception caused by removal of 'ack()' for level interrupts in the Xilinx interrupt driver. A recent change to the xilinx interrupt controller removed the ack hook for level irqs. Signed-off-by: Roderick Colenbrander <thunderbird2k@gmail.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05Merge git://git.infradead.org/~dwmw2/mtd-2.6.31Linus Torvalds
* git://git.infradead.org/~dwmw2/mtd-2.6.31: JFFS2: add missing verify buffer allocation/deallocation mtd: nftl: fix offset alignments mtd: nftl: write support is broken mtd: m25p80: fix null pointer dereference bug
2009-09-05Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Allow changing max_sectors_kb above the default 512
2009-09-05Merge branch 'fix/oxygen' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'fix/oxygen' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: sound: oxygen: handle cards with missing EEPROM sound: oxygen: fix MCLK rate for 192 kHz playback
2009-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: tc: Fix unitialized kernel memory leak pkt_sched: Revert tasklet_hrtimer changes. net: sk_free() should be allowed right after sk_alloc() gianfar: gfar_remove needs to call unregister_netdev() ipw2200: firmware DMA loading rework
2009-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: skcipher - Fix skcipher_dequeue_givcrypt NULL test
2009-09-05Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Re-enable cpufreq suspend and resume code
2009-09-05Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] fix csum_ipv6_magic() [IA64] Fix warning in dma-mapping.c
2009-09-05Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: actually enable the swapext compat handler
2009-09-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key
2009-09-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: don't assume existence of cpu0
2009-09-05Merge branch 'slab/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU
2009-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dmLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: dm snapshot: fix on disk chunk size validation dm exception store: split set_chunk_size dm snapshot: fix header corruption race on invalidation dm snapshot: refactor zero_disk_area to use chunk_io dm log: userspace add luid to distinguish between concurrent log instances dm raid1: do not allow log_failure variable to unset after being set dm log: remove incorrect field from userspace table output dm log: fix userspace status output dm stripe: expose correct io hints dm table: add more context to terse warning messages dm table: fix queue_limit checking device iterator dm snapshot: implement iterate devices dm multipath: fix oops when request based io fails when no paths
2009-09-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI SR-IOV: correct broken resource alignment calculations
2009-09-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc64: Fix bootup with mcount in some configs. sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds.
2009-09-05Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_counter/powerpc: Fix cache event codes for POWER7 perf_counter: Fix /0 bug in swcounters perf_counters: Increase paranoia level
2009-09-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: atkbd - add Compaq Presario R4000-series repeat quirk Input: i8042 - add Acer Aspire 5536 to the nomux list
2009-09-05ext2: fix unbalanced kmap()/kunmap()Nicolas Pitre
In ext2_rename(), dir_page is acquired through ext2_dotdot(). It is then released through ext2_set_link() but only if old_dir != new_dir. Failing that, the pkmap reference count is never decremented and the page remains pinned forever. Repeat that a couple times with highmem pages and all pkmap slots get exhausted, and every further kmap() calls end up stalling on the pkmap_map_wait queue at which point the whole system comes to a halt. Signed-off-by: Nicolas Pitre <nico@marvell.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: ocfs2_write_begin_nolock() should handle len=0 ocfs2: invalidate dentry if its dentry_lock isn't initialized.
2009-09-05pty: don't limit the writes to 'pty_space()' inside 'pty_write()'Linus Torvalds
The whole write-room thing is something that is up to the _caller_ to worry about, not the pty layer itself. The total buffer space will still be limited by the buffering routines themselves, so there is no advantage or need in having pty_write() artificially limit the size somehow. And what happened was that the caller (the n_tty line discipline, in this case) may have verified that there is room for 2 bytes to be written (for NL -> CRNL expansion), and it used to then do those writes as two single-byte writes. And if the first byte written (CR) then caused a new tty buffer to be allocated, pty_space() may have returned zero when trying to write the second byte (LF), and then incorrectly failed the write - leading to a lost newline character. This should finally fix http://bugzilla.kernel.org/show_bug.cgi?id=14015 Reported-by: Mikael Pettersson <mikpe@it.uu.se> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05n_tty: do O_ONLCR translation as a single writeLinus Torvalds
When translating CR to CRNL in the n_tty line discipline, we did it as two tty_put_char() calls. Which works, but is stupid, and has caused problems before too with bad interactions with the write_room() logic. The generic USB serial driver had that problem, for example. Now the pty layer had similar issues after being moved to the generic tty buffering code (in commit d945cb9cce20ac7143c2de8d88b187f62db99bdc: "pty: Rework the pty layer to use the normal buffering logic"). So stop doing the silly separate two writes, and do it as a single write instead. That's what the n_tty layer already does for the space expansion of tabs (XTABS), and it means that we'll now always have just a single write for the CRNL to match the single 'tty_write_room()' test, which hopefully means that the next time somebody screws up buffering, it won't cause weeks of debugging. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05exec: do not sleep in TASK_TRACED under ->cred_guard_mutexOleg Nesterov
Tom Horsley reports that his debugger hangs when it tries to read /proc/pid_of_tracee/maps, this happens since "mm_for_maps: take ->cred_guard_mutex to fix the race with exec" 04b836cbf19e885f8366bccb2e4b0474346c02d commit in 2.6.31. But the root of the problem lies in the fact that do_execve() path calls tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC. The tracee must not sleep in TASK_TRACED holding this mutex. Even if we remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(), another task doing PTRACE_ATTACH should not hang until it is killed or the tracee resumes. With this patch do_execve() does not use ->cred_guard_mutex directly and we do not hold it throughout, instead: - introduce prepare_bprm_creds() helper, it locks the mutex and calls prepare_exec_creds() to initialize bprm->cred. - install_exec_creds() drops the mutex after commit_creds(), and thus before tracehook_report_exec()->ptrace_stop(). or, if exec fails, free_bprm() drops this mutex when bprm->cred != NULL which indicates install_exec_creds() was not called. Reported-by: Tom Horsley <tom.horsley@att.net> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05page-allocator: always change pageblock ownership when anti-fragmentation is ↵Mel Gorman
disabled On low-memory systems, anti-fragmentation gets disabled as fragmentation cannot be avoided on a sufficiently large boundary to be worthwhile. Once disabled, there is a period of time when all the pageblocks are marked MOVABLE and the expectation is that they get marked UNMOVABLE at each call to __rmqueue_fallback(). However, when MAX_ORDER is large the pageblocks do not change ownership because the normal criteria are not met. This has the effect of prematurely breaking up too many large contiguous blocks. This is most serious on NOMMU systems which depend on high-order allocations to boot. This patch causes pageblocks to change ownership on every fallback when anti-fragmentation is disabled. This prevents the large blocks being prematurely broken up. This is a fix to commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e [page allocator: move check for disabled anti-fragmentation out of fastpath] and the problem affects 2.6.31-rc8. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Tested-by: Paul Mundt <lethal@linux-sh.org> Cc: David Howells <dhowells@redhat.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05nommu: fix error handling in do_mmap_pgoff()David Howells
Fix the error handling in do_mmap_pgoff(). If do_mmap_shared_file() or do_mmap_private() fail, we jump to the error_put_region label at which point we cann __put_nommu_region() on the region - but we haven't yet added the region to the tree, and so __put_nommu_region() may BUG because the region tree is empty or it may corrupt the region tree. To get around this, we can afford to add the region to the region tree before calling do_mmap_shared_file() or do_mmap_private() as we keep nommu_region_sem write-locked, so no-one can race with us by seeing a transient region. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05workqueues: introduce __cancel_delayed_work()Oleg Nesterov
cancel_delayed_work() has to use del_timer_sync() to guarantee the timer function is not running after return. But most users doesn't actually need this, and del_timer_sync() has problems: it is not useable from interrupt, and it depends on every lock which could be taken from irq. Introduce __cancel_delayed_work() which calls del_timer() instead. The immediate reason for this patch is http://bugzilla.kernel.org/show_bug.cgi?id=13757 but hopefully this helper makes sense anyway. As for 13757 bug, actually we need requeue_delayed_work(), but its semantics are not yet clear. Merge this patch early to resolves cross-tree interdependencies between input and infiniband. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05firewire: sbp2: fix freeing of unallocated memoryStefan Richter
If a target writes invalid status (typically status of a command that already timed out), firewire-sbp2 attempts to put away an ORB that doesn't exist. https://bugzilla.redhat.com/show_bug.cgi?id=519772 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-05firewire: ohci: fix Ricoh R5C832, video receptionStefan Richter
In dual-buffer DMA mode, no video frames are ever received from R5C832 by libdc1394. Fallback to packet-per-buffer DMA works reliably. http://thread.gmane.org/gmane.linux.kernel.firewire.devel/13393/focus=13476 Reported-by: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-05firewire: ohci: fix Agere FW643 and multiple camerasStefan Richter
An Agere FW643 OHCI 1.1 card works fine for video reception from one camera but fails early if receiving from two cameras. After a short while, no IR IRQ events occur and the context control register does not react anymore. This happens regardless whether both IR DMA contexts are dual-buffer or one is dual-buffer and the other packet-per-buffer. This can be worked around by disabling dual buffer DMA mode entirely. http://sourceforge.net/mailarchive/message.php?msg_name=4A7C0594.2020208%40gmail.com (Reported by Samuel Audet.) In another report (by Jonathan Cameron), an FW643 works OK with two cameras in dual buffer mode. Whether this is due to different chip revisions or different usage patterns (different video formats) is not yet clear. However, as far as the current capabilities of firewire-core's isochronous I/O interface are concerned, simply switching off dual-buffer on non-working and working FW643s alike is not a problem in practice. We only need to revisit this issue if we are going to enhance the interface, e.g. so that applications can explicitly choose modes. Reported-by: Samuel Audet <samuel.audet@gmail.com> Reported-by: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-05firewire: core: fix crash in iso resource managementStefan Richter
This fixes a regression due to post 2.6.30 commit "firewire: core: do not DMA-map stack addresses" 6fdc03709433ccc2005f0f593ae9d9dd04f7b485. As David Moore noted, a previously correct sizeof() expression became wrong since the commit changed its argument from an array to a pointer. This resulted in an oops in ohci_cancel_packet in the shared workqueue thread's context when an isochronous resource was to be freed. Reported-by: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-04ring-buffer: only enable ring_buffer_swap_cpu when neededSteven Rostedt
Since the ability to swap the cpu buffers adds a small overhead to the recording of a trace, we only want to add it when needed. Only the irqsoff and preemptoff tracers use this feature, and both are not recommended for production kernels. This patch disables its use when neither irqsoff nor preemptoff is configured. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04ring-buffer: check for swapped buffers in start of committingSteven Rostedt
Because the irqsoff tracer can swap an internal CPU buffer, it is possible that a swap happens between the start of the write and before the committing bit is set (the committing bit will disable swapping). This patch adds a check for this and will fail the write if it detects it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04tracing: report error in trace if we fail to swap latency bufferSteven Rostedt
The irqsoff tracer will fail to swap the cpu buffer with the max buffer if it preempts a commit. Instead of ignoring this, this patch makes the tracer report it if the last max latency failed due to preempting a current commit. The output of the latency tracer will look like this: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 2.6.31-rc5 # -------------------------------------------------------------------- # latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: save_args # => ended at: __do_softirq # # # _------=> CPU# # / _-----=> irqs-off # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| / # ||||| delay # cmd pid ||||| time | caller # \ / ||||| \ | / bash-4281 1d.s6 265us : update_max_tr_single: Failed to swap buffers due to commit in progress Note the latency time and the functions that disabled the irqs or preemption will still be listed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04tracing: add trace_array_printk for internal tracers to useSteven Rostedt
This patch adds a trace_array_printk to allow a tracer to use the trace_printk on its own trace array. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04tracing: pass around ring buffer instead of tracerSteven Rostedt
The latency tracers (irqsoff and wakeup) can swap trace buffers on the fly. If an event is happening and has reserved data on one of the buffers, and the latency tracer swaps the global buffer with the max buffer, the result is that the event may commit the data to the wrong buffer. This patch changes the API to the trace recording to be recieve the buffer that was used to reserve a commit. Then this buffer can be passed in to the commit. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04tracing: make tracing_reset safe for external useSteven Rostedt
Reseting the trace buffer without first disabling the buffer and waiting for any writers to complete, can corrupt the ring buffer. This patch makes the external version of tracing_reset safe from corruption by disabling the ring buffer and calling synchronize_sched. This version can no longer be called from interrupt context. But all those callers have been removed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04tracing: use timestamp to determine start of latency tracesSteven Rostedt
Currently the latency tracers reset the ring buffer. Unfortunately if a commit is in process (due to a trace event), this can corrupt the ring buffer. When this happens, the ring buffer will detect the corruption and then permanently disable the ring buffer. The bug does not crash the system, but it does prevent further tracing after the bug is hit. Instead of reseting the trace buffers, the timestamp of the start of the trace is used instead. The buffers will still contain the previous data, but the output will not count any data that is before the timestamp of the trace. Note, this only affects the static trace output (trace) and not the runtime trace output (trace_pipe). The runtime trace output does not make sense for the latency tracers anyway. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04tracing: Remove mentioning of legacy latency_trace file from documentationAlbin Tonnerre
The latency_trace file got removed a while back by commit 886b5b73d71e4027d7dc6c14f5f7ab102201ea6b and has been replaced by the latency-format option. This patch fixes the documentation by reflecting this change. Changes since v1: - mention that the trace format is configurable through the latency-format option - Fix a couple mistakes related to the timestamps Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20090831204007.GE4237@pc-ras4041.res.insa> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-04ocfs2: ocfs2_write_begin_nolock() should handle len=0Sunil Mushran
Bug introduced by mainline commit e7432675f8ca868a4af365759a8d4c3779a3d922 The bug causes ocfs2_write_begin_nolock() to oops when len=0. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04tracing/filters: Defer pred allocation, fix memory leakLi Zefan
The predicates of an event and their filter structure are allocated when we create an event filter for the first time. These objects must be created once but each time we come with a new filter, we overwrite such pre-existing allocation, if any. Thus, this patch checks if the filter has already been allocated before going ahead. Spotted-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <4A9CB1BA.3060402@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-09-04dm snapshot: fix on disk chunk size validationMikulas Patocka
Fix some problems seen in the chunk size processing when activating a pre-existing snapshot. For a new snapshot, the chunk size can either be supplied by the creator or a default value can be used. For an existing snapshot, the chunk size in the snapshot header on disk should always be used. If someone attempts to load an existing snapshot and has the 'default chunk size' option set, the kernel uses its default value even when it is incorrect for the snapshot being loaded. This patch ensures the correct on-disk value is always used. Secondly, when the code does use the chunk size stored on the disk it is prudent to revalidate it, so the code can exit cleanly if it got corrupted as happened in https://bugzilla.redhat.com/show_bug.cgi?id=461506 . Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04dm exception store: split set_chunk_sizeMikulas Patocka
Break the function set_chunk_size to two functions in preparation for the fix in the following patch. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04dm snapshot: fix header corruption race on invalidationMikulas Patocka
If a persistent snapshot fills up, a race can corrupt the on-disk header which causes a crash on any future attempt to activate the snapshot (typically while booting). This patch fixes the race. When the snapshot overflows, __invalidate_snapshot is called, which calls snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that calls write_header. write_header constructs the new header in the "area" location. Concurrently, an existing kcopyd job may finish, call copy_callback and commit_exception method, that goes to persistent_commit_exception. persistent_commit_exception doesn't do locking, relying on the fact that callbacks are single-threaded, but it can race with snapshot invalidation and overwrite the header that is just being written while the snapshot is being invalidated. The result of this race is a corrupted header being written that can lead to a crash on further reactivation (if chunk_size is zero in the corrupted header). The fix is to use separate memory areas for each. See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506 Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04dm snapshot: refactor zero_disk_area to use chunk_ioMikulas Patocka
Refactor chunk_io to prepare for the fix in the following patch. Pass an area pointer to chunk_io and simplify zero_disk_area to use chunk_io. No functional change. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04dm log: userspace add luid to distinguish between concurrent log instancesJonathan Brassow
Device-mapper userspace logs (like the clustered log) are identified by a universally unique identifier (UUID). This identifier is used to associate requests from the kernel to a specific log in userspace. The UUID must be unique everywhere, since multiple machines may use this identifier when communicating about a particular log, as is the case for cluster logs. Sometimes, device-mapper/LVM may re-use a UUID. This is the case during pvmoves, when moving from one segment of an LV to another, or when resizing a mirror, etc. In these cases, a new log is created with the same UUID and loaded in the "inactive" slot. When a device-mapper "resume" is issued, the "live" table is deactivated and the new "inactive" table becomes "live". (The "inactive" table can also be removed via a device-mapper 'clear' command.) The above two issues were colliding. More than one log was being created with the same UUID, and there was no way to distinguish between them. So, sometimes the wrong log would be swapped out during the exchange. The solution is to create a locally unique identifier, 'luid', to go along with the UUID. This new identifier is used to determine exactly which log is being referenced by the kernel when the log exchange is made. The identifier is not universally safe, but it does not need to be, since create/destroy/suspend/resume operations are bound to a specific machine; and these are the operations that make up the exchange. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04dm raid1: do not allow log_failure variable to unset after being setJonathan Brassow
This patch fixes a bug which was triggering a case where the primary leg could not be changed on failure even when the mirror was in-sync. The case involves the failure of the primary device along with the transient failure of the log device. The problem is that bios can be put on the 'failures' list (due to log failure) before 'fail_mirror' is called due to the primary device failure. Normally, this is fine, but if the log device failure is transient, a subsequent iteration of the work thread, 'do_mirror', will reset 'log_failure'. The 'do_failures' function then resets the 'in_sync' variable when processing bios on the failures list. The 'in_sync' variable is what is used to determine if the primary device can be switched in the event of a failure. Since this has been reset, the primary device is incorrectly assumed to be not switchable. The case has been seen in the cluster mirror context, where one machine realizes the log device is dead before the other machines. As the responsibilities of the server migrate from one node to another (because the mirror is being reconfigured due to the failure), the new server may think for a moment that the log device is fine - thus resetting the 'log_failure' variable. In any case, it is inappropiate for us to reset the 'log_failure' variable. The above bug simply illustrates that it can actually hurt us. Cc: stable@kernel.org Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04dm log: remove incorrect field from userspace table outputJonathan Brassow
The output of 'dmsetup table' includes an internal field that should not be there. This patch removes it. To make the fix simpler, we first reorder a constructor argument The 'device size' argument is generated internally. Currently it is placed as the last space-separated word of the constructor string. However, we need to use a version of the string without this word, so we move it to the beginning instead so it is trivial to skip past it. We keep a copy of the arguments passed to userspace for creating a log, just in case we need to resend them. These are the same arguments that are desired in the STATUSTYPE_TABLE request, except for one. When creating the userspace log, the userspace daemon must know the size of the mirror, so that is added to the arguments given in the constructor table. We were printing this extra argument out as well, which is a mistake. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>