Age | Commit message | Author
2009-09-23 | virtio_blk: add support for cache flush | Christoph Hellwig
Recent qemu has added a VIRTIO_BLK_F_FLUSH flag to advertise that the virtual disk has a volatile write cache that needs to be flushed. If we see this feature, tell the Linux block layer about it and use the new VIRTIO_BLK_T_FLUSH to flush the cache when required. This allows for a correct and simple implementation of write barriers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-23 | virtio: add virtio IDs file | Fernando Luis Vazquez Cao
Virtio IDs are spread all over the tree which makes assigning new IDs bothersome. Putting them together should make the process less error-prone. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-23 | virtio: get rid of redundant VIRTIO_ID_9P definition | Fernando Luis Vazquez Cao
VIRTIO_ID_9P is already defined in include/linux/virtio_9p.h so use that definition instead. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Eric Van Hensbergen <ericvh@gmail.com>
2009-09-23 | virtio: make add_buf return capacity remaining | Rusty Russell
This API change means that virtio_net can tell how much capacity remains for buffers. It's necessarily fuzzy, since VIRTIO_RING_F_INDIRECT_DESC means we can fit any number of descriptors in one, *if* we can kmalloc. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
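The idea of an add operation that reports remaining capacity can be sketched as follows. This is only an illustrative toy, not the kernel's virtqueue code: `struct toy_ring`, `toy_add_buf` and the `-1` error value are hypothetical names and choices made for this example.

```c
/* Hypothetical toy ring, not the kernel's struct virtqueue. */
struct toy_ring {
    unsigned int num_free; /* free descriptor slots */
};

/*
 * Add a buffer that needs descs_needed descriptors.
 * Returns the remaining capacity (>= 0) on success,
 * or -1 (an assumed error value) if the buffer does not fit.
 */
int toy_add_buf(struct toy_ring *r, unsigned int descs_needed)
{
    if (descs_needed == 0 || descs_needed > r->num_free)
        return -1;
    r->num_free -= descs_needed;
    return (int)r->num_free;
}
```

A caller in the style of a network driver could stop queueing packets once the returned value drops below what its largest packet needs, instead of tracking free slots separately.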
2009-09-23 | virtio_pci: minor MSI-X cleanups | Rusty Russell
1) Rename vp_request_vectors to vp_request_msix_vectors, and take non-MSI-X case out to caller.
2) Comment weird pci_enable_msix API.
3) Rename vp_find_vq to setup_vq.
4) Fix spaces to tabs.
5) Make nvectors calc internal to vp_try_to_find_vqs().
6) Rename vector to msix_vector for more clarity.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com>
2009-09-23 | Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/urgent | Ingo Molnar
2009-09-23 | perf tools: Fix module symbol loading bug | Mike Galbraith
Avi Kivity reported 'perf annotate' failures with modules: the requested function was not annotated. If there are no modules currently loaded, or the last module scanned is not loaded, dso__load_modules() steps on the value from dso__load_vmlinux(), so we happily load the kallsyms symbols on top of what we've already loaded. Fix that such that the total count of symbols loaded is returned. Should module symbol load fail after parsing of vmlinux, it's a hard failure, so do not silently fall back to kallsyms. Reported-by: Avi Kivity <avi@redhat.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: rostedt@goodmis.org Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <1253697658.11461.36.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-23 | CRIS: Cleanup linker script using new linker script macros. | Jesper Nilsson
Signed-off-by: Tim Abbott <tabbott@ksplice.com> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
2009-09-23 | perf_event, x86: Fix 'perf sched record' crashing the machine | Peter Zijlstra
Chris Malley reported that 'perf sched record' sometimes crashes his box with:
  [ 389.272175] BUG: unable to handle kernel paging request at ffffb300
  [ 389.272294] IP: [<c011b0bd>] default_send_IPI_self+0x1d/0x50
  [ 389.272366] *pde = 0073f067 *pte = 00000000
  [ 389.274708] Call Trace:
  [ 389.274752] [<c010e3b4>] ? set_perf_event_pending+0x14/0x20
  [ 389.274801] [<c01b9751>] ? perf_output_unlock+0x121/0x1a0
  [ 389.274848] [<c01b981a>] ? perf_output_end+0x4a/0x70
  [ 389.274893] [<c01ba690>] ? __perf_event_overflow+0x240/0x2f0
  [ 389.274942] [<c030963e>] ? atomic64_cmpxchg+0x1e/0x30
  [ 389.274988] [<c01ba8f4>] ? perf_swevent_ctx_event+0x1b4/0x1c0
  [ 389.275035] [<c01ba773>] ? perf_swevent_ctx_event+0x33/0x1c0
  [ 389.275081] [<c01ba9a7>] ? do_perf_sw_event+0xa7/0x160
  [ 389.275127] [<c01baae2>] ? perf_tp_event+0x82/0xa0
  [ 389.275174] [<c012e9c6>] ? ftrace_profile_sched_stat_runtime+0xe6/0x120
  [ 389.275224] [<c012e8e0>] ? ftrace_profile_sched_stat_runtime+0x0/0x120
  [ 389.275273] [<c013c85a>] ? update_curr+0x18a/0x230
  [ 389.275318] [<c013cdc5>] ? put_prev_task_fair+0x155/0x160
  [ 389.275366] [<c01618b5>] ? sched_clock_cpu+0xd5/0x110
  [ 389.275413] [<c04e7525>] ? _spin_lock_irq+0x45/0x50
  [ 389.275458] [<c04e424e>] ? schedule+0x20e/0xb10
The problem is that the box has no lapic enabled:
  [ 0.042445] Local APIC not detected. Using dummy APIC emulation.
The below seems like the best fix. We disabled all lapic bits, except the self-IPI-resend logic.
Reported-by: Chris Malley <mail@chrismalley.co.uk> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <7863dc4c0909221409v7893bfd3o4b590d5951a233ba@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-23 | ocfs2: Use buffer IO if we are appending a file. | Tao Ma
In ocfs2_file_aio_write, we prevent direct I/O if we find that we are appending (changing i_size) and call generic_file_aio_write_nolock. But the O_DIRECT flag is still set, so this function eventually calls generic_file_direct_write, which updates i_size and leaves di->i_size alone. The bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1173. So this patch lets ocfs2_direct_IO return 0 directly if we are appending, so that the buffered write is called and di->i_size gets updated successfully. This is also what we want in ocfs2_file_aio_write. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-23 | ocfs2: add spinlock protection when dealing with lockres->purge. | Wengang Wang
When we check or modify lockres->purge, we should hold lockres->spinlock. In dlm_purge_lockres(), the check and the modification are done without that protection. This patch fixes it. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-23 | dlmglue.c: add missed mlog lines | Coly Li
This patch adds the missed mlog_exit() and mlog_exit_void() lines when routines return. Signed-off-by: Coly Li <coly.li@suse.de> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-23 | ocfs2: __ocfs2_abort() should not enable panic for local mounts | Sunil Mushran
In a clustered setup, we have to panic the box on journal abort. This is because we don't have the facility to go hard readonly. With hard ro, another node would detect node failure and initiate recovery. Having said that, we shouldn't force panic if the volume is mounted locally. This patch defers the handling to the mount option, errors. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-23 | modules, tracing: Remove stale struct marker signature from module_layout() | Ingo Molnar
Linus reported this new build warning:
  kernel/module.c:2951: warning: 'struct marker' declared inside parameter list
  kernel/module.c:2951: warning: its scope is only this definition or declaration, which is probably not what you want
Caused by:
  fc53776: tracing: Remove markers
module_layout() is an artificial symbol with 'significant' symbols listed in its argument list so that it gets a proper argument types signature that modversions can pick up to decide whether a module is version-compatible or not. If these don't match then we won't even look at a module. Remove the stale marker symbol.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <alpine.LFD.2.01.0909210908020.4950@localhost.localdomain> Cc: Christoph Hellwig <hch@lst.de> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-23 | Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx into for-linus | NeilBrown
2009-09-23 | md: raid-1/10: fix RW bits manipulation | Dmitry Monakhov
Jens recently changed the bio_rw_flagged() logic in commit 1f98a13f623e0ef666690a18c1250335fc6d7ef1. It now returns bool instead of int. This broke the raid1/raid10 RW bits manipulation logic. One visible result is a BUG_ON triggering due to an empty barrier at scsi_lib.c:1108, in scsi_setup_fs_cmnd(). Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NeilBrown <neilb@suse.de>
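The class of bug described here can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the md or block-layer code: the bit positions and all `toy_` names are hypothetical. When a flag test returns the masked bit, callers can OR the result straight into a new request word; once it returns only true/false, the result has to be shifted back into position.

```c
#include <stdbool.h>

/* Hypothetical bit positions, chosen for illustration only. */
enum { TOY_RW_WRITE = 0, TOY_RW_SYNC = 4 };

/* Old-style helper: returns the masked bit itself (0 or 1 << bit). */
unsigned long toy_flag_as_mask(unsigned long rw, int bit)
{
    return rw & (1UL << bit);
}

/* New-style helper: returns only true or false. */
bool toy_flag_as_bool(unsigned long rw, int bit)
{
    return (rw & (1UL << bit)) != 0;
}

/* Correct with the bool helper: shift the 0/1 result back into place. */
unsigned long toy_rebuild_rw(unsigned long rw)
{
    unsigned long do_sync = toy_flag_as_bool(rw, TOY_RW_SYNC);
    return (1UL << TOY_RW_WRITE) | (do_sync << TOY_RW_SYNC);
}

/* Unchanged caller pattern, now broken: ORs in 1 instead of the SYNC bit. */
unsigned long toy_rebuild_rw_broken(unsigned long rw)
{
    unsigned long do_sync = toy_flag_as_bool(rw, TOY_RW_SYNC);
    return (1UL << TOY_RW_WRITE) | do_sync;
}
```

With a request word that has both the write and sync bits set, `toy_rebuild_rw` reproduces the word, while `toy_rebuild_rw_broken` silently drops the sync bit.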
2009-09-23 | md: remove unnecessary memset from multipath. | NeilBrown
Recent commit bbba809e96539672f775a3d70102657d05816a5b replaced mempool_create_kzalloc_pool with mempool_create_kmalloc_pool plus a memset. This memset is not needed (and we didn't need kzalloc in the first place). Every field of the allocated structure (struct multipath_bh) is initialised immediately except retry_list, and memset does not initialise a list_head anyway. So remove the memset. Signed-off-by: NeilBrown <neilb@suse.de>
2009-09-23 | md: report device as congested when suspended | NeilBrown
This should stop writeback from coming when the device is temporarily suspended. Signed-off-by: NeilBrown <neilb@suse.de>
2009-09-23 | md: Improve name of threads created by md_register_thread | NeilBrown
The management threads for raid4,5,6 arrays are all called mdX_raid5, independent of the actual raid level, which is wrong and can be confusing. So change md_register_thread to use the name from the personality unless an alternate name (like 'resync' or 'reshape') is given. This is simpler and more correct. Cc: Jinzc <zhenchengjin@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2009-09-23 | md: remove sparse warnings about lock context. | NeilBrown
There was a real error here on a failure path where we incorrectly call rcu_read_unlock. Signed-off-by: NeilBrown <neilb@suse.de>
2009-09-23 | md: remove sparse warning "symbol xxx shadows an earlier one" | NeilBrown
Rename some variables and remove some duplicate definitions to avoid these warnings. None of them are actual errors. Signed-off-by: NeilBrown <neilb@suse.de>
2009-09-23 | mtd: jedec_probe: add PSD4256G6V id | Mike Frysinger
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-09-22 | x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0 | Roland McGrath
The 32-bit ptrace syscall on a 64-bit kernel (32-bit debugger on 32-bit task) behaves differently than a native 32-bit kernel. When setting a register state of orig_eax>=0 and eax=-ERESTART* when the debugged task is NOT on its way out of a 32-bit syscall, the task will fail to do the syscall restart logic that it should do. Test case available at http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/erestartsys-trap.c?cvsroot=systemtap This happens because the 32-bit ptrace syscall sets eax=0xffffffff when it sets orig_eax>=0. The resuming task will not sign-extend this for the -ERESTART* check because TS_COMPAT is not set. (So the task thinks it is restarting after a 64-bit syscall, not a 32-bit one.) The fix is to have 32-bit ptrace calls set TS_COMPAT when setting orig_eax>=0. This ensures that the 32-bit syscall restart logic will apply when the child resumes. Signed-off-by: Roland McGrath <roland@redhat.com>
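The sign-extension issue behind this fix can be illustrated with a small sketch. It is not the kernel's code: the `toy_` helpers are hypothetical, and the only value taken from Linux is ERESTARTSYS (512). A negative 32-bit value stored zero-extended in a 64-bit register only compares equal to a negative error code if the check first sign-extends the low 32 bits, which is what treating the task as a compat task arranges.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ERESTARTSYS 512 /* same value as Linux's internal ERESTARTSYS */

/* A 32-bit write of eax lands zero-extended in the 64-bit register. */
uint64_t toy_store_eax(int32_t eax)
{
    return (uint64_t)(uint32_t)eax;
}

/* Check as for a 64-bit syscall: the full register is compared as-is. */
bool toy_is_restart_native(uint64_t ax)
{
    return (int64_t)ax == -TOY_ERESTARTSYS;
}

/*
 * Check as for a 32-bit (compat) syscall: sign-extend the low 32 bits
 * first. The narrowing cast assumes two's complement, as on GCC and Clang.
 */
bool toy_is_restart_compat(uint64_t ax)
{
    return (int64_t)(int32_t)(uint32_t)ax == -TOY_ERESTARTSYS;
}
```

Storing `-TOY_ERESTARTSYS` through `toy_store_eax` gives 0xfffffe00, which the native check does not recognise as a restart code while the compat check does.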
2009-09-22 | x86: ptrace: do not sign-extend orig_ax on write | Roland McGrath
The high 32 bits of orig_ax will be ignored when it matters, so don't fiddle them when setting it. Signed-off-by: Roland McGrath <roland@redhat.com>
2009-09-23 | score: update email address in MAINTAINERS. | Chen Liqin
Signed-off-by: Chen Liqin <liqin.chen@sunplusct.com>
2009-09-23 | score: Cleanup linker script using new macros. | Tim Abbott
Signed-off-by: Tim Abbott <tabbott@ksplice.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Chen Liqin <liqin.chen@sunplusct.com>
2009-09-23 | score: Make THREAD_SIZE available to assembly files. | Tim Abbott
Signed-off-by: Tim Abbott <tabbott@ksplice.com> Acked-by: Sam Ravnborg <sam@ravnborg.org>
2009-09-23 | score: Make PAGE_SIZE available to assembly. | Tim Abbott
Signed-off-by: Tim Abbott <tabbott@ksplice.com> Acked-by: Sam Ravnborg <sam@ravnborg.org>
2009-09-22 | Input: add driver for Atmel AT42QT2160 Sensor Chip | Raphael Derosso Pereira
This version only supports individual cells (no slide support yet). The code has been tested on a proprietary development ARM board, but should work fine on other machines. Signed-off-by: Raphael Derosso Pereira <raphaelpereira@gmail.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2009-09-22 | x86: ptrace: sysret path should reach syscall_trace_leave | Roland McGrath
If TIF_SYSCALL_TRACE or TIF_SINGLESTEP is set while inside a syscall, the path back to user mode should get to syscall_trace_leave. This does happen in most circumstances. The exception to this is on the 64-bit syscall fastpath, when no such flag was set on syscall entry and nothing else has punted it off the fastpath for exit. That one exit fastpath fails to check for _TIF_WORK_SYSCALL_EXIT flags. This makes the behavior inconsistent with what 32-bit tasks see and what the native 32-bit kernel always does, and what 64-bit tasks see in all cases where the iret path is taken anyhow. Perhaps the only example that is affected is a ptrace stop inside do_fork (for PTRACE_O_TRACE{CLONE,FORK,VFORK,VFORKDONE}). Other syscalls with internal ptrace stop points (execve) already take the iret exit path for unrelated reasons. Test cases for both PTRACE_SYSCALL and PTRACE_SINGLESTEP variants are at: http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/syscall-from-clone.c?cvsroot=systemtap http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/step-from-clone.c?cvsroot=systemtap There was no special benefit to the sysret path's special path to call do_notify_resume, because it always takes the iret exit path at the end. So this change just makes the sysret exit path join the iret exit path for all the signals and ptrace cases. The fastpath still applies to the plain syscall-audit and resched cases. Signed-off-by: Roland McGrath <roland@redhat.com> CC: Oleg Nesterov <oleg@redhat.com>
2009-09-22 | ocfs2: Add ioctl for reflink. | Tao Ma
The ioctl takes 3 parameters (old_path, new_path and preserve) and calls vfs_reflink. It is useful when we backport reflink features to old kernels. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Enable refcount tree support. | Tao Ma
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Implement ocfs2_reflink. | Tao Ma
Implement ocfs2_reflink. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Add preserve to reflink. | Tao Ma
reflink has 2 options for the destination file: 1. snapshot: reflink will attempt to preserve ownership, permissions, and all other security state in order to create a full snapshot. 2. new file: it will acquire the data extent sharing but will see the file's security state and attributes initialized as a new file. So add the option to ocfs2. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Create reflinked file in orphan dir. | Tao Ma
reflink is a very complicated process, so it can't be integrated into one transaction. If the system panics during the operation, we may leave an unfinished inode in the destination directory. So we will try to create an inode in orphan_dir first, reflink it to the src file and then move it to the destination file in the end. In that way we won't be afraid of any corruption during the reflink. This patch adds 2 functions for orphan_dir operation: 1. Create a new inode in orphan dir. 2. Move an inode to a target dir. Note: fsck.ocfs2 should work for us to remove the unfinished file in the orphan_dir. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Use proper parameter for some inode operation. | Tao Ma
In order to make the original functions more suitable for reflink, we modify the following inode operations. Both changes are tiny. 1. ocfs2_mknod_locked only uses dentry for mlog, so move the mlog to the caller so that reflink can use the function without a dentry. 2. ocfs2_prepare_orphan_dir only wants the inode to get its ip_blkno. So use ip_blkno instead. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Make transaction extend more efficient. | Tao Ma
In ocfs2_extend_rotate_transaction, op_credits is the original credits in the handle and we only want to extend the credits for the rotation, but the old solution always doubles it. That is harmless for some minor operations, but for actions like reflink we may rotate the tree many times and cause the credits to increase dramatically. So this patch tries to increase only the desired credits. Signed-off-by: Tao Ma <tao.ma@oracle.com>
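The difference in growth can be seen in a small arithmetic sketch. This is not the ocfs2 or jbd2 code: the `toy_` functions and the example numbers are hypothetical, and the sketch only contrasts doubling the credits on every rotation with adding a fixed amount per rotation.

```c
/* Old behaviour as described above: every extension doubles the credits. */
int toy_extend_doubling(int credits)
{
    return credits * 2;
}

/* New behaviour: add only what one rotation needs (extra is assumed). */
int toy_extend_additive(int credits, int extra)
{
    return credits + extra;
}

/* Credits held after a number of rotations under either policy. */
int toy_credits_after(int start, int extra, int rotations, int doubling)
{
    int credits = start;
    for (int i = 0; i < rotations; i++)
        credits = doubling ? toy_extend_doubling(credits)
                           : toy_extend_additive(credits, extra);
    return credits;
}
```

Starting from 10 credits with 4 extra per rotation, five rotations give 320 credits when doubling and 30 when adding, which shows why repeated rotations during a reflink make the doubling policy expensive.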
2009-09-22 | ocfs2: Don't merge in 1st refcount ops of reflink. | Tao Ma
The whole reflink touches the refcount tree 2 times: 1. It adds the clusters in the extent record to the tree if they weren't refcounted before. 2. It adds 1 refcount to these clusters when it adds these extent records to the tree. So we shouldn't merge in the 1st operation, since the 2nd one will soon be called and we may have to split the record again. Merging first and splitting soon afterwards is a waste of time. So we only merge in the 2nd round. This is done by adding a new internal __ocfs2_increase_refcount and calling it with "not-merge" for the 1st refcount operation in reflink. This also has the side effect that we don't need to worry too much about the metadata allocation in the 2nd round, since it will only merge and no split will happen for those records. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Modify removing xattr process for refcount. | Tao Ma
The old xattr value removal is quite simple: it just erases the tree and frees the clusters. But now that we have added refcount support, the process is a little more complicated. We have to lock the refcount tree at the beginning. What's more, we may split the refcount tree in some cases, so meta/credits are needed. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Add reflink support for xattr. | Tao Ma
Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Create an xattr indexed block if needed. | Tao Ma
With reflink, we need to be able to create a new xattr indexed block from the very beginning. So add a new parameter for ocfs2_create_xattr_block. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Call refcount tree remove process properly. | Tao Ma
Now with xattr refcount support, we need to check whether we have xattr refcounted before we remove the refcount tree. Now the mechanism is:
1) Check whether i_clusters == 0. If no, exit.
2) Check whether we have i_xattr_loc in dinode. If yes, exit.
3) Check whether we have inline xattr stored outside. If yes, exit.
4) Remove the tree.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
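The check sequence above can be sketched as a single predicate. This is a minimal illustration, not the ocfs2 implementation: `struct toy_inode_state` and its fields are hypothetical stand-ins for the on-disk state that the commit message refers to.

```c
#include <stdbool.h>

/* Hypothetical summary of the inode state consulted by the checks. */
struct toy_inode_state {
    unsigned int clusters;         /* stands in for i_clusters */
    bool has_xattr_block;          /* stands in for i_xattr_loc being set */
    bool has_inline_xattr_outside; /* inline xattr with value stored outside */
};

/* Returns true only when every check passes and the tree may be removed. */
bool toy_can_remove_refcount_tree(const struct toy_inode_state *s)
{
    if (s->clusters != 0)            /* step 1: data clusters remain */
        return false;
    if (s->has_xattr_block)          /* step 2: external xattr block exists */
        return false;
    if (s->has_inline_xattr_outside) /* step 3: inline xattr stored outside */
        return false;
    return true;                     /* step 4: safe to remove the tree */
}
```

Each early return corresponds to one "exit" in the list, so the tree is removed only when nothing that could still be refcounted is left.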
2009-09-22 | ocfs2: Attach xattr clusters to refcount tree. | Tao Ma
In ocfs2, when an xattr's value is larger than OCFS2_XATTR_INLINE_SIZE, it is kept outside of the blocks where we store the xattr entry, and it is also stored in a b-tree. So this patch tries to attach all these clusters to the refcount tree as well. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Abstract ocfs2 xattr tree extend rec iteration process. | Tao Ma
Currently we have ocfs2_iterate_xattr_buckets, which can receive a para and a callback to iterate over a series of buckets. It is good. But the 2 callers, ocfs2_xattr_tree_list_index_block and ocfs2_delete_xattr_index_block, are almost the same. The only difference is that the latter needs to handle the extent record also. So add a new function named ocfs2_iterate_xattr_index_block. It can be given a func callback which is used for the extent record. So now we only have one iteration function for the xattr index block. And what's more, it is useful for our future reflink operations. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Abstract the creation of xattr block. | Tao Ma
In xattr reflink, we also need to create an xattr block, so abstract the process out. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Remove inode from ocfs2_xattr_bucket_get_name_value. | Tao Ma
In ocfs2_xattr_bucket_get_name_value, actually we only use super_block. So use it. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Add CoW support for xattr. | Tao Ma
In order to make the 2 transactions (xattr and CoW) independent of each other, we CoW the whole xattr out in case we are setting them. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Abstract duplicate clusters process in CoW. | Tao Ma
We currently use pagecache to duplicate clusters in CoW, but it isn't suitable for the xattr case. So abstract it out so that the caller can decide which method to use. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: Return extent flags for xattr value tree. | Tao Ma
With the new refcount tree, an xattr value can also be refcounted among multiple files. So return the appropriate extent flags so that CoW can use them later. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22 | ocfs2: handle file attributes issue for reflink. | Tao Ma
A reflink creates a snapshot of a file, which means the attributes must be identical, with three exceptions: nlink, ino, and ctime. As for time changes, here is a brief description:
1. Source file:
   1) atime: ignore. Let the lazy atime code handle that.
   2) mtime: don't touch.
   3) ctime: if we change the tree (adding REFCOUNTED to at least one extent), update it.
2. Destination file:
   1) atime: ignore.
   2) mtime: we want it to appear identical to the source.
   3) ctime: update.
The idea here is that an ls -l will show the same time for the src and target, since it shows mtime. Backup software like rsync and tar will treat the new file correctly too.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
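The timestamp rules above can be written down as a short sketch. It is an illustration only, not the ocfs2 code: `struct toy_times`, `toy_reflink_times` and the `src_tree_changed` flag are hypothetical names, and timestamps are plain integers for simplicity.

```c
/* Hypothetical timestamp triple; plain integers keep the sketch simple. */
struct toy_times {
    long atime;
    long mtime;
    long ctime;
};

/*
 * Apply the rules from the description:
 *  - atime is left alone on both files;
 *  - source mtime is untouched, source ctime is updated only if the
 *    source's extent tree was changed;
 *  - destination mtime copies the source, destination ctime is updated.
 */
void toy_reflink_times(struct toy_times *src, struct toy_times *dst,
                       long now, int src_tree_changed)
{
    if (src_tree_changed)
        src->ctime = now;
    dst->mtime = src->mtime;
    dst->ctime = now;
}
```

After the call, the source and destination share the same mtime, which is the value a listing such as ls -l displays.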