Age | Commit message (Collapse) | Author |
|
... instead of mixing FMODE_ and O_
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Invalidate sb->s_bdev on remount,ro.
Fixes a problem reported by Jorge Boncompte who is seeing corruption
trying to snapshot a minix filesystem image. Some filesystems modify
their metadata via a path other than the bdev buffer cache (eg. they may
use a private linear mapping for their metadata, or implement directories
in pagecache, etc). Also, file data modifications usually go to the bdev
via their own mappings.
These updates are not coherent with buffercache IO (eg. via /dev/bdev)
and never have been. However there could be a reasonable expectation that
after a mount -oremount,ro operation then the buffercache should
subsequently be coherent with previous filesystem modifications.
So invalidate the bdev mappings on a remount,ro operation to provide a
coherency point.
The problem was exposed when we switched the old rd to brd because old rd
didn't really function like a normal block device and updates to rd via
mappings other than the buffercache would still end up going into its
buffercache. But the same problem has always affected other "normal"
block devices, including loop.
[akpm@linux-foundation.org: repair comment layout]
Reported-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Tested-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Cleanup EXPORT* macros according to Documantation/CodingStyle.
Move EXPORT* macros to the line immediately after the closing
function brace.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
EXPORT_SYMBOL(proc_symlink);
EXPORT_SYMBOL(proc_mkdir);
EXPORT_SYMBOL(create_proc_entry);
EXPORT_SYMBOL(proc_create_data);
EXPORT_SYMBOL(remove_proc_entry);
Those EXPORT_SYMBOL shouldn't be in fs/proc/root.c,
should be in fs/proc/generic.c.
Signed-off-by: Helight.Xu <helight.xu@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Remove the EXPORT_UNUSED_SYMBOL of simple_prepare_write
Collapse simple_prepare_write into it's only caller, though
making it simpler and clearer to understand.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
* simple_commit_write was only called by simple_write_end.
Open coding it makes it tiny bit less heavy on the arithmetic and
much more readable.
* While at it use zero_user() for clearing a partial page.
* While at it add a docbook comment for simple_write_end.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This reverts commit 213614d583748d00967a91cacd656f417efb36ce.
Alas, ->d_revalidate() can't rely on ->lookup() finishing what
it's started; if d_alloc() in do_lookup() fails, we are not going
to call ->lookup() at all.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: add reader's lock for cno in nilfs_ioctl_sync
nilfs2: delete unnecessary condition in load_segment_summary
nilfs2: move iterator to write log into segment buffer
nilfs2: get rid of s_dirt flag use
nilfs2: get rid of nilfs_segctor_req struct
nilfs2: delete unnecessary condition in nilfs_dat_translate
nilfs2: fix potential hang in nilfs_error on errors=remount-ro
nilfs2: use mnt_want_write in ioctls where write access is needed
nilfs2: issue discard request after cleaning segments
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (36 commits)
Ocfs2: Move ocfs2 ioctl definitions from ocfs2_fs.h to newly added ocfs2_ioctl.h
ocfs2: send SIGXFSZ if new filesize exceeds limit -v2
ocfs2/userdlm: Add tracing in userdlm
ocfs2: Use a separate masklog for AST and BASTs
dlm: allow dlm do recovery during shutdown
ocfs2: Only bug out in direct io write for reflinked extent.
ocfs2: fix warning in ocfs2_file_aio_write()
ocfs2_dlmfs: Enable the use of user cluster stacks.
ocfs2_dlmfs: Use the stackglue.
ocfs2_dlmfs: Don't honor truncate. The size of a dlmfs file is LVB_LEN
ocfs2: Pass the locking protocol into ocfs2_cluster_connect().
ocfs2: Remove the ast pointers from ocfs2_stack_plugins
ocfs2: Hang the locking proto on the cluster conn and use it in asts.
ocfs2: Attach the connection to the lksb
ocfs2: Pass lksbs back from stackglue ast/bast functions.
ocfs2_dlmfs: Move to its own directory
ocfs2_dlmfs: Use poll() to signify BASTs.
ocfs2_dlmfs: Add capabilities parameter.
ocfs2: Handle errors while setting external xattr values.
ocfs2: Set inline xattr entries with ocfs2_xa_set()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix large stack use
fuse: cleanup in fuse_notify_inval_...()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: add __percpu sparse annotations to what's left
percpu: add __percpu sparse annotations to fs
percpu: add __percpu sparse annotations to core kernel subsystems
local_t: Remove leftover local.h
this_cpu: Remove pageset_notifier
this_cpu: Page allocator conversion
percpu, x86: Generic inc / dec percpu instructions
local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c
module: Use this_cpu_xx to dynamically allocate counters
local_t: Remove cpu_local_xx macros
percpu: refactor the code in pcpu_[de]populate_chunk()
percpu: remove compile warnings caused by __verify_pcpu_ptr()
percpu: make accessors check for percpu pointer in sparse
percpu: add __percpu for sparse.
percpu: make access macros universal
percpu: remove per_cpu__ prefix.
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
GFS2: print glock numbers in hex
GFS2: ordered writes are backwards
GFS2: Remove old, unused linked list code from quota
GFS2: Remove loopy umount code
GFS2: Metadata address space clean up
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] pSesInfo->sesSem is used as mutex. Rename it to session_mutex and
[CIFS] Use unsigned ea length for clarity
cifs: set server_eof in cifs_fattr_to_inode
[CIFS] Minor cleanup to EA patch
cifs: merge CIFSSMBQueryEA with CIFSSMBQAllEAs
cifs: verify lengths of QueryAllEAs reply
cifs: increase maximum buffer size in CIFSSMBQAllEAs
cifs: rename name_len to list_len in CIFSSMBQAllEAs
cifs: clean up indentation in CIFSSMBQAllEAs
cifs: add parens around smb_var in BCC macros
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (38 commits)
SELinux: Make selinux_kernel_create_files_as() shouldn't just always return 0
TOMOYO: Protect find_task_by_vpid() with RCU.
Security: add static to security_ops and default_security_ops variable
selinux: libsepol: remove dead code in check_avtab_hierarchy_callback()
TOMOYO: Remove __func__ from tomoyo_is_correct_path/domain
security: fix a couple of sparse warnings
TOMOYO: Remove unneeded parameter.
TOMOYO: Use shorter names.
TOMOYO: Use enum for index numbers.
TOMOYO: Add garbage collector.
TOMOYO: Add refcounter on domain structure.
TOMOYO: Merge headers.
TOMOYO: Add refcounter on string data.
TOMOYO: Reduce lines by using common path for addition and deletion.
selinux: fix memory leak in sel_make_bools
TOMOYO: Extract bitfield
syslog: clean up needless comment
syslog: use defined constants instead of raw numbers
syslog: distinguish between /proc/kmsg and syscalls
selinux: allow MLS->non-MLS and vice versa upon policy reload
...
|
|
Currently we were adding ioctl cmds/structures for ocfs2 into ocfs2_fs.h
which was used for define ocfs2 on-disk layout. That sounds a little bit
confusing, and it may be quickly polluted espcially when growing the
ocfs2_info_request ioctls afterwards(it will grow i bet).
As a result, such OCFS2 IOCs do need to be placed somewhere other than
ocfs2_fs.h, a separated ocfs2_ioctl.h will be added to store such ioctl
structures and definitions which could also be used from userspace to
invoke ioctls call.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Revert "blkdev: fix merge_bvec_fn return value checks"
|
|
This reverts commit 9f7cdbc33f36d28e57eaba0093f68f0d14c38c5b.
It's causing oopses om dm setups, so revert it until we investigate.
Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
sunrpc_cache_update() will always call detail->update() from inside the
detail->hash_lock, so it cannot allocate memory.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
|
|
Ensure that we change the EXCHANGE_ID verifier (i.e. clp->cl_boot_time)
when we want to reset all state. This is mainly needed when the server
tells us that it is revoking our open or lock stateids.
Handle revoking of recallable state by expiring the delegations.
Handle callback path issues by expiring the delegations and then resetting
the session.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
renewd sends RENEW requests to the NFS server in order to renew state.
As the request is asynchronous, renewd should take a reference to the
nfs_client to prevent concurrent umounts from freeing the client
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
renewd sends SEQUENCE requests to the NFS server in order to renew state.
As the request is asynchronous, renewd should take a reference to the
nfs_client to prevent concurrent umounts from freeing the session/client
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
If the renewd send queue gets backlogged (e.g., if the server goes down),
we will keep filling the queue with periodic RENEW/SEQUENCE requests.
This patch schedules a new renewd request if and only if the previous one
returns (either success or failure)
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[Trond.Myklebust@netapp.com: moved nfs4_schedule_state_renewal() into
separate nfs4_renew_release() and nfs41_sequence_release() callbacks
to ensure correct behaviour on call setup failure]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
renewd should be synchronously killed before we destroy the session in
nfs4_clear_minor_version
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[Trond.Myklebust@netapp.com: clean up to remove 'unused function
warning when !CONFIG_NFS_V4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1341 commits)
virtio_net: remove forgotten assignment
be2net: fix tx completion polling
sis190: fix cable detect via link status poll
net: fix protocol sk_buff field
bridge: Fix build error when IGMP_SNOOPING is not enabled
bnx2x: Tx barriers and locks
scm: Only support SCM_RIGHTS on unix domain sockets.
vhost-net: restart tx poll on sk_sndbuf full
vhost: fix get_user_pages_fast error handling
vhost: initialize log eventfd context pointer
vhost: logging thinko fix
wireless: convert to use netdev_for_each_mc_addr
ethtool: do not set some flags, if others failed
ipoib: returned back addrlen check for mc addresses
netlink: Adding inode field to /proc/net/netlink
axnet_cs: add new id
bridge: Make IGMP snooping depend upon BRIDGE.
bridge: Add multicast count/interval sysfs entries
bridge: Add hash elasticity/max sysfs entries
bridge: Add multicast_snooping sysfs toggle
...
Trivial conflicts in Documentation/feature-removal-schedule.txt
|
|
We always assume what dquot update result in changes in one data block
But ext4_quota_write() function may handle cross block boundary writes
In fact if this ever happen it will result in incorrect journal
credits reservation, and later a BUG_ON. As soon this never happen
the boundary cross loop is NOOP. In order to make things straight
let's remove this loop and assert cross boundary condition.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Convert a bunch of BUG_ONs to emit a ext4_error() message and return
EIO. This is a first pass and most notably does _not_ cover
mballoc.c, which is a morass of void functions.
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Allocate uninitialized extent before ext4 buffer write and
convert the extent to initialized after io completes.
The purpose is to make sure an extent can only be marked
initialized after it has been written with new data so
we can safely drop the i_mutex lock in ext4 DIO read without
exposing stale data. This helps to improve multi-thread DIO
read performance on high-speed disks.
Skip the nobh and data=journal mount cases to make things simple for now.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
This commit renames some of the direct I/O's block allocation flags,
variables, and functions introduced in Mingming's "Direct IO for holes
and fallocate" patches so that they can be used by ext4's buffered
write path as well. Also changed the related function comments
accordingly to cover both direct write and buffered write cases.
Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The callers of ext4_check_dir_entry() usually pass in the "file
offset" (ext4_readdir, htree_dirblock_to_tree, search_dirblock,
ext4_dx_find_entry, empty_dir), but a few callers (add_dirent_to_buf,
ext4_delete_entry) only pass in the buffer offset.
To accomodate those last two (which would be hard to fix otherwise),
this patch changes ext4_check_dir_entry() to print the physical block
number and the relative offset as well as the passed-in offset.
Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
In case of truncate errors we explicitly remove inode from in-core
orphan list via orphan_del(NULL, inode) without modifying the on-disk list.
But later on, the same inode may be inserted in the orphan list again
which will result the on-disk linked list getting corrupted. If inode
i_dtime contains valid value, then skip on-disk list modification.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Otherwise non-empty orphan list will be triggered on umount.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Set i_nlink to zero for temporary inode from very beginning.
otherwise we may fail to start new journal handle and this
inode will be unreferenced but with i_nlink == 1
Since we hold inode reference it can not be pruned.
Also add missed journal_start retval check.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Declare following list of mount options as deprecated:
- bsddf, miniddf
- grpid, bsdgroups, nogrpid, sysvgroups
Declare following list of default mount options as deprecated:
- bsdgroups
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The radix-tree code requires it's users to serialize tag updates
against other updates to the tree. While XFS protects tag updates
against each other it does not serialize them against updates of the
tree contents, which can lead to tag corruption. Fix the inode
cache to always take pag_ici_lock in exclusive mode when updating
radix tree tags.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Patrick Schreurs <patrick@news-service.com>
Tested-by: Patrick Schreurs <patrick@news-service.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
The ext4 multiblock allocator decides whether to use group or file
preallocation based on the file size. When the file size reaches
s_mb_stream_request (default is 16 blocks), it changes to use a
file-specific preallocation. This is cool, but it has a tiny problem.
See a simple script:
mkfs.ext4 -b 1024 /dev/sda8 1000000
mount -t ext4 -o nodelalloc /dev/sda8 /mnt/ext4
for((i=0;i<5;i++))
do
cat /mnt/4096>>/mnt/ext4/a #4096 is a file with 4096 characters.
cat /mnt/4096>>/mnt/ext4/b
done
debuge4fs -R 'stat a' /dev/sda8|grep BLOCKS -A 1
And you get
BLOCKS:
(0-14):8705-8719, (15):2356, (16-19):8465-8468
So there are 3 extents, a bit strange for the lonely 15th logical
block. As we write to the 16 blocks, we choose file preallocation in
ext4_mb_group_or_file, but in ext4_mb_normalize_request, we meet with
the 16*1024 range, so no preallocation will be carried. file b then
reserves the space after '2356', so when when write 16, we start from
another part.
This patch just change the check in ext4_mb_group_or_file, so
that for the lonely 15 we will still use group preallocation.
After the patch, we will get:
debuge4fs -R 'stat a' /dev/sda8|grep BLOCKS -A 1
BLOCKS:
(0-15):8705-8720, (16-19):8465-8468
Looks more sane. Thanks.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
The flush_dirty_caps() used to loop over the first entry of the cap_dirty
dirty list on the assumption that after calling ceph_check_caps() it would
be removed from the list. This isn't true for caps that are being
migrated between MDSs, where we've received the EXPORT but not the IMPORT.
Instead, do a safe list iteration, and pin the next inode on the list via
the CEPH_I_NOFLUSH flag.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
We should include caps that are mid-migration (we've received the EXPORT,
but not the IMPORT) in the issued caps set.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Add missing pointer dereference (p is a void **).
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Verify the file is actually open for the given caps when we are
waiting for caps. This ensures we will wake up and return EBADF
if another thread closes the file out from under us.
Note that EBADF is also the correct return code from write(2)
when called on a file handle opened for reading (although the
vfs should catch that).
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
We didn't set the front length correctly. When messages used
the message pool we ended up with the conservative max (4 KB), and
the rest of the time the slightly less conservative estimate. Even
though the OSD ignores the extra data, set it to the right value to avoid
sending extra data over the network.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Reset msg front len when a message is returned to the pool: the caller
may have changed it.
BUG if we try to send a message with a hdr.front_len that doesn't match
the front iov.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
This was simply broken. Apparently at some point we thought about putting
the snaptrace in the middle section, but didn't.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Use a single ceph_msg for the osd reply, even when we are getting multiple
replies.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Clear LOSSYTX bit, so that if/when we reconnect, said reconnect
will retry on failure.
Clear _PENDING bits too, to avoid polluting subsequent
connection state.
Drop unused REGISTERED bit.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
Inodes are only pinned/unpinned via the inode item methods, and lots of
code relies on that fact. So remove the separate xfs_ipin/xfs_iunpin
helpers and merge them into their only callers. This also fixes up
various duplicate and/or incorrect comments.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
Remove the inode item pointer and ili_last_lsn checks in
__xfs_iunpin_wait as any pinned inode is guaranteed to have them
valid. After this the xfs_iunpin_nowait case is nothing more than a
xfs_log_force_lsn, as we know that the caller has already checked
the pincount.
Make xfs_iunpin_nowait the new low-level routine just doing the log
force and rewrite xfs_iunpin_wait around it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
Move the two declarations to better fitting headers now that
xfs_lrw.c is gone.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|
|
Most of xfs_trans_bjoin is duplicated in xfs_trans_get_buf,
xfs_trans_getsb and xfs_trans_read_buf. Add a new _xfs_trans_bjoin
which can be called by all four functions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
|