Age | Commit message (Collapse) | Author |
|
get rid of the kludges in sysfs_readdir()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and get rid of that ridiculous mutex in bfs_readdir()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
what the hell is op_mutex for, BTW?
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
* sanity checks belong before risky operation, not after it
* don't quit as soon as we'd found an entry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
new helper: dir_relax(inode). Call when you are in location that will
_not_ be invalidated by directory modifications (block boundary, in case
of ext*). Returns whether the directory has survived (dropping i_mutex
allows rmdir to kill the sucker; if it returns false to us, ->iterate()
is obviously done)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
new helpers - dir_emit_dot(file, ctx, dentry), dir_emit_dotdot(file, ctx),
dir_emit_dots(file, ctx).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
ext2, ufs, minix, sysv
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
New method - ->iterate(file, ctx). That's the replacement for ->readdir();
it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
calls dir_emit(ctx, ...) instead of filldir(data, ...). It does *not*
update file->f_pos (or look at it, for that matter); iterate_dir() does the
update.
Note that dir_emit() takes the offset from ctx->pos (and eventually
filldir_t will lose that argument).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
iterate_dir(): new helper, replacing vfs_readdir().
struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.
This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses
it, and this is a very bad bug: not only 'ubifs_readdir()' can return garbage,
but this may corrupt memory and lead to all kinds of problems like crashes an
security holes.
This patch fixes the problem by using the 'file->f_version' field, which
'->llseek()' always unconditionally sets to zero. We set it to 1 in
'ubifs_readdir()' and whenever we detect that it became 0, we know there was a
seek and it is time to clear the state saved in 'file->private_data'.
I tested this patch by writing a user-space program which runds readdir and
seek in parallell. I could easily crash the kernel without these patches, but
could not crash it with these patches.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.
First of all, this means that 'file->private_data' can be freed while
'ubifs_readdir()' uses it. But this particular patch does not fix the problem.
This patch is only a preparation, and the fix will follow next.
In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly,
because 'file->f_pos' can be changed by '->llseek()' at any point. This may
lead 'ubifs_readdir()' to returning inconsistent data: directory entry names
may correspond to incorrect file positions.
So here we introduce a local variable 'pos', read 'file->f_pose' once at very
the beginning, and then stick to 'pos'. The result of this is that when
'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of
'ubifs_readdir()', the latter "wins".
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Currently the trans2 ByteCount field is incorrectly left zero in
TRANS2_QUERY_FILE_INFO info_level=SMB_QUERY_FILE_ALL_INFO and
info_level=SMB_QUERY_FILE_UNIX_BASIC requests. The field should properly
reflect the FID, information_level and padding bytes carried in these
requests.
Leaving this field zero causes such requests to fail against Novell CIFS
servers. Other SMB servers (e.g. Samba) use the parameter count fields
for data length calculations instead, so do not suffer the same fate.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
|
Remove all incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD. Instead,
start using XFS_GQUOTA_.* XFS_PQUOTA_.* counterparts for GQUOTA and
PQUOTA respectively.
On-disk copy still uses XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD.
Read and write of the superblock does the conversion from *OQUOTA*
to *[PG]QUOTA*.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
This is patch 1/3 of a patch set that avoids what misleadingly appears
to be a error during boot:
ERST: Could not register with persistent store
This message is displayed if the system has a valid ACPI ERST table and the
pstore.backend kernel parameter has been used to disable use of ERST by
pstore. But this same message is used for errors that preclude registration.
As part of fixing this, return a unique error status from pstore_register
if the pstore.backend kernel parameter selects a specific facility other
than the requesting facility and check for this condition before any others.
This allows the caller to distinquish this benign case from the other failure
cases.
Also, print an informational console message about which facility
successfully registered as the pstore backend. Since there are various
kernel parameters, config build options, and boot-time errors that can
influence which facility registers with pstore, it's useful to have a
positive indication.
Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Reported-by: Naotaka Hamaguchi <n.hamaguchi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
* labeled-nfs:
NFS: Apply v4.1 capabilities to v4.2
NFS: Add in v4.2 callback operation
NFS: Make callbacks minor version generic
Kconfig: Add Kconfig entry for Labeled NFS V4 client
NFS: Extend NFS xattr handlers to accept the security namespace
NFS: Client implementation of Labeled-NFS
NFS: Add label lifecycle management
NFS:Add labels to client function prototypes
NFSv4: Extend fattr bitmaps to support all 3 words
NFSv4: Introduce new label structure
NFSv4: Add label recommended attribute and NFSv4 flags
NFSv4.2: Added NFS v4.2 support to the NFS client
SELinux: Add new labeling type native labels
LSM: Add flags field to security_sb_set_mnt_opts for in kernel mount data.
Security: Add Hook to test if the particular xattr is part of a MAC model.
Security: Add hook to calculate context based on a negative dentry.
NFS: Add NFSv4.2 protocol constants
Conflicts:
fs/nfs/nfs4proc.c
|
|
NFS_CS_MIGRATION makes sense only for NFSv4 mounts. Introduced by
commit 89652617 (NFS: Introduce "migration" mount option) Fri Sep 14
17:24:11 2012.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
nfs4_init_session was originally written to be called prior to
nfs4_init_channel_attrs, setting the session target_max response and request
sizes that nfs4_init_channel_attrs would pay attention to.
In the current code flow, nfs4_init_session, just like nfs4_init_ds_session
for the data server case, is called after the session is all negotiated, and
is actually used in a RECLAIM COMPLETE call to the server.
Remove the un-needed fc_target_max response and request fields from
nfs4_session and just set the max_resp_sz and max_rqst_sz in
nfs4_init_channel_attrs.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The current scheme is to try and pick the auth flavor that the server
prefers. In some cases though, we may find that we're not actually
able to use that auth flavor later. For instance, the server may
prefer an AUTH_GSS flavor, but we may not be able to get GSSAPI creds.
The current code just gives up at that point. Change it instead to
try the ->create_server call using each of the different authflavors
in the server's list if one was not specified at mount time. Once
we have a successful ->create_server call, return the result. Only
give up and return error if all attempts fail.
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Instead of handling this as a special case in the auth-selection code,
we can simply fake up an auth_flavs list when the server doesn't
provide it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
In a later patch we're going to want to cycle over this list and attempt
to call ->create_server for each different flavor until one succeeds.
Move the list allocation to the stack of nfs_try_mount_request() and
pass a pointer to it and its length to nfs_request_mount().
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This looks like pointless refactoring for now, but we'll flesh out
the need_mount case a little more in a later patch.
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The GETDEVICEINFO gdia_maxcount represents all of the data being returned
within the GETDEVICEINFO4resok structure and includes the XDR overhead.
The CREATE_SESSION ca_maxresponsesize is the maximum reply and includes the RPC
headers (including security flavor credentials and verifiers).
Split out the struct pnfs_device field maxcount which is the gdia_maxcount
from the pglen field which is the reply (the total) buffer length.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Fallback should happen only when the request_key() call fails, because
this indicates that there was a problem running the nfsidmap program.
We shouldn't call the legacy code if the error was elsewhere.
Signed-off-by: Bryan Schumaker <bjschuma@netappp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
In preparation for combined pquota/gquota support, for the sake
of readability, change xfs_dquot_acct to be a 2-dimensional array.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
In preparation for combined pquota/gquota support, for the sake
of readability, do some code cleanup surrounding the affected
code.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
In preparation for combined pquota/gquota support, for the sake
of readability, change the macro to an inline function.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
In preparation for combined pquota/gquota support, for the sake
of readability, change the macro to an inline function.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
In preparation for combined pquota/gquota support, define
a new function to check if the given inode is a quota inode.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
For CRC enabled filesystems, add support for the monotonic inode
version change counter that is needed by protocols like NFSv4 for
determining if the inode has changed in any way at all between two
unrelated operations on the inode.
This bumps the change count the first time an inode is dirtied in a
transaction. Since all modifications to the inode are logged, this
will catch all changes that are made to the inode, including
timestamp updates that occur during data writes.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Code in blkdev.c moves a device inode to default_backing_dev_info when
the last reference to the device is put and moves the device inode back
to its bdi when the first reference is acquired. This includes moving to
wb.b_dirty list if the device inode is dirty. The code however doesn't
setup timer to wake corresponding flusher thread and while wb.b_dirty
list is non-empty __mark_inode_dirty() will not set it up either. Thus
periodic writeback is effectively disabled until a sync(2) call which can
lead to unexpected data loss in case of crash or power failure.
Fix the problem by setting up a timer for periodic writeback in case we
add the first dirty inode to wb.b_dirty list in bdev_inode_switch_bdi().
Reported-by: Bert De Jonghe <Bert.DeJonghe@amplidata.com>
CC: stable@vger.kernel.org # >= 3.0
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
* freezer:
af_unix: use freezable blocking calls in read
sigtimedwait: use freezable blocking call
nanosleep: use freezable blocking call
futex: use freezable blocking call
select: use freezable blocking call
epoll: use freezable blocking call
binder: use freezable blocking calls
freezer: add new freezable helpers using freezer_do_not_count()
freezer: convert freezable helpers to static inline where possible
freezer: convert freezable helpers to freezer_do_not_count()
freezer: skip waking up tasks with PF_FREEZER_SKIP set
freezer: shorten freezer sleep time using exponential backoff
lockdep: check that no locks held at freeze time
lockdep: remove task argument from debug_check_no_locks_held
freezer: add unsafe versions of freezable helpers for CIFS
freezer: add unsafe versions of freezable helpers for NFS
|
|
Commit 9ddec56131 (cifs: move handling of signed connections into
separate function) broke signing on SMB2/3 connections. While the code
to enable signing on the connections was very similar between the two,
the bits that get set in the sec_mode are different.
Declare a couple of new smb_version_values fields and set them
appropriately for SMB1 and SMB2/3. Then change cifs_enable_signing to
use those instead.
Reported-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
|
|
Replace the use of buffer based logging of inode initialisation,
uses the new logical form to describe the range to be initialised
in recovery. We continue to "log" the inode buffers to push them
into the AIL and ensure that the inode create transaction is not
removed from the log before the inode buffers are written to disk.
Update the transaction identifier and reservations to match the
changed implementation.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
When we find a icreate transaction, we need to get and initialise
the buffers in the range that has been passed. Extract and verify
the information in the item record, then loop over the range
initialising and issuing the buffer writes delayed.
Support an arbitrary size range to initialise so that in
future when we allocate inodes in much larger chunks all kernels
that understand this transaction can still recover them.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Define the log and space transaction sizes. Factor the current
create log reservation macro into the two logical halves and reuse
one half for the new icreate transactions. The icreate transaction
is transparent to all the high level create code - the
pre-calculated reservations will correctly set the reservations
dependent on whether the filesystem supports the icreate
transaction.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Introduce the inode create log item type for logical inode create logging.
Instead of logging the changes in buffers, pass the range to be
initialised through the log by a new transaction type. This reduces
the amount of log space required to record initialisation during
allocation from about 128 bytes per inode to a small fixed amount
per inode extent to be initialised.
This requires a new log item type to track it through the log
and the AIL. This is a relatively simple item - most callbacks are
noops as this item has the same life cycle as the transaction.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
If we have a buffer that we have modified but we do not wish to
physically log in a transaction (e.g. we've logged a logical
change), we still need to ensure that transactional integrity is
maintained. Hence we must not move the tail of the log past the
transaction that the buffer is associated with before the buffer is
written to disk.
This means these special buffers still need to be included in the
transaction and added to the AIL just like a normal buffer, but we
do not want the modifications to the buffer written into the
transaction. IOWs, what we want is an "ordered buffer" that
maintains the same transactional life cycle as a physically logged
buffer, just without the transcribing of the modifications to the
log.
Hence we need to flag the buffer as an "ordered buffer" to avoid
including it in vector size calculations or formatting during the
transaction. Once the transaction is committed, the buffer appears
for all intents to be the same as a physically logged buffer as it
transitions through the log and AIL.
Relogging will also work just fine for such an ordered buffer - the
logical transaction will be replayed before the subsequent
modifications that relog the buffer, so everything will be
reconstructed correctly by recovery.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
And "ordered log vector" is a log vector that is used for
tracking a log item through the CIL and into the AIL as part of the
log checkpointing. These ordered log vectors are special in that
they are not written to to journal in any way, and are not accounted
to the checkpoint being written.
The reason for this behaviour is to allow operations to attach items
to transactions and have them follow the normal transactional
lifecycle without actually having to write them to the journal. This
allows logging of items that track high level logical changes and
writing them to the log, while the physical items being modified
pass through into the AIL and pin the tail of the log (and therefore
the logical item in the log) until all the modified items are
physically written to disk.
IOWs, it allows us to write metadata without physically logging
every individual change but still maintain the full transactional
integrity guarantees we currently have w.r.t. crash recovery.
This change modifies some of the CIL item insertion loops, as
ordered log vectors introduce some new constraints as they don't
track any data. One advantage of this change is that it combines
two log vector chain walks into a single pass, so there is less
overhead in the transaction commit pass as well. It also kills some
unused code in the log vector walk loop when committing the CIL.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|
|
Long ago, bulkstat used to read inodes directly from the backing
buffer for speed. This had the unfortunate problem of being cache
incoherent with unlinks, and so xfs_ifree() had to mark the inode
as free directly in the backing buffer. bulkstat was changed some
time ago to use inode cache coherent lookups, and so will never see
unlinked inodes in it's lookups. Hence xfs_ifree() does not need to
touch the inode backing buffer anymore.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
|