Age | Commit message (Collapse) | Author |
|
Instead of using pre-allocated 'c->dbg->buf' buffer in
'dbg_check_ltab_lnum()', dynamically allocate it when needed. The
intend is to get rid of the pre-allocated 'c->dbg->buf' buffer and
save 128KiB of RAM (or more if PEB size is larger). Indeed,
currently we allocate this memory even if the user never enables
any self-check, which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Instead of using pre-allocated 'c->dbg->buf' buffer in
'scan_check_cb()', dynamically allocate it when needed. The intend
is to get rid of the pre-allocated 'c->dbg->buf' buffer and save
128KiB of RAM (or more if PEB size is larger). Indeed, currently we
allocate this memory even if the user never enables any self-check,
which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Instead of using pre-allocated 'c->dbg->buf' buffer in
'dbg_dump_leb()', dynamically allocate it when needed. The intend
is to get rid of the pre-allocated 'c->dbg->buf' buffer and save
128KiB of RAM (or more if PEB size is larger). Indeed, currently we
allocate this memory even if the user never enables any self-check,
which is wasteful.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Change the default UBIFS behavior WRT data CRC checking. Currently,
UBIFS checks data CRC when reading, which slows it down quite a bit,
and this is the default option. However, it looks like in average
user does not need this feature and would prefer faster read speed
over extra reliability. And this seems to be de-facto standard that
file-systems do not check data CRC every time they read from the
media.
Thus, make UBIFS default behavior so that it does not check data
CRC. This corresponds to the no_chk_data_crc mount option. Those users
who need extra protection can always enable it using the chk_data_crc
option.
Please, read more information about this feature here:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_checksumming
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Remove debug message level and debug checks Kconfig options as they
proved to be useless anyway. We have sysfs interface which we can
use for fine-grained debugging messages and checks selection, see
Documentation/filesystems/ubifs.txt for mode details.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Improve debugging messages by printing the maximum index node size
on mount.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Running kernel 2.6.37, my PPC-based device occasionally gets an
order-2 allocation failure in UBIFS, which causes the root FS to
become unwritable:
kswapd0: page allocation failure. order:2, mode:0x4050
Call Trace:
[c787dc30] [c00085b8] show_stack+0x7c/0x194 (unreliable)
[c787dc70] [c0061aec] __alloc_pages_nodemask+0x4f0/0x57c
[c787dd00] [c0061b98] __get_free_pages+0x20/0x50
[c787dd10] [c00e4f88] ubifs_jnl_write_data+0x54/0x200
[c787dd50] [c00e82d4] do_writepage+0x94/0x198
[c787dd90] [c00675e4] shrink_page_list+0x40c/0x77c
[c787de40] [c0067de0] shrink_inactive_list+0x1e0/0x370
[c787de90] [c0068224] shrink_zone+0x2b4/0x2b8
[c787df00] [c0068854] kswapd+0x408/0x5d4
[c787dfb0] [c0037bcc] kthread+0x80/0x84
[c787dff0] [c000ef44] kernel_thread+0x4c/0x68
Similar problems were encountered last April by Tomasz Stanislawski:
http://patchwork.ozlabs.org/patch/50965/
This patch implements Artem's suggested fix: fall back to a
mutex-protected static buffer, allocated at mount time. I tested it
by forcing execution down the failure path, and didn't see any ill
effects.
Artem: massaged the patch a little, improved it so that we'd not
allocate the write reserve buffer when we are in R/O mode.
Signed-off-by: Matthew L. Creech <mlcreech@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
When recovering from unclean reboots UBIFS scans the journal and checks nodes.
If a corrupted node is found, UBIFS tries to check if this is the last node
in the LEB or not. This is is done by checking if there only 0xFF bytes
starting from the next min. I/O unit. However, since now we write in
c->max_write_size, we should actually check for 0xFFs starting from the
next max. write unit.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Switch write-buffers from 'c->min_io_size' to 'c->max_write_size' which
presumably has to be more write speed-efficient. However, when write-buffer
is synchronized, write only the the min. I/O units which contain the
data, do not write whole write-buffer. This is more space-efficient.
Additionally, this patch takes into account that the LEB might not start
from the max. write unit-aligned address.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Currently we assume write-buffer size is always min_io_size. But
this is about to change and write-buffers may be of variable size.
Namely, they will be of max_write_size at the beginning, but will
get smaller when we are approaching the end of LEB.
This is a preparation patch which introduces 'size' field in
the write-buffer structure which carries the current write-buffer
size.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Incorporate the LEB offset information into UBIFS. We'll use this
information in one of the next patches to figure out what are the
max. write size offsets relative to the PEB. So this patch is just
a preparation.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Incorporate maximum write size into the UBIFS description data
structure. This patch just introduces new 'c->max_write_size'
and 'c->max_write_shift' fields as a preparation for the following
patches.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
This is a minor patch which fixes the LEB number we print when
corrupted empty space is found.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Don't allow everybody to dump sensitive information about filesystems.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
This patch adds more commentaries about UBIFS recovery logic which should
explain the famous UBIFS "corrupt empty space" errors.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
Just a tiny clean-up - remove ;;
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
This patch fixes suboptimal UBIFS 'sync_fs()' implementation which causes
flash I/O even if the file-system is synchronized. E.g., a 'printk()'
in the MTD erasure function (e.g., 'nand_erase_nand()') can show that
for every 'sync' shell command UBIFS erases at least one eraseblock.
So '$ while true; do sync; done' will cause huge amount of flash I/O.
The reason for this is that UBIFS commits in 'sync_fs()', and starts the
commit even if there is nothing to commit, e.g., it anyway changes the
log. This patch adds a check in the 'do_commit()' UBIFS functions which
prevents the commit if there is nothing to commit.
Reported-by: Hans J. Koch <hjk@linutronix.de>
Tested-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
This is a preparational patch which removes the 'c->always_chk_crc' which was
set during mounting and remounting to R/W mode and introduces 'c->mounting'
flag which is set when mounting. Now the 'c->always_chk_crc' flag is the
same as 'c->remounting_rw && c->mounting'.
This patch is a preparation for the next one which will need to know when we
are mounting and remounting to R/W mode, which is exactly what
'c->always_chk_crc' effectively is, but its name does not suite the
next patch. The other possibility would be to just re-name it, but then
we'd end up with less logical flags coverage.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
This is a cosmetic patch which re-arranges variables in 'struct ubifs_info'
so that all boolean-like variables which are only changed during mounting or
re-mounting to R/W mode are places together. Then they are turned into
bit-fields, which makes the structure a little bit smaller.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2: Fix system inodes cache overflow.
ocfs2: Hold ip_lock when set/clear flags for indexed dir.
ocfs2: Adjust masklog flag values
Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem.
ocfs2/dlm: Migrate lockres with no locks if it has a reference
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'linus-hot-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix on-line resizing regression
|
|
https://bugzilla.kernel.org/show_bug.cgi?id=25352
This regression was caused by commit a31437b85: "ext4: use
sb_issue_zeroout in setup_new_group_blocks", by accidentally dropping
the code which reserved the block group descriptor and inode table
blocks.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
This happens when __logfs_create() tries to write a new inode to the disk
which is full.
__logfs_create() associates the transaction pointer with inode. During
the logfs_write_inode() function call chain this transaction pointer is
moved from inode to page->private using function move_inode_to_page
(do_write_inode() -> inode_to_page() -> move_inode_to_page)
When the write inode fails, the transaction is aborted and iput is called
on the failed inode. During delete_inode the same transaction pointer
associated with the page is getting used. Thus causing kernel BUG.
The patch checks for error in write_inode() and restores the page->private
to NULL.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20162
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Florian Mickler <florian@mickler.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
do_logfs_journal_wl_pass() should use GFP_NOFS for memory allocation GC
code calls btree_insert32 with GFP_KERNEL while holding a mutex
super->s_write_mutex.
The same mutex is used in address_space_operations->writepage(), and a
call to writepage() could be triggered as a result of memory allocation
in btree_insert32, causing a deadlock.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=20342
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Joern Engel <joern@logfs.org>
Cc: Florian Mickler <florian@mickler.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When we store system inodes cache in ocfs2_super,
we use a array for global system inodes. But unfortunately,
the range is calculated wrongly which makes it overflow and
pollute ocfs2_super->local_system_inodes.
This patch fix it by setting the range properly.
The corresponding bug is ossbug1303.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1303
Cc: stable@kernel.org
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: handle partial result from get_user_pages
ceph: mark user pages dirty on direct-io reads
ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport
ceph: fix direct-io on non-page-aligned buffers
ceph: fix msgr_init error path
|
|
Buggered-in: 76dda93c6ae2 ("Btrfs: add snapshot/subvolume destroy
ioctl")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For read operation, we have to set the argument _write_ of get_user_pages
to 1 since we will write data to pages. Also, we need to SetPageDirty before
releasing these pages.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
The fh_to_dentry etc. methods use ceph_init_dentry(), which assumes that
d_parent is defined. It isn't for those callers, so check!
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
fanotify: fill in the metadata_len field on struct fanotify_event_metadata
fanotify: split version into version and metadata_len
fanotify: Dont try to open a file descriptor for the overflow event
fanotify: Introduce FAN_NOFD
fanotify: do not leak user reference on allocation failure
inotify: stop kernel memory leak on file creation failure
fanotify: on group destroy allow all waiters to bypass permission check
fanotify: Dont allow a mask of 0 if setting or removing a mark
fanotify: correct broken ref counting in case adding a mark failed
fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called
fanotify: remove packed from access response message
fanotify: deny permissions when no event was sent
|
|
When we set/clear the dyn_features for an inode we hold the ip_lock.
So do it when we set/clear OCFS2_INDEXED_DIR_FL also.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
Two masklogs had the same flag value.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
|
On 2.6.37-rc1, garbage collection ioctl of nilfs was broken due to the
commit 263d90cefc7d82a0 ("nilfs2: remove own inode hash used for GC"),
and leading to filesystem corruption.
The patch doesn't queue gc-inodes for log writer if they are reused
through the vfs inode cache. Here, gc-inode is the inode which
buffers blocks to be relocated on GC. That patch queues gc-inodes in
nilfs_init_gcinode() function, but this function is not called when
they don't have I_NEW flag. Thus, some of live blocks are wrongly
overrode without being moved to new logs.
This resolves the problem by moving the gc-inode queueing to an outer
function to ensure it's done right.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
|
|
The user buffer may be 512-byte aligned, not page-aligned. We were
assuming the buffer was page-aligned and only accounting for
non-page-aligned io offsets.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix typo which broke '..' detection in ext4_find_entry()
ext4: Turn off multiple page-io submission by default
|
|
The install_special_mapping routine (used, for example, to setup the
vdso) skips the security check before insert_vm_struct, allowing a local
attacker to bypass the mmap_min_addr security restriction by limiting
the available pages for special mappings.
bprm_mm_init() also skips the check, and although I don't think this can
be used to bypass any restrictions, I don't see any reason not to have
the security check.
$ uname -m
x86_64
$ cat /proc/sys/vm/mmap_min_addr
65536
$ cat install_special_mapping.s
section .bss
resb BSS_SIZE
section .text
global _start
_start:
mov eax, __NR_pause
int 0x80
$ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
$ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
$ ./install_special_mapping &
[1] 14303
$ cat /proc/14303/maps
0000f000-00010000 r-xp 00000000 00:00 0 [vdso]
00010000-00011000 r-xp 00001000 00:19 2453665 /home/taviso/install_special_mapping
00011000-ffffe000 rwxp 00000000 00:00 0 [stack]
It's worth noting that Red Hat are shipping with mmap_min_addr set to
4096.
Signed-off-by: Tavis Ormandy <taviso@google.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Acked-by: Robert Swiecki <swiecki@google.com>
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The fanotify_event_metadata now has a field which is supposed to
indicate the length of the metadata portion of the event. Fill in that
field as well.
Based-in-part-on-patch-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
There should be a check for the NUL character instead of '0'.
Fortunately the only thing that cares about this is NFS serving, which
is why we didn't notice this in the merge window testing.
Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Jon Nelson has found a test case which causes postgresql to fail with
the error:
psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581
Under memory pressure, it looks like part of a file can end up getting
replaced by zero's. Until we can figure out the cause, we'll roll
back the change and use block_write_full_page() instead of
ext4_bio_write_page(). The new, more efficient writing function can
be used via the mount option mblk_io_submit, so we can test and fix
the new page I/O code.
To reproduce the problem, install postgres 8.4 or 9.0, and pin enough
memory such that the system just at the end of triggering writeback
before running the following sql script:
begin;
create temporary table foo as select x as a, ARRAY[x] as b FROM
generate_series(1, 10000000 ) AS x;
create index foo_a_idx on foo (a);
create index foo_b_idx on foo USING GIN (b);
rollback;
If the temporary table is created on a hard drive partition which is
encrypted using dm_crypt, then under memory pressure, approximately
30-40% of the time, pgsql will issue the above failure.
This patch should fix this problem, and the problem will come back if
the file system is mounted with the mblk_io_submit mount option.
Reported-by: Jon Nelson <jnelson@jamponi.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux:
nfsd: Fix possible BUG_ON firing in set_change_info
sunrpc: prevent use-after-free on clearing XPT_BUSY
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: prevent RAID level downgrades when space is low
Btrfs: account for missing devices in RAID allocation profiles
Btrfs: EIO when we fail to read tree roots
Btrfs: fix compiler warnings
Btrfs: Make async snapshot ioctl more generic
Btrfs: pwrite blocked when writing from the mmaped buffer of the same page
Btrfs: Fix a crash when mounting a subvolume
Btrfs: fix sync subvol/snapshot creation
Btrfs: Fix page leak in compressed writeback path
Btrfs: do not BUG if we fail to remove the orphan item for dead snapshots
Btrfs: fixup return code for btrfs_del_orphan_item
Btrfs: do not do fast caching if we are allocating blocks for tree_root
Btrfs: deal with space cache errors better
Btrfs: fix use after free in O_DIRECT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: verify ioctl retries
fuse: fix ioctl when server is 32bit
|
|
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: log timestamp changes to the source inode in rename
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: fix ioctl magic
ceph: Behave better when handling file lock replies.
ceph: pass lock information by struct file_lock instead of as individual params.
ceph: Handle file locks in replies from the MDS.
ceph: avoid possible null deref in readdir after dir llseek
|
|
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix panic after nfs_umount()
nfs: remove extraneous and problematic calls to nfs_clear_request
nfs: kernel should return EPROTONOSUPPORT when not support NFSv4
NFS: Fix fcntl F_GETLK not reporting some conflicts
nfs: Discard ACL cache on mode update
NFS: Readdir cleanups
NFS: nfs_readdir_search_for_cookie() don't mark as eof if cookie not found
NFS: Fix a memory leak in nfs_readdir
Call the filesystem back whenever a page is removed from the page cache
NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: remove bogus remapping of error in cifs_filldir()
cifs: allow calling cifs_build_path_to_root on incomplete cifs_sb
cifs: fix check of error return from is_path_accessable
cifs: remove Local_System_Name
cifs: fix use of CONFIG_CIFS_ACL
cifs: add attribute cache timeout (actimeo) tunable
|
|
The extent allocator has code that allows us to fill
allocations from any available block group, even if it doesn't
match the raid level we've requested.
This was put in because adding a new drive to a filesystem
made with the default mkfs options actually upgrades the metadata from
single spindle dup to full RAID1.
But, the code also allows us to allocate from a raid0 chunk when we
really want a raid1 or raid10 chunk. This can cause big trouble because
mkfs creates a small (4MB) raid0 chunk for data and metadata which then
goes unused for raid1/raid10 installs.
The allocator will happily wander in and allocate from that chunk when
things get tight, which is not correct.
The fix here is to make sure that we provide duplication when the
caller has asked for it. It does all the dups to be any raid level,
which preserves the dup->raid1 upgrade abilities.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
When we mount in RAID degraded mode without adding a new device to
replace the failed one, we can end up using the wrong RAID flags for
allocations.
This results in strange combinations of block groups (raid1 in a raid10
filesystem) and corruptions when we try to allocate blocks from single
spindle chunks on drives that are actually missing.
The first device has two small 4MB chunks in it that mkfs creates and
these are usually unused in a raid1 or raid10 setup. But, in -o degraded,
the allocator will fall back to these because the mask of desired raid groups
isn't correct.
The fix here is to count the missing devices as we build up the list
of devices in the system. This count is used when picking the
raid level to make sure we continue using the same levels that were
in place before we lost a drive.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
If we just get a plain IO error when we read tree roots, the code
wasn't properly sending that error up the chain. This allowed mounts to
continue when they should failed, and allowed operations
on partially setup root structs. The end result was usually oopsen
on spinlocks that hadn't been spun up correctly.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
... regarding an unused function when !MIGRATION, and regarding a
printk() format string vs argument mismatch.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|