summaryrefslogtreecommitdiffstats
path: root/fs/jbd2
AgeCommit message (Collapse)Author
2009-01-08Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits) jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs ext4: Remove "extents" mount option block: Add Kconfig help which notes that ext4 needs CONFIG_LBD ext4: Make printk's consistently prefixed with "EXT4-fs: " ext4: Add sanity checks for the superblock before mounting the filesystem ext4: Add mount option to set kjournald's I/O priority jbd2: Submit writes to the journal using WRITE_SYNC jbd2: Add pid and journal device name to the "kjournald2 starting" message ext4: Add markers for better debuggability ext4: Remove code to create the journal inode ext4: provide function to release metadata pages under memory pressure ext3: provide function to release metadata pages under memory pressure add releasepage hooks to block devices which can be used by file systems ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc ext4: Init the complete page while building buddy cache ext4: Don't allow new groups to be added during block allocation ext4: mark the blocks/inode bitmap beyond end of group as used ext4: Use new buffer_head flag to check uninit group bitmaps initialization ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() ext4: code cleanup ...
2009-01-06jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fsJan Kara
On 32-bit system with CONFIG_LBD getblk can fail because provided block number is too big. Add error checks so we fail gracefully if getblk() returns NULL (which can also happen on memory allocation failures). Thanks to David Maciejak from Fortinet's FortiGuard Global Security Research Team for reporting this bug. http://bugzilla.kernel.org/show_bug.cgi?id=12370 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> cc: stable@kernel.org
2009-01-05jbd2: Add buffer triggersJoel Becker
Filesystems often to do compute intensive operation on some metadata. If this operation is repeated many times, it can be very expensive. It would be much nicer if the operation could be performed once before a buffer goes to disk. This adds triggers to jbd2 buffer heads. Just before writing a metadata buffer to the journal, jbd2 will optionally call a commit trigger associated with the buffer. If the journal is aborted, an abort trigger will be called on any dirty buffers as they are dropped from pending transactions. ocfs2 will use this feature. Initially I tried to come up with a more generic trigger that could be used for non-buffer-related events like transaction completion. It doesn't tie nicely, because the information a buffer trigger needs (specific to a journal_head) isn't the same as what a transaction trigger needs (specific to a tranaction_t or perhaps journal_t). So I implemented a buffer set, with the understanding that journal/transaction wide triggers should be implemented separately. There is only one trigger set allowed per buffer. I can't think of any reason to attach more than one set. Contrast this with a journal or transaction in which multiple places may want to watch the entire transaction separately. The trigger sets are considered static allocation from the jbd2 perspective. ocfs2 will just have one trigger set per block type, setting the same set on every bh of the same type. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-04jbd2: Submit writes to the journal using WRITE_SYNCTheodore Ts'o
Since we will be waiting the write of the commit record to the journal to complete in journal_submit_commit_record(), submit it using WRITE_SYNC. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-03jbd2: Add pid and journal device name to the "kjournald2 starting" messageTheodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-06ext4: Remove code to create the journal inodeTheodore Ts'o
This code has been obsolete in quite some time, since the supported method for adding a journal inode is to use tune2fs (or to creating new filesystem with a journal via mke2fs or mkfs.ext4). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-06jbd2: Call journal commit callback without holding j_list_lockAneesh Kumar K.V
Avoid freeing the transaction in __jbd2_journal_drop_transaction() so the journal commit callback can run without holding j_list_lock, to avoid lock contention on this spinlock. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-05jbd2: Remove a large array of bh's from the stack of the checkpoint routineTheodore Ts'o
jbd2_log_do_checkpoint()n is one of the kernel's largest stack users. Move the array of buffer head's from the stack of jbd2_log_do_checkpoint() to the in-core journal structure. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-03ext4: add fsync batch tuning knobsTheodore Ts'o
Add new mount options, min_batch_time and max_batch_time, which controls how long the jbd2 layer should wait for additional filesystem operations to get batched with a synchronous write transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-12-17ext4: display average commit timeTheodore Ts'o
Display the average commit time (which is used by the ext4 fsync batching patch) in /proc/fs/jbd2/*/info for performance tuning purposes. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-26jbd2: improve jbd2 fsync batchingJosef Bacik
This patch removes the static sleep time in favor of a more self optimizing approach where we measure the average amount of time it takes to commit a transaction to disk and the ammount of time a transaction has been running. If somebody does a sync write or an fsync() traditionally we would sleep for 1 jiffies, which depending on the value of HZ could be a significant amount of time compared to how long it takes to commit a transaction to the underlying storage. With this patch instead of sleeping for a jiffie, we check to see if the amount of time this transaction has been running is less than the average commit time, and if it is we sleep for the delta using schedule_hrtimeout to give us a higher precision sleep time. This greatly benefits high end storage where you could end up sleeping for longer than it takes to commit the transaction and therefore sitting idle instead of allowing the transaction to be committed by keeping the sleep time to a minimum so you are sure to always be doing something. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-05jbd2: Add barrier not supported test to journal_wait_on_commit_recordTheodore Ts'o
Xen doesn't report that barriers are not supported until buffer I/O is reported as completed, instead of when the buffer I/O is submitted. Add a check and a fallback codepath to journal_wait_on_commit_record() to detect this case, so that attempts to mount ext4 filesystems on LVM/devicemapper devices on Xen guests don't blow up with an "Aborting journal on device XXX"; "Remounting filesystem read-only" error. Thanks to Andreas Sundstrom for reporting this issue. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2008-11-02jbd2: deregister proc on failure in jbd2_journal_init_inodeSami Liedes
jbd2_journal_init_inode() does not call jbd2_stats_proc_exit() on all failure paths after calling jbd2_stats_proc_init(). This leaves dangling references to the fs in proc. This patch fixes a bug reported by Sami Leides at: http://bugzilla.kernel.org/show_bug.cgi?id=11493 Signed-off-by: Sami Liedes <sliedes@cc.hut.fi> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-06jbd2: don't give up looking for space so easily in __jbd2_log_wait_for_spaceTheodore Ts'o
Commit 23f8b79e introducd a regression because it assumed that if there were no transactions ready to be checkpointed, that no progress could be made on making space available in the journal, and so the journal should be aborted. This assumption is false; it could be the case that simply calling jbd2_cleanup_journal_tail() will recover the necessary space, or, for small journals, the currently committing transaction could be responsible for chewing up the required space in the log, so we need to wait for the currently committing transaction to finish before trying to force a checkpoint operation. This patch fixes a bug reported by Mihai Harpau at: https://bugzilla.redhat.com/show_bug.cgi?id=469582 This patch fixes a bug reported by François Valenduc at: http://bugzilla.kernel.org/show_bug.cgi?id=11840 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Duane Griffin <duaneg@dghda.com> Cc: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
2008-10-28jbd2: Call the commit callback before the transaction could get droppedTheodore Ts'o
The transaction can potentially get dropped if there are no buffers that need to be written. Make sure we call the commit callback before potentially deciding to drop the transaction. Also avoid dereferencing the commit_transaction pointer in the marker for the same reason. This patch fixes the bug reported by Eric Paris at: http://bugzilla.kernel.org/show_bug.cgi?id=11838 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Eric Paris <eparis@redhat.com>
2008-10-20fs/Kconfig: move ext2, ext3, ext4, JBD, JBD2 outAlexey Dobriyan
Use fs/*/Kconfig more, which is good because everything related to one filesystem is in one place and fs/Kconfig is quite fat. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16ext4: Replace hackish ext4_mb_poll_new_transaction with commit callbackTheodore Ts'o
The multiblock allocator needs to be able to release blocks (and issue a blkdev discard request) when the transaction which freed those blocks is committed. Previously this was done via a polling mechanism when blocks are allocated or freed. A much better way of doing things is to create a jbd2 callback function and attaching the list of blocks to be freed directly to the transaction structure. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-10ext4: add an option to control error handling on file dataHidehiro Kawai
If the journal doesn't abort when it gets an IO error in file data blocks, the file data corruption will spread silently. Because most of applications and commands do buffered writes without fsync(), they don't notice the IO error. It's scary for mission critical systems. On the other hand, if the journal aborts whenever it gets an IO error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether it aborts the journal or just call printk() when it gets an IO error in file data. If you mount an ext4 fs with data_err=abort option, it aborts on file data write error. If you mount it with data_err=ignore, it doesn't abort, just call printk(). data_err=ignore is the default. Here is the corresponding patch of the ext3 version: http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374 Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-10-10jbd2: don't dirty original metadata buffer on abortHidehiro Kawai
Currently, original metadata buffers are dirtied when they are unfiled whether the journal has aborted or not. Eventually these buffers will be written-back to the filesystem by pdflush. This means some metadata buffers are written to the filesystem without journaling if the journal aborts. So if both journal abort and system crash happen at the same time, the filesystem would become inconsistent state. Additionally, replaying journaled metadata can overwrite the latest metadata on the filesystem partly. Because, if the journal gets aborted, journaled metadata are preserved and replayed during the next mount not to lose uncheckpointed metadata. This would also break the consistency of the filesystem. This patch prevents original metadata buffers from being dirtied on abort by clearing BH_JBDDirty flag from those buffers. Thus, no metadata buffers are written to the filesystem without journaling. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-10-10jbd2: fix error handling for checkpoint ioHidehiro Kawai
When a checkpointing IO fails, current JBD2 code doesn't check the error and continue journaling. This means latest metadata can be lost from both the journal and filesystem. This patch leaves the failed metadata blocks in the journal space and aborts journaling in the case of jbd2_log_do_checkpoint(). To achieve this, we need to do: 1. don't remove the failed buffer from the checkpoint list where in the case of __try_to_free_cp_buf() because it may be released or overwritten by a later transaction 2. jbd2_log_do_checkpoint() is the last chance, remove the failed buffer from the checkpoint list and abort the journal 3. when checkpointing fails, don't update the journal super block to prevent the journaled contents from being cleaned. For safety, don't update j_tail and j_tail_sequence either 4. when checkpointing fails, notify this error to the ext4 layer so that ext4 don't clear the needs_recovery flag, otherwise the journaled contents are ignored and cleaned in the recovery phase 5. if the recovery fails, keep the needs_recovery flag 6. prevent jbd2_cleanup_journal_tail() from being called between __jbd2_journal_drop_transaction() and jbd2_journal_abort() (a possible race issue between jbd2_log_do_checkpoint()s called by jbd2_journal_flush() and __jbd2_log_wait_for_space()) Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-10-12jbd2: abort when failed to log metadata buffersHidehiro Kawai
If we failed to write metadata buffers to the journal space and succeeded to write the commit record, stale data can be written back to the filesystem as metadata in the recovery phase. To avoid this, when we failed to write out metadata buffers, abort the journal before writing the commit record. We can also avoid this kind of corruption by using the journal checksum feature because it can detect invalid metadata blocks in the journal and avoid them from being replayed. So we don't need to care about asynchronous commit record writeout with a checksum. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-10-06jbd2: Fix buffer head leak when writing the commit blockTheodore Ts'o
Also make sure the buffer heads are marked clean before submitting bh for writing. The previous code was marking the buffer head dirty, which would have forced an unneeded write (and seek) to the journal for no good reason. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-05ext4: Add debugging markers that can be used by systemtapTheodore Ts'o
This debugging markers are designed to debug problems such as the random filesystem latency problems reported by Arjan. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-08jbd2: abort instead of waiting for nonexistent transactionDuane Griffin
The __jbd2_log_wait_for_space function sits in a loop checkpointing transactions until there is sufficient space free in the journal. However, if there are no transactions to be processed (e.g. because the free space calculation is wrong due to a corrupted filesystem) it will never progress. Check for space being required when no transactions are outstanding and abort the journal instead of endlessly looping. This patch fixes the bug reported by Sami Liedes at: http://bugzilla.kernel.org/show_bug.cgi?id=10976 Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: Sami Liedes <sliedes@cc.hut.fi> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-06ext4/jbd2: Avoid WARN() messages when failing to write to the superblockTheodore Ts'o
This fixes some very common warnings reported by kerneloops.org Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-09-16jbd2: clean up how the journal device name is printedTheodore Ts'o
Calculate the journal device name once and stash it away in the journal_s structure. This avoids needing to call bdevname() everywhere and reduces stack usage by not needing to allocate an on-stack buffer. In addition, we eliminate the '/' that can appear in device names (e.g. "cciss/c0d0p9" --- see kernel bugzilla #11321) that can cause problems when creating proc directory names, and include the inode number to support ocfs2 which creates multiple journals with different inode numbers. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-08-12Merge branch 'core/locking' into core/urgentIngo Molnar
2008-08-11lockdep: rename map_[acquire|release]() => lock_map_[acquire|release]()Ingo Molnar
the names were too generic: drivers/uio/uio.c:87: error: expected identifier or '(' before 'do' drivers/uio/uio.c:87: error: expected identifier or '(' before 'while' drivers/uio/uio.c:113: error: 'map_release' undeclared here (not in a function) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11lockdep: map_acquirePeter Zijlstra
Most the free-standing lock_acquire() usages look remarkably similar, sweep them into a new helper. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-04mm: rename page trylockNick Piggin
Converting page lock to new locking bitops requires a change of page flag operation naming, so we might as well convert it to something nicer (!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked). This also facilitates lockdeping of page lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26ext4: unexport jbd2_journal_update_superblockTheodore Ts'o
Remove the unused EXPORT_SYMBOL(jbd2_journal_update_superblock). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-31jbd2: don't abort if flushing file data failedHidehiro Kawai
In ordered mode, the current jbd2 aborts the journal if a file data buffer has an error. But this behavior is unintended, and we found that it has been adopted accidentally. This patch undoes it and just calls printk() instead of aborting the journal. Unlike a similar patch for ext3/jbd, file data buffers are written via generic_writepages(). But we also need to set AS_EIO into their mappings because wait_on_page_writeback_range() clears AS_EIO before a user process sees it. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11ext4: Add ordered mode support for delallocAneesh Kumar K.V
This provides a new ordered mode implementation which gets rid of using buffer heads to enforce the ordering between metadata change with the related data chage. Instead, in the new ordering mode, it keeps track of all of the inodes touched by each transaction on a list, and when that transaction is committed, it flushes all of the dirty pages for those inodes. In addition, the new ordered mode reverses the lock ordering of the page lock and transaction lock, which provides easier support for delayed allocation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11jbd2: Remove data=ordered mode support using jbd buffer headsJan Kara
Signed-off-by: Jan Kara <jack@suse.cz>
2008-07-11jbd2: Implement data=ordered mode handling via inodesJan Kara
This patch adds necessary framework into JBD2 to be able to track inodes with each transaction and write-out their dirty data during transaction commit time. This new ordered mode brings all sorts of advantages such as possibility to get rid of journal heads and buffer heads for data buffers in ordered mode, better ordering of writes on transaction commit, simplification of some JBD code, no more anonymous pages when truncate of data being committed happens. Also with this new ordered mode, delayed allocation on ordered mode is much simpler. Signed-off-by: Jan Kara <jack@suse.cz>
2008-07-13jbd2: fix race between jbd2_journal_try_to_free_buffers() and jbd2 commit ↵Mingming Cao
transaction journal_try_to_free_buffers() could race with jbd commit transaction when the later is holding the buffer reference while waiting for the data buffer to flush to disk. If the caller of journal_try_to_free_buffers() request tries hard to release the buffers, it will treat the failure as error and return back to the caller. We have seen the directo IO failed due to this race. Some of the caller of releasepage() also expecting the buffer to be dropped when passed with GFP_KERNEL mask to the releasepage()->journal_try_to_free_buffers(). With this patch, if the caller is passing the GFP_KERNEL to indicating this call could wait, in case of try_to_free_buffers() failed, let's waiting for journal_commit_transaction() to finish commit the current committing transaction , then try to free those buffers again with journal locked. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11jbd2: Add commit time into the commit blockTheodore Ts'o
Carlo Wood has demonstrated that it's possible to recover deleted files from the journal. Something that will make this easier is if we can put the time of the commit into commit block. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-06-03jbd2: Fix barrier fallback code to re-lock the buffer headTheodore Ts'o
If the device doesn't support write barriers, the write is retried without ordered mode. But the buffer head needs to be re-locked or submit_bh will fail with on BUG(!buffer_locked(bh)). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-06-06jbd2: If a journal checksum error is detected, propagate the error to ext4Theodore Ts'o
If a journal checksum error is detected, the ext4 filesystem will call ext4_error(), and the mount will either continue, become a read-only mount, or cause a kernel panic based on the superblock flags indicating the user's preference of what to do in case of filesystem corruption being detected. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-05-26jbd2: Fix memory leak when verifying checksums in the journalTheodore Ts'o
Cc: Andreas Dilger <adilger@clusterfs.com> Cc: Girish Shilamkar <girish@clusterfs.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-05-15jbd2: update transaction t_state to T_COMMIT fixMingming Cao
Updating the current transaction's t_state is protected by j_state_lock. We need to do the same when updating the t_state to T_COMMIT. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-05-13Fix misuses of bdevname()Jean Delvare
bdevname() fills the buffer that it is given as a parameter, so calling strcpy() or snprintf() on the returned value is redundant (and probably not guaranteed to work - I don't think strcpy and snprintf support overlapping buffers.) Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Stephen Tweedie <sct@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-17jdb2: replace remaining __FUNCTION__ occurrencesHarvey Harrison
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29jbd2: only create debugfs and stats entries if init is successfulDuane Griffin
jbd2 debugfs and stats entries should only be created if cache initialisation is successful. At the moment they are being created unconditionally which will leave them dangling if cache (and hence module) initialisation fails. Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17jbd2: fix kernel-doc notationRandy Dunlap
Fix kernel-doc notation in jbd2. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17jbd2: replace potentially false assertion with if blockDuane Griffin
If an error occurs during jbd2 cache initialisation it is possible for the journal_head_cache to be NULL when jbd2_journal_destroy_journal_head_cache is called. Replace the J_ASSERT with an if block to handle the situation correctly. Note that even with this fix things will break badly if jbd2 is statically compiled in and cache initialisation fails. Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17jbd2: eliminate duplicated code in revocation table init/destroy functionsDuane Griffin
The revocation table initialisation/destruction code is repeated for each of the two revocation tables stored in the journal. Refactoring the duplicated code into functions is tidier, simplifies the logic in initialisation in particular, and slightly reduces the code size. There should not be any functional change. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-28jbd2: tidy up revoke cache initialisation and destructionDuane Griffin
Make revocation cache destruction safe to call if initialisation fails partially or entirely. This allows it to be used to cleanup in the case of initialisation failure, simplifying the code slightly. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2008-04-17jbd2: fix possible journal overflow issuesJosef Bacik
There are several cases where the running transaction can get buffers added to its BJ_Metadata list which it never dirtied, which makes its t_nr_buffers counter end up larger than its t_outstanding_credits counter. This will cause issues when starting new transactions as while we are logging buffers we decrement t_outstanding_buffers, so when t_outstanding_buffers goes negative, we will report that we need less space in the journal than we actually need, so transactions will be started even though there may not be enough room for them. In the worst case scenario (which admittedly is almost impossible to reproduce) this will result in the journal running out of space. The fix is to only refile buffers from the committing transaction to the running transactions BJ_Modified list when b_modified is set on that journal, which is the only way to be sure if the running transaction has modified that buffer. This patch also fixes an accounting error in journal_forget, it is possible that we can call journal_forget on a buffer without having modified it, only gotten write access to it, so instead of freeing a credit, we only do so if the buffer was modified. The assert will help catch if this problem occurs. Without these two patches I could hit this assert within minutes of running postmark, with them this issue no longer arises. Cc: <linux-ext4@vger.kernel.org> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17jbd2: fix the way the b_modified flag is clearedJosef Bacik
Currently at the start of a journal commit we loop through all of the buffers on the committing transaction and clear the b_modified flag (the flag that is set when a transaction modifies the buffer) under the j_list_lock. The problem is that everywhere else this flag is modified only under the jbd2 lock buffer flag, so it will race with a running transaction who could potentially set it, and have it unset by the committing transaction. This is also a big waste, you can have several thousands of buffers that you are clearing the modified flag on when you may not need to. This patch removes this code and instead clears the b_modified flag upon entering do_get_write_access/journal_get_create_access, so if that transaction does indeed use the buffer then it will be accounted for properly, and if it does not then we know we didn't use it. That will be important for the next patch in this series. Tested thoroughly by myself using postmark/iozone/bonnie++. Cc: <linux-ext4@vger.kernel.org> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>