From 7eaceaccab5f40bbfda044629a6298616aeaed50 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 10 Mar 2011 08:52:07 +0100 Subject: block: remove per-queue plugging Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe --- mm/page-writeback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm/page-writeback.c') diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 2cb01f6ec5d..cc0ede169e4 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1239,7 +1239,7 @@ int set_page_dirty_lock(struct page *page) { int ret; - lock_page_nosync(page); + lock_page(page); ret = set_page_dirty(page); unlock_page(page); return ret; -- cgit v1.2.3-70-g09d2 From 9b6096a65f99a89dfd8328c4e469e7b53b3ae04a Mon Sep 17 00:00:00 2001 From: Shaohua Li Date: Thu, 17 Mar 2011 10:47:06 +0100 Subject: mm: make generic_writepages() use plugging This recovers a performance regression caused by the removal of the per-device plugging. Signed-off-by: Jens Axboe --- mm/page-writeback.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'mm/page-writeback.c') diff --git a/mm/page-writeback.c b/mm/page-writeback.c index cc0ede169e4..24b7ac2bc36 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1039,11 +1039,17 @@ static int __writepage(struct page *page, struct writeback_control *wbc, int generic_writepages(struct address_space *mapping, struct writeback_control *wbc) { + struct blk_plug plug; + int ret; + /* deal with chardevs and other special file */ if (!mapping->a_ops->writepage) return 0; - return write_cache_pages(mapping, wbc, __writepage, mapping); + blk_start_plug(&plug); + ret = write_cache_pages(mapping, wbc, __writepage, mapping); + blk_finish_plug(&plug); + return ret; } EXPORT_SYMBOL(generic_writepages); -- cgit v1.2.3-70-g09d2 From 278df9f451dc71dcd002246be48358a473504ad0 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Tue, 22 Mar 2011 16:32:54 -0700 Subject: mm: reclaim invalidated page ASAP invalidate_mapping_pages is very big hint to reclaimer. It means user doesn't want to use the page any more. So in order to prevent working set page eviction, this patch move the page into tail of inactive list by PG_reclaim. Please, remember that pages in inactive list are working set as well as active list. If we don't move pages into inactive list's tail, pages near by tail of inactive list can be evicted although we have a big clue about useless pages. It's totally bad. Now PG_readahead/PG_reclaim is shared. fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for preventing fast reclaiming readahead marker page. In this series, PG_reclaim is used by invalidated page, too. If VM find the page is invalidated and it's dirty, it sets PG_reclaim to reclaim asap. Then, when the dirty page will be writeback, clear_page_dirty_for_io will clear PG_reclaim unconditionally. It disturbs this serie's goal. I think it's okay to clear PG_readahead when the page is dirty, not writeback time. So this patch moves ClearPageReadahead. In v4, ClearPageReadahead in set_page_dirty has a problem which is reported by Steven Barrett. It's due to compound page. Some driver(ex, audio) calls set_page_dirty with compound page which isn't on LRU. but my patch does ClearPageRelcaim on compound page. In non-CONFIG_PAGEFLAGS_EXTENDED, it breaks PageTail flag. I think it doesn't affect THP and pass my test with THP enabling but Cced Andrea for double check. Signed-off-by: Minchan Kim Reported-by: Steven Barrett Reviewed-by: Johannes Weiner Acked-by: Rik van Riel Acked-by: Mel Gorman Cc: Wu Fengguang Cc: KOSAKI Motohiro Cc: Nick Piggin Cc: Andrea Arcangeli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page-writeback.c | 12 +++++++++++- mm/swap.c | 41 ++++++++++++++++++++++++++++++++++++++--- 2 files changed, 49 insertions(+), 4 deletions(-) (limited to 'mm/page-writeback.c') diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 2cb01f6ec5d..b437fe6257b 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -1211,6 +1211,17 @@ int set_page_dirty(struct page *page) if (likely(mapping)) { int (*spd)(struct page *) = mapping->a_ops->set_page_dirty; + /* + * readahead/lru_deactivate_page could remain + * PG_readahead/PG_reclaim due to race with end_page_writeback + * About readahead, if the page is written, the flags would be + * reset. So no problem. + * About lru_deactivate_page, if the page is redirty, the flag + * will be reset. So no problem. but if the page is used by readahead + * it will confuse readahead and make it restart the size rampup + * process. But it's a trivial problem. + */ + ClearPageReclaim(page); #ifdef CONFIG_BLOCK if (!spd) spd = __set_page_dirty_buffers; @@ -1266,7 +1277,6 @@ int clear_page_dirty_for_io(struct page *page) BUG_ON(!PageLocked(page)); - ClearPageReclaim(page); if (mapping && mapping_cap_account_dirty(mapping)) { /* * Yes, Virginia, this is indeed insane. diff --git a/mm/swap.c b/mm/swap.c index 1b9e4ebaffc..0a33714a7cb 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -354,26 +354,61 @@ void add_page_to_unevictable_list(struct page *page) * head of the list, rather than the tail, to give the flusher * threads some time to write it out, as this is much more * effective than the single-page writeout from reclaim. + * + * If the page isn't page_mapped and dirty/writeback, the page + * could reclaim asap using PG_reclaim. + * + * 1. active, mapped page -> none + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim + * 3. inactive, mapped page -> none + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim + * 5. inactive, clean -> inactive, tail + * 6. Others -> none + * + * In 4, why it moves inactive's head, the VM expects the page would + * be write it out by flusher threads as this is much more effective + * than the single-page writeout from reclaim. */ static void lru_deactivate(struct page *page, struct zone *zone) { int lru, file; + bool active; - if (!PageLRU(page) || !PageActive(page)) + if (!PageLRU(page)) return; /* Some processes are using the page */ if (page_mapped(page)) return; + active = PageActive(page); + file = page_is_file_cache(page); lru = page_lru_base_type(page); - del_page_from_lru_list(zone, page, lru + LRU_ACTIVE); + del_page_from_lru_list(zone, page, lru + active); ClearPageActive(page); ClearPageReferenced(page); add_page_to_lru_list(zone, page, lru); - __count_vm_event(PGDEACTIVATE); + if (PageWriteback(page) || PageDirty(page)) { + /* + * PG_reclaim could be raced with end_page_writeback + * It can make readahead confusing. But race window + * is _really_ small and it's non-critical problem. + */ + SetPageReclaim(page); + } else { + /* + * The page's writeback ends up during pagevec + * We moves tha page into tail of inactive. + */ + list_move_tail(&page->lru, &zone->lru[lru].list); + mem_cgroup_rotate_reclaimable_page(page); + __count_vm_event(PGROTATED); + } + + if (active) + __count_vm_event(PGDEACTIVATE); update_page_reclaim_stat(zone, page, file, 0); } -- cgit v1.2.3-70-g09d2 From cf15b07cf448e19dcb31a19f0cbaf898b08ce975 Mon Sep 17 00:00:00 2001 From: Jun'ichi Nomura Date: Tue, 22 Mar 2011 16:33:40 -0700 Subject: writeback: make mapping->writeback_index to point to the last written page For range-cyclic writeback (e.g. kupdate), the writeback code sets a continuation point of the next writeback to mapping->writeback_index which is set the page after the last written page. This happens so that we evenly write the whole file even if pages in it get continuously redirtied. However, in some cases, sequential writer is writing in the middle of the page and it just redirties the last written page by continuing from that. For example with an application which uses a file as a big ring buffer we see: [1st writeback session] ... flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898514 + 8 flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898522 + 8 flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898530 + 8 flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898538 + 8 flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898546 + 8 kworker/0:1-11 4571: block_rq_issue: 8,0 W 0 () 94898514 + 40 >> flush-8:0-2743 4571: block_bio_queue: 8,0 W 94898554 + 8 >> flush-8:0-2743 4571: block_rq_issue: 8,0 W 0 () 94898554 + 8 [2nd writeback session after 35sec] flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898562 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898570 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898578 + 8 ... kworker/0:1-11 4606: block_rq_issue: 8,0 W 0 () 94898562 + 640 kworker/0:1-11 4606: block_rq_issue: 8,0 W 0 () 94899202 + 72 ... flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899962 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899970 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899978 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899986 + 8 flush-8:0-2743 4606: block_bio_queue: 8,0 W 94899994 + 8 kworker/0:1-11 4606: block_rq_issue: 8,0 W 0 () 94899962 + 40 >> flush-8:0-2743 4606: block_bio_queue: 8,0 W 94898554 + 8 >> flush-8:0-2743 4606: block_rq_issue: 8,0 W 0 () 94898554 + 8 So we seeked back to 94898554 after we wrote all the pages at the end of the file. This extra seek seems unnecessary. If we continue writeback from the last written page, we can avoid it and do not cause harm to other cases. The original intent of even writeout over the whole file is preserved and if the page does not get redirtied pagevec_lookup_tag() just skips it. As an exceptional case, when I/O error happens, set done_index to the next page as the comment in the code suggests. Tested-by: Wu Fengguang Signed-off-by: Jun'ichi Nomura Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page-writeback.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'mm/page-writeback.c') diff --git a/mm/page-writeback.c b/mm/page-writeback.c index b437fe6257b..632b46479c9 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -927,7 +927,7 @@ retry: break; } - done_index = page->index + 1; + done_index = page->index; lock_page(page); @@ -977,6 +977,7 @@ continue_unlock: * not be suitable for data integrity * writeout). */ + done_index = page->index + 1; done = 1; break; } -- cgit v1.2.3-70-g09d2