diff options
Diffstat (limited to 'Documentation/vm')
-rw-r--r-- | Documentation/vm/cleancache.txt | 43 | ||||
-rw-r--r-- | Documentation/vm/page-types.c | 2 | ||||
-rw-r--r-- | Documentation/vm/pagemap.txt | 4 | ||||
-rw-r--r-- | Documentation/vm/unevictable-lru.txt | 8 |
4 files changed, 32 insertions, 25 deletions
diff --git a/Documentation/vm/cleancache.txt b/Documentation/vm/cleancache.txt index 36c367c7308..142fbb0f325 100644 --- a/Documentation/vm/cleancache.txt +++ b/Documentation/vm/cleancache.txt @@ -46,10 +46,11 @@ a negative return value indicates failure. A "put_page" will copy a the pool id, a file key, and a page index into the file. (The combination of a pool id, a file key, and an index is sometimes called a "handle".) A "get_page" will copy the page, if found, from cleancache into kernel memory. -A "flush_page" will ensure the page no longer is present in cleancache; -a "flush_inode" will flush all pages associated with the specified file; -and, when a filesystem is unmounted, a "flush_fs" will flush all pages in -all files specified by the given pool id and also surrender the pool id. +An "invalidate_page" will ensure the page no longer is present in cleancache; +an "invalidate_inode" will invalidate all pages associated with the specified +file; and, when a filesystem is unmounted, an "invalidate_fs" will invalidate +all pages in all files specified by the given pool id and also surrender +the pool id. An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache to treat the pool as shared using a 128-bit UUID as a key. On systems @@ -62,12 +63,12 @@ of the kernel (e.g. by "tools" that control cleancache). Or a cleancache implementation can simply disable shared_init by always returning a negative value. -If a get_page is successful on a non-shared pool, the page is flushed (thus -making cleancache an "exclusive" cache). On a shared pool, the page -is NOT flushed on a successful get_page so that it remains accessible to +If a get_page is successful on a non-shared pool, the page is invalidated +(thus making cleancache an "exclusive" cache). On a shared pool, the page +is NOT invalidated on a successful get_page so that it remains accessible to other sharers. The kernel is responsible for ensuring coherency between cleancache (shared or not), the page cache, and the filesystem, using -cleancache flush operations as required. +cleancache invalidate operations as required. Note that cleancache must enforce put-put-get coherency and get-get coherency. For the former, if two puts are made to the same handle but @@ -77,22 +78,22 @@ if a get for a given handle fails, subsequent gets for that handle will never succeed unless preceded by a successful put with that handle. Last, cleancache provides no SMP serialization guarantees; if two -different Linux threads are simultaneously putting and flushing a page +different Linux threads are simultaneously putting and invalidating a page with the same handle, the results are indeterminate. Callers must lock the page to ensure serial behavior. CLEANCACHE PERFORMANCE METRICS -Cleancache monitoring is done by sysfs files in the -/sys/kernel/mm/cleancache directory. The effectiveness of cleancache +If properly configured, monitoring of cleancache is done via debugfs in +the /sys/kernel/debug/mm/cleancache directory. The effectiveness of cleancache can be measured (across all filesystems) with: succ_gets - number of gets that were successful failed_gets - number of gets that failed puts - number of puts attempted (all "succeed") -flushes - number of flushes attempted +invalidates - number of invalidates attempted -A backend implementatation may provide additional metrics. +A backend implementation may provide additional metrics. FAQ @@ -143,7 +144,7 @@ systems. The core hooks for cleancache in VFS are in most cases a single line and the minimum set are placed precisely where needed to maintain -coherency (via cleancache_flush operations) between cleancache, +coherency (via cleancache_invalidate operations) between cleancache, the page cache, and disk. All hooks compile into nothingness if cleancache is config'ed off and turn into a function-pointer- compare-to-NULL if config'ed on but no backend claims the ops @@ -184,15 +185,15 @@ or for real kernel-addressable RAM, it makes perfect sense for transcendent memory. 4) Why is non-shared cleancache "exclusive"? And where is the - page "flushed" after a "get"? (Minchan Kim) + page "invalidated" after a "get"? (Minchan Kim) The main reason is to free up space in transcendent memory and -to avoid unnecessary cleancache_flush calls. If you want inclusive, +to avoid unnecessary cleancache_invalidate calls. If you want inclusive, the page can be "put" immediately following the "get". If put-after-get for inclusive becomes common, the interface could -be easily extended to add a "get_no_flush" call. +be easily extended to add a "get_no_invalidate" call. -The flush is done by the cleancache backend implementation. +The invalidate is done by the cleancache backend implementation. 5) What's the performance impact? @@ -222,7 +223,7 @@ Some points for a filesystem to consider: as tmpfs should not enable cleancache) - To ensure coherency/correctness, the FS must ensure that all file removal or truncation operations either go through VFS or - add hooks to do the equivalent cleancache "flush" operations + add hooks to do the equivalent cleancache "invalidate" operations - To ensure coherency/correctness, either inode numbers must be unique across the lifetime of the on-disk file OR the FS must provide an "encode_fh" function. @@ -243,11 +244,11 @@ If cleancache would use the inode virtual address instead of inode/filehandle, the pool id could be eliminated. But, this won't work because cleancache retains pagecache data pages persistently even when the inode has been pruned from the -inode unused list, and only flushes the data page if the file +inode unused list, and only invalidates the data page if the file gets removed/truncated. So if cleancache used the inode kva, there would be potential coherency issues if/when the inode kva is reused for a different file. Alternately, if cleancache -flushed the pages when the inode kva was freed, much of the value +invalidated the pages when the inode kva was freed, much of the value of cleancache would be lost because the cache of pages in cleanache is potentially much larger than the kernel pagecache and is most useful if the pages survive inode cache removal. diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c index 7445caa26d0..0b13f02d405 100644 --- a/Documentation/vm/page-types.c +++ b/Documentation/vm/page-types.c @@ -98,6 +98,7 @@ #define KPF_HWPOISON 19 #define KPF_NOPAGE 20 #define KPF_KSM 21 +#define KPF_THP 22 /* [32-] kernel hacking assistances */ #define KPF_RESERVED 32 @@ -147,6 +148,7 @@ static const char *page_flag_names[] = { [KPF_HWPOISON] = "X:hwpoison", [KPF_NOPAGE] = "n:nopage", [KPF_KSM] = "x:ksm", + [KPF_THP] = "t:thp", [KPF_RESERVED] = "r:reserved", [KPF_MLOCKED] = "m:mlocked", diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt index df09b9650a8..4600cbe3d6b 100644 --- a/Documentation/vm/pagemap.txt +++ b/Documentation/vm/pagemap.txt @@ -60,6 +60,7 @@ There are three components to pagemap: 19. HWPOISON 20. NOPAGE 21. KSM + 22. THP Short descriptions to the page flags: @@ -97,6 +98,9 @@ Short descriptions to the page flags: 21. KSM identical memory pages dynamically shared between one or more processes +22. THP + contiguous pages which construct transparent hugepages + [IO related page flags] 1. ERROR IO error occurred 3. UPTODATE page has up-to-date data diff --git a/Documentation/vm/unevictable-lru.txt b/Documentation/vm/unevictable-lru.txt index 97bae3c576c..fa206cccf89 100644 --- a/Documentation/vm/unevictable-lru.txt +++ b/Documentation/vm/unevictable-lru.txt @@ -538,7 +538,7 @@ different reverse map mechanisms. process because mlocked pages are migratable. However, for reclaim, if the page is mapped into a VM_LOCKED VMA, the scan stops. - try_to_unmap_anon() attempts to acquire in read mode the mmap semphore of + try_to_unmap_anon() attempts to acquire in read mode the mmap semaphore of the mm_struct to which the VMA belongs. If this is successful, it will mlock the page via mlock_vma_page() - we wouldn't have gotten to try_to_unmap_anon() if the page were already mlocked - and will return @@ -619,11 +619,11 @@ all PTEs from the page. For this purpose, the unevictable/mlock infrastructure introduced a variant of try_to_unmap() called try_to_munlock(). try_to_munlock() calls the same functions as try_to_unmap() for anonymous and -mapped file pages with an additional argument specifing unlock versus unmap +mapped file pages with an additional argument specifying unlock versus unmap processing. Again, these functions walk the respective reverse maps looking for VM_LOCKED VMAs. When such a VMA is found for anonymous pages and file pages mapped in linear VMAs, as in the try_to_unmap() case, the functions -attempt to acquire the associated mmap semphore, mlock the page via +attempt to acquire the associated mmap semaphore, mlock the page via mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page. @@ -641,7 +641,7 @@ with it - the usual fallback position. Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA. However, the scan can terminate when it encounters a VM_LOCKED VMA and can -successfully acquire the VMA's mmap semphore for read and mlock the page. +successfully acquire the VMA's mmap semaphore for read and mlock the page. Although try_to_munlock() might be called a great many times when munlocking a large region or tearing down a large address space that has been mlocked via mlockall(), overall this is a fairly rare event. |