Age | Commit message (Collapse) | Author |
|
The SOFT_DIRTY bit shows that the content of memory was changed after a
defined point in the past. mprotect() doesn't change the content of
memory, so it must not change the SOFT_DIRTY bit.
This bug causes a malfunction: on the first iteration all pages are
dumped. On other iterations only pages with the SOFT_DIRTY bit are
dumped. So if the SOFT_DIRTY bit is cleared from a page by mistake, the
page is not dumped and its content will be restored incorrectly.
This patch does nothing with _PAGE_SWP_SOFT_DIRTY, becase pte_modify()
is called only for present pages.
Fixes commit 0f8975ec4db2 ("mm: soft-dirty bits for user memory changes
tracking").
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit abca7c496584 ("mm: fix slab->page _count corruption when using
slub") notes that we can not _set_ a page->counters directly, except
when using a real double-cmpxchg. Doing so can lose updates to
->_count.
That is an absolute rule:
You may not *set* page->counters except via a cmpxchg.
Commit abca7c496584 fixed this for the folks who have the slub
cmpxchg_double code turned off at compile time, but it left the bad case
alone. It can still be reached, and the same bug triggered in two
cases:
1. Turning on slub debugging at runtime, which is available on
the distro kernels that I looked at.
2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64
cpus, evidently)
There are at least 3 ways we could fix this:
1. Take all of the exising calls to cmpxchg_double_slab() and
__cmpxchg_double_slab() and convert them to take an old, new
and target 'struct page'.
2. Do (1), but with the newly-introduced 'slub_data'.
3. Do some magic inside the two cmpxchg...slab() functions to
pull the counters out of new_counters and only set those
fields in page->{inuse,frozen,objects}.
I've done (2) as well, but it's a bunch more code. This patch is an
attempt at (3). This was the most straightforward and foolproof way
that I could think to do this.
This would also technically allow us to get rid of the ugly
#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
in 'struct page', but leaving it alone has the added benefit that
'counters' stays 'unsigned' instead of 'unsigned long', so all the
copies that the slub code does stay a bit smaller.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As a result of commit 5606e3877ad8 ("mm: numa: Migrate on reference
policy"), /proc/<pid>/numa_maps prints the mempolicy for any <pid> as
"prefer:N" for the local node, N, of the process reading the file.
This should only be printed when the mempolicy of <pid> is
MPOL_PREFERRED for node N.
If the process is actually only using the default mempolicy for local
node allocation, make sure "default" is printed as expected.
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Robert Lippert <rlippert@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org> [3.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Finally, we separated zram->lock dependency from 32bit stat/ table
handling so there is no reason to use rw_semaphore between read and
write path so this patch removes the lock from read path totally and
changes rw_semaphore with mutex. So, we could do
old:
read-read: OK
read-write: NO
write-write: NO
Now:
read-read: OK
read-write: OK
write-write: NO
The below data proves mixed workload performs well 11 times and there is
also enhance on write-write path because current rw-semaphore doesn't
support SPIN_ON_OWNER. It's side effect but anyway good thing for us.
Write-related tests perform better (from 61% to 1058%) but read path has
good/bad(from -2.22% to 1.45%) but they are all marginal within stddev.
CPU 12
iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0
==Initial write ==Initial write
records: 10 records: 10
avg: 516189.16 avg: 839907.96
std: 22486.53 (4.36%) std: 47902.17 (5.70%)
max: 546970.60 max: 909910.35
min: 481131.54 min: 751148.38
==Rewrite ==Rewrite
records: 10 records: 10
avg: 509527.98 avg: 1050156.37
std: 45799.94 (8.99%) std: 40695.44 (3.88%)
max: 611574.27 max: 1111929.26
min: 443679.95 min: 980409.62
==Read ==Read
records: 10 records: 10
avg: 4408624.17 avg: 4472546.76
std: 281152.61 (6.38%) std: 163662.78 (3.66%)
max: 4867888.66 max: 4727351.03
min: 4058347.69 min: 4126520.88
==Re-read ==Re-read
records: 10 records: 10
avg: 4462147.53 avg: 4363257.75
std: 283546.11 (6.35%) std: 247292.63 (5.67%)
max: 4912894.44 max: 4677241.75
min: 4131386.50 min: 4035235.84
==Reverse Read ==Reverse Read
records: 10 records: 10
avg: 4565865.97 avg: 4485818.08
std: 313395.63 (6.86%) std: 248470.10 (5.54%)
max: 5232749.16 max: 4789749.94
min: 4185809.62 min: 3963081.34
==Stride read ==Stride read
records: 10 records: 10
avg: 4515981.80 avg: 4418806.01
std: 211192.32 (4.68%) std: 212837.97 (4.82%)
max: 4889287.28 max: 4686967.22
min: 4210362.00 min: 4083041.84
==Random read ==Random read
records: 10 records: 10
avg: 4410525.23 avg: 4387093.18
std: 236693.22 (5.37%) std: 235285.23 (5.36%)
max: 4713698.47 max: 4669760.62
min: 4057163.62 min: 3952002.16
==Mixed workload ==Mixed workload
records: 10 records: 10
avg: 243234.25 avg: 2818677.27
std: 28505.07 (11.72%) std: 195569.70 (6.94%)
max: 288905.23 max: 3126478.11
min: 212473.16 min: 2484150.69
==Random write ==Random write
records: 10 records: 10
avg: 555887.07 avg: 1053057.79
std: 70841.98 (12.74%) std: 35195.36 (3.34%)
max: 683188.28 max: 1096125.73
min: 437299.57 min: 992481.93
==Pwrite ==Pwrite
records: 10 records: 10
avg: 501745.93 avg: 810363.09
std: 16373.54 (3.26%) std: 19245.01 (2.37%)
max: 518724.52 max: 833359.70
min: 464208.73 min: 765501.87
==Pread ==Pread
records: 10 records: 10
avg: 4539894.60 avg: 4457680.58
std: 197094.66 (4.34%) std: 188965.60 (4.24%)
max: 4877170.38 max: 4689905.53
min: 4226326.03 min: 4095739.72
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit a0c516cbfc74 ("zram: don't grab mutex in zram_slot_free_noity")
introduced free request pending code to avoid scheduling by mutex under
spinlock and it was a mess which made code lenghty and increased
overhead.
Now, we don't need zram->lock any more to free slot so this patch
reverts it and then, tb_lock should protect it.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, the zram table is protected by zram->lock but it's rather
coarse-grained lock and it makes hard for scalibility.
Let's use own rwlock instead of depending on zram->lock. This patch
adds new locking so obviously, it would make slow but this patch is just
prepartion for removing coarse-grained rw_semaphore(ie, zram->lock)
which is hurdle about zram scalability.
Final patch in this patchset series will remove the lock from read-path
and change rw_semaphore with mutex in write path. With bonus, we could
drop pending slot free mess in next patch.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Some of fields in zram->stats are protected by zram->lock which is
rather coarse-grained so let's use atomic operation without explict
locking.
This patch is ready for removing dependency of zram->lock in read path
which is very coarse-grained rw_semaphore. Of course, this patch adds
new atomic operation so it might make slow but my 12CPU test couldn't
spot any regression. All gain/lose is marginal within stddev.
iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0
==Initial write ==Initial write
records: 50 records: 50
avg: 412875.17 avg: 415638.23
std: 38543.12 (9.34%) std: 36601.11 (8.81%)
max: 521262.03 max: 502976.72
min: 343263.13 min: 351389.12
==Rewrite ==Rewrite
records: 50 records: 50
avg: 416640.34 avg: 397914.33
std: 60798.92 (14.59%) std: 46150.42 (11.60%)
max: 543057.07 max: 522669.17
min: 304071.67 min: 316588.77
==Read ==Read
records: 50 records: 50
avg: 4147338.63 avg: 4070736.51
std: 179333.25 (4.32%) std: 223499.89 (5.49%)
max: 4459295.28 max: 4539514.44
min: 3753057.53 min: 3444686.31
==Re-read ==Re-read
records: 50 records: 50
avg: 4096706.71 avg: 4117218.57
std: 229735.04 (5.61%) std: 171676.25 (4.17%)
max: 4430012.09 max: 4459263.94
min: 2987217.80 min: 3666904.28
==Reverse Read ==Reverse Read
records: 50 records: 50
avg: 4062763.83 avg: 4078508.32
std: 186208.46 (4.58%) std: 172684.34 (4.23%)
max: 4401358.78 max: 4424757.22
min: 3381625.00 min: 3679359.94
==Stride read ==Stride read
records: 50 records: 50
avg: 4094933.49 avg: 4082170.22
std: 185710.52 (4.54%) std: 196346.68 (4.81%)
max: 4478241.25 max: 4460060.97
min: 3732593.23 min: 3584125.78
==Random read ==Random read
records: 50 records: 50
avg: 4031070.04 avg: 4074847.49
std: 192065.51 (4.76%) std: 206911.33 (5.08%)
max: 4356931.16 max: 4399442.56
min: 3481619.62 min: 3548372.44
==Mixed workload ==Mixed workload
records: 50 records: 50
avg: 149925.73 avg: 149675.54
std: 7701.26 (5.14%) std: 6902.09 (4.61%)
max: 191301.56 max: 175162.05
min: 133566.28 min: 137762.87
==Random write ==Random write
records: 50 records: 50
avg: 404050.11 avg: 393021.47
std: 58887.57 (14.57%) std: 42813.70 (10.89%)
max: 601798.09 max: 524533.43
min: 325176.99 min: 313255.34
==Pwrite ==Pwrite
records: 50 records: 50
avg: 411217.70 avg: 411237.96
std: 43114.99 (10.48%) std: 33136.29 (8.06%)
max: 530766.79 max: 471899.76
min: 320786.84 min: 317906.94
==Pread ==Pread
records: 50 records: 50
avg: 4154908.65 avg: 4087121.92
std: 151272.08 (3.64%) std: 219505.04 (5.37%)
max: 4459478.12 max: 4435857.38
min: 3730512.41 min: 3101101.67
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit a0c516cbfc74 ("zram: don't grab mutex in zram_slot_free_noity")
introduced pending zram slot free in zram's write path in case of
missing slot free by memory allocation failure in zram_slot_free_notify
but it is not necessary because we have already freed the slot right
before overwriting.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Sergey reported we don't need to handle pending free request every I/O
so that this patch removes it in read path while we remain it in write
path.
Let's consider below example.
Swap subsystem ask to zram "A" block free by swap_slot_free_notify but
zram had been pended it without real freeing. Swap subsystem allocates
"A" block for new data but request pended for a long time just handled
and zram blindly free new data on the "A" block. :(
That's why we couldn't remove handle pending free request right before
zram-write.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Dan and Sergey reported that there is a racy between reset and flushing
of pending work so that it could make oops by freeing zram->meta in
reset while zram_slot_free can access zram->meta if new request is
adding during the race window.
This patch moves flush after taking init_lock so it prevents new request
so that it closes the race.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
tAdd adds maintainer information for zsmalloc into the MAINTAINERS file.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add maintainer information for zram into the MAINTAINERS file.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add my copyright to the zsmalloc source code which I maintain.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add my copyright to the zram source code which I maintain.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove the old private compcache project address so upcoming patches
should be sent to LKML because we Linux kernel community will take care.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Zram has lived in staging for a LONG LONG time and have been
fixed/improved by many contributors so code is clean and stable now. Of
course, there are lots of product using zram in real practice.
The major TV companys have used zram as swap since two years ago and
recently our production team released android smart phone with zram
which is used as swap, too and recently Android Kitkat start to use zram
for small memory smart phone. And there was a report Google released
their ChromeOS with zram, too and cyanogenmod have been used zram long
time ago. And I heard some disto have used zram block device for tmpfs.
In addition, I saw many report from many other peoples. For example,
Lubuntu start to use it.
The benefit of zram is very clear. With my experience, one of the
benefit was to remove jitter of video application with backgroud memory
pressure. It would be effect of efficient memory usage by compression
but more issue is whether swap is there or not in the system. Recent
mobile platforms have used JAVA so there are many anonymous pages. But
embedded system normally are reluctant to use eMMC or SDCard as swap
because there is wear-leveling and latency issues so if we do not use
swap, it means we can't reclaim anoymous pages and at last, we could
encounter OOM kill. :(
Although we have real storage as swap, it was a problem, too. Because
it sometime ends up making system very unresponsible caused by slow swap
storage performance.
Quote from Luigi on Google
"Since Chrome OS was mentioned: the main reason why we don't use swap
to a disk (rotating or SSD) is because it doesn't degrade gracefully
and leads to a bad interactive experience. Generally we prefer to
manage RAM at a higher level, by transparently killing and restarting
processes. But we noticed that zram is fast enough to be competitive
with the latter, and it lets us make more efficient use of the
available RAM. " and he announced.
http://www.spinics.net/lists/linux-mm/msg57717.html
Other uses case is to use zram for block device. Zram is block device
so anyone can format the block device and mount on it so some guys on
the internet start zram as /var/tmp.
http://forums.gentoo.org/viewtopic-t-838198-start-0.html
Let's promote zram and enhance/maintain it instead of removing.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch moves zsmalloc under mm directory.
Before that, description will explain why we have needed custom
allocator.
Zsmalloc is a new slab-based memory allocator for storing compressed
pages. It is designed for low fragmentation and high allocation success
rate on large object, but <= PAGE_SIZE allocations.
zsmalloc differs from the kernel slab allocator in two primary ways to
achieve these design goals.
zsmalloc never requires high order page allocations to back slabs, or
"size classes" in zsmalloc terms. Instead it allows multiple
single-order pages to be stitched together into a "zspage" which backs
the slab. This allows for higher allocation success rate under memory
pressure.
Also, zsmalloc allows objects to span page boundaries within the zspage.
This allows for lower fragmentation than could be had with the kernel
slab allocator for objects between PAGE_SIZE/2 and PAGE_SIZE. With the
kernel slab allocator, if a page compresses to 60% of it original size,
the memory savings gained through compression is lost in fragmentation
because another object of the same size can't be stored in the leftover
space.
This ability to span pages results in zsmalloc allocations not being
directly addressable by the user. The user is given an
non-dereferencable handle in response to an allocation request. That
handle must be mapped, using zs_map_object(), which returns a pointer to
the mapped region that can be used. The mapping is necessary since the
object data may reside in two different noncontigious pages.
The zsmalloc fulfills the allocation needs for zram perfectly
[sjenning@linux.vnet.ibm.com: borrow Seth's quote]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After commit 9a46ad6d6df3 ("smp: make smp_call_function_many() use logic
similar to smp_call_function_single()"), cfd->cpumask is accessed only
in smp_call_function_many(). So there is no more need to copy it into
cfd->cpumask_ipi before putting csd into the list. The cpumask_ipi
field is obsolete and can be removed.
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Wang YanQing <udknight@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Make smp_call_function_single and friends more efficient by using a
lockless list.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It is required to call put_device() if device_register() fails, so that
we give up the last reference to the device. Calling put_device allows
for mdiobus_release to be executed, kfreeing the bus.
Signed-off-by: Levente Kurusa <levex@linux.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: David Daney <david.daney@cavium.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently we kfree the container of the device which failed to register.
This is wrong as the last reference is not given up with a put_device
call. Also, now that we have put_device() callen, we no longer need the
kfree as the new_ld->dev.release function will take care of kfreeing the
associated memory.
Signed-off-by: Levente Kurusa <levex@linux.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Now we have memblock_virt_alloc_low to replace original bootmem api in
swiotlb.
But we should not use BOOTMEM_LOW_LIMIT for arch that does not support
CONFIG_NOBOOTMEM, as old api take 0.
| #define alloc_bootmem_low(x) \
| __alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
|#define alloc_bootmem_low_pages_nopanic(x) \
| __alloc_bootmem_low_nopanic(x, PAGE_SIZE, 0)
and we have
#define BOOTMEM_LOW_LIMIT __pa(MAX_DMA_ADDRESS)
for CONFIG_NOBOOTMEM.
Restore goal to 0 to fix ia64 crash, that Tony found.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Tony Luck <tony.luck@gmail.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://gitorious.org/linux-can/linux-can
linux-can-fixes-for-3.14-20140129
Marc Kleine-Budde says:
====================
Arnd Bergmann provides a fix for the flexcan driver, enabling compilation on
all combinations of big and little endian on ARM and PowerPc. A patch by Ira W.
Snyder fixes uninitialized variable warnings in the janz-ican3 driver.
Rostislav Lisovy contributes a patch to propagate the SO_PRIORITY of raw
sockets to skbs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In https://bugzilla.redhat.com/show_bug.cgi?id=994438 and
https://bugzilla.redhat.com/show_bug.cgi?id=970480 we
received different reports of e100 throwing the following
warning:
[<c06a0ba5>] ? pci_disable_device+0x85/0x90
[<c044a153>] warn_slowpath_fmt+0x33/0x40
[<c06a0ba5>] pci_disable_device+0x85/0x90
[<f7fdf7e0>] __e100_shutdown+0x80/0x120 [e100]
[<c0476ca5>] ? check_preempt_curr+0x65/0x90
[<f7fdf8d6>] e100_suspend+0x16/0x30 [e100]
[<c06a1ebb>] pci_legacy_suspend+0x2b/0xb0
[<c098fc0f>] ? wait_for_completion+0x1f/0xd0
[<c06a2d50>] ? pci_pm_poweroff+0xb0/0xb0
[<c06a2de4>] pci_pm_freeze+0x94/0xa0
[<c0767bb7>] dpm_run_callback+0x37/0x80
[<c076a204>] ? pm_wakeup_pending+0xc4/0x140
[<c0767f12>] __device_suspend+0xb2/0x1f0
[<c076806f>] async_suspend+0x1f/0x90
[<c04706e5>] async_run_entry_fn+0x35/0x140
[<c0478aef>] ? wake_up_process+0x1f/0x40
[<c0464495>] process_one_work+0x115/0x370
[<c0462645>] ? start_worker+0x25/0x30
[<c0464dc5>] ? manage_workers.isra.27+0x1a5/0x250
[<c0464f6e>] worker_thread+0xfe/0x330
[<c0464e70>] ? manage_workers.isra.27+0x250/0x250
[<c046a224>] kthread+0x94/0xa0
[<c0997f37>] ret_from_kernel_thread+0x1b/0x28
[<c046a190>] ? insert_kthread_work+0x30/0x30
This patch removes pci_disable_device() from __e100_shutdown().
pci_clear_master() is enough.
Signed-off-by: Michele Baldessari <michele@acksyn.org>
Tested-by: Mark Harig <idirectscm@aim.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Further discussion here: http://marc.info/?l=linux-kernel&m=139073901101034&w=2
kbuild, 0day kernel build service, outputs the warning:
arch/x86/kernel/irq.c:333:1: warning: the frame size of 2056 bytes
is larger than 2048 bytes [-Wframe-larger-than=]
because check_irq_vectors_for_cpu_disable() allocates two cpumasks on the
stack. Fix this by moving the two cpumasks to a global file context.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1390915331-27375-1-git-send-email-prarit@redhat.com
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Janet Morgan <janet.morgan@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ruiv Wang <ruiv.wang@gmail.com>
Cc: Gong Chen <gong.chen@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Self generated skbuffs in net/can/bcm.c are setting a skb->sk reference but
no explicit destructor which is enforced since Linux 3.11 with commit
376c7311bdb6 (net: add a temporary sanity check in skb_orphan()).
This patch adds some helper functions to make sure that a destructor is
properly defined when a sock reference is assigned to a CAN related skb.
To create an unshared skb owned by the original sock a common helper function
has been introduced to replace open coded functions to create CAN echo skbs.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Andre Naujoks <nautsch2@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The loop in vxlan_gro_receive() over the current set of candidates for
coalescing was wrongly aborted once a match was found. In rare cases,
this can cause a false-positives matching in the next layer GRO checks.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since udp_add_offload() can be called from non-sleepable context e.g
under this call tree from the vxlan driver use case:
vxlan_socket_create() <-- holds the spinlock
-> vxlan_notify_add_rx_port()
-> udp_add_offload() <-- schedules
we should allocate the udp_offloads structure in atomic manner.
Fixes: b582ef0 ('net: Add GRO support for UDP encapsulating protocols')
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the GPIO bitbang MDIO driver is used instead of the Davinci MDIO driver
we need to configure the phy_id string differently. Otherwise this string
looks like this "gpio.6" instead of "gpio-0" and the PHY is not found when
phy_connect() is called.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Lukas Stockmann <lukas.stockmann@siemens.com>
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit e8b373326d8efcaf9ec1da8b618556c89bd5ffc4. Many xHCI
host controllers can only handle 32-bit addresses, and writing 64-bits
at a time causes them to fail. Reading 64-bits at a time may also cause
them to return 0xffffffff, so revert this commit as well.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
|
|
into for-linus
|
|
request_queue bypassing is used to suppress higher-level function of a
request_queue so that they can be switched, reconfigured and shut
down. A request_queue does the followings while bypassing.
* bypasses elevator and io_cq association and queues requests directly
to the FIFO dispatch queue.
* bypasses block cgroup request_list lookup and always uses the root
request_list.
Once confirmed to be bypassing, specific elevator and block cgroup
policy implementations can assume that nothing is in flight for them
and perform various operations which would be dangerous otherwise.
Such confirmation is acheived by short-circuiting all new requests
directly to the dispatch queue and waiting for all the requests which
were issued before to finish. Unfortunately, while the request
allocating and draining sides were properly handled, we forgot to
actually plug the request dispatch path. Even after bypassing mode is
confirmed, if the attached driver tries to fetch a request and the
dispatch queue is empty, __elv_next_request() would invoke the current
elevator's elevator_dispatch_fn() callback. As all in-flight requests
were drained, the elevator wouldn't contain any request but once
bypass is confirmed we don't even know whether the elevator is even
there. It might be in the process of being switched and half torn
down.
Frank Mayhar reports that this actually happened while switching
elevators, leading to an oops.
Let's fix it by making __elv_next_request() avoid invoking the
elevator_dispatch_fn() callback if the queue is bypassing. It already
avoids invoking the callback if the queue is dying. As a dying queue
is guaranteed to be bypassing, we can simply replace blk_queue_dying()
check with blk_queue_bypass().
Reported-by: Frank Mayhar <fmayhar@google.com>
References: http://lkml.kernel.org/g/1390319905.20232.38.camel@bobble.lax.corp.google.com
Cc: stable@vger.kernel.org
Tested-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Reserving a tag (request) for flush to avoid dead lock is a overkill. A
tag is valuable resource. We can track the number of flush requests and
disallow having too many pending flush requests allocated. With this
patch, blk_mq_alloc_request_pinned() could do a busy nop (but not a dead
loop) if too many pending requests are allocated and new flush request
is allocated. But this should not be a problem, too many pending flush
requests are very rare case.
I verified this can fix the deadlock caused by too many pending flush
requests.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
steal_tags only happens when free tags is more than half of the total
tags. This is too strict and can cause live lock. I found that if one
cpu has free tags, but other cpu can't steal (thread is bound to
specific cpus), threads which want to allocate tags are always
sleeping. I found this when I run next patch, but this could happen
without it I think.
I did performance test too with null_blk. Two cases (each cpu has enough
percpu tags, or total tags are limited), no performance changes were
observed.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block IO driver changes from Jens Axboe:
- bcache update from Kent Overstreet.
- two bcache fixes from Nicholas Swenson.
- cciss pci init error fix from Andrew.
- underflow fix in the parallel IDE pg_write code from Dan Carpenter.
I'm sure the 1 (or 0) users of that are now happy.
- two PCI related fixes for sx8 from Jingoo Han.
- floppy init fix for first block read from Jiri Kosina.
- pktcdvd error return miss fix from Julia Lawall.
- removal of IRQF_SHARED from the SEGA Dreamcast CD-ROM code from
Michael Opdenacker.
- comment typo fix for the loop driver from Olaf Hering.
- potential oops fix for null_blk from Raghavendra K T.
- two fixes from Sam Bradshaw (Micron) for the mtip32xx driver, fixing
an OOM problem and a problem with handling security locked conditions
* 'for-3.14/drivers' of git://git.kernel.dk/linux-block: (47 commits)
mg_disk: Spelling s/finised/finished/
null_blk: Null pointer deference problem in alloc_page_buffers
mtip32xx: Correctly handle security locked condition
mtip32xx: Make SGL container per-command to eliminate high order dma allocation
drivers/block/loop.c: fix comment typo in loop_config_discard
drivers/block/cciss.c:cciss_init_one(): use proper errnos
drivers/block/paride/pg.c: underflow bug in pg_write()
drivers/block/sx8.c: remove unnecessary pci_set_drvdata()
drivers/block/sx8.c: use module_pci_driver()
floppy: bail out in open() if drive is not responding to block0 read
bcache: Fix auxiliary search trees for key size > cacheline size
bcache: Don't return -EINTR when insert finished
bcache: Improve bucket_prio() calculation
bcache: Add bch_bkey_equal_header()
bcache: update bch_bkey_try_merge
bcache: Move insert_fixup() to btree_keys_ops
bcache: Convert sorting to btree_keys
bcache: Convert debug code to btree_keys
bcache: Convert btree_iter to struct btree_keys
bcache: Refactor bset_tree sysfs stats
...
|
|
Pull core block IO changes from Jens Axboe:
"The major piece in here is the immutable bio_ve series from Kent, the
rest is fairly minor. It was supposed to go in last round, but
various issues pushed it to this release instead. The pull request
contains:
- Various smaller blk-mq fixes from different folks. Nothing major
here, just minor fixes and cleanups.
- Fix for a memory leak in the error path in the block ioctl code
from Christian Engelmayer.
- Header export fix from CaiZhiyong.
- Finally the immutable biovec changes from Kent Overstreet. This
enables some nice future work on making arbitrarily sized bios
possible, and splitting more efficient. Related fixes to immutable
bio_vecs:
- dm-cache immutable fixup from Mike Snitzer.
- btrfs immutable fixup from Muthu Kumar.
- bio-integrity fix from Nic Bellinger, which is also going to stable"
* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
xtensa: fixup simdisk driver to work with immutable bio_vecs
block/blk-mq-cpu.c: use hotcpu_notifier()
blk-mq: for_each_* macro correctness
block: Fix memory leak in rw_copy_check_uvector() handling
bio-integrity: Fix bio_integrity_verify segment start bug
block: remove unrelated header files and export symbol
blk-mq: uses page->list incorrectly
blk-mq: use __smp_call_function_single directly
btrfs: fix missing increment of bi_remaining
Revert "block: Warn and free bio if bi_end_io is not set"
block: Warn and free bio if bi_end_io is not set
blk-mq: fix initializing request's start time
block: blk-mq: don't export blk_mq_free_queue()
block: blk-mq: make blk_sync_queue support mq
block: blk-mq: support draining mq queue
dm cache: increment bi_remaining when bi_end_io is restored
block: fixup for generic bio chaining
block: Really silence spurious compiler warnings
block: Silence spurious compiler warnings
block: Kill bio_pair_split()
...
|
|
Pull nfsd updates from Bruce Fields:
- Handle some loose ends from the vfs read delegation support.
(For example nfsd can stop breaking leases on its own in a
fewer places where it can now depend on the vfs to.)
- Make life a little easier for NFSv4-only configurations
(thanks to Kinglong Mee).
- Fix some gss-proxy problems (thanks Jeff Layton).
- miscellaneous bug fixes and cleanup
* 'for-3.14' of git://linux-nfs.org/~bfields/linux: (38 commits)
nfsd: consider CLAIM_FH when handing out delegation
nfsd4: fix delegation-unlink/rename race
nfsd4: delay setting current_fh in open
nfsd4: minor nfs4_setlease cleanup
gss_krb5: use lcm from kernel lib
nfsd4: decrease nfsd4_encode_fattr stack usage
nfsd: fix encode_entryplus_baggage stack usage
nfsd4: simplify xdr encoding of nfsv4 names
nfsd4: encode_rdattr_error cleanup
nfsd4: nfsd4_encode_fattr cleanup
minor svcauth_gss.c cleanup
nfsd4: better VERIFY comment
nfsd4: break only delegations when appropriate
NFSD: Fix a memory leak in nfsd4_create_session
sunrpc: get rid of use_gssp_lock
sunrpc: fix potential race between setting use_gss_proxy and the upcall rpc_clnt
sunrpc: don't wait for write before allowing reads from use-gss-proxy file
nfsd: get rid of unused function definition
Define op_iattr for nfsd4_open instead using macro
NFSD: fix compile warning without CONFIG_NFSD_V3
...
|
|
Fix
drivers/char/ipmi/ipmi_si_intf.c: In function 'ipmi_parisc_probe':
drivers/char/ipmi/ipmi_si_intf.c:2752:2: error: 'rv' undeclared (first use in this function)
drivers/char/ipmi/ipmi_si_intf.c:2752:2: note: each undeclared identifier is reported only once for each function it appears in
Introduced by commit d02b3709ff8e ("ipmi: Cleanup error return")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Chris Mason reported a NULL pointer derefernence in generic_getxattr()
that was due to sb->s_xattr being NULL.
The reason is that the nfs #ifdef's for ACL support were misplaced, and
the nfs3 inode operations had the xattr operation pointers set up, even
though xattrs were not actually supported. As a result, the xattr code
was being called without the infrastructure having been set up.
Move the #ifdef's appropriately.
Reported-and-tested-by: Chris Mason <clm@fb.com>
Acked-by: Al Viro viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We had a bug that prevented us from removing a station
after we entered the drain flow:
We assign sta to be NULL if it was an error value.
Then we tested it against -EBUSY, but forget to retrieve
the value again from mvm->fw_id_to_mac_id[sta_id].
Due to this bug, we ended up never removing the STA from
the firmware. This led to an firmware assert when we remove
the GO vif.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
Configure scheduled scan to notify match found on every beacon
or probe response if the scan request doesn't contain valid ssid
list for filtering.
Without this configuration the FW passes all beacons to the host
but doesn't notify the stack that the scan results are ready for
processing.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Reviewed-by: Alexander Bondar <alexander.bondar@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
Add 6 new HW IDs for the 7265 series.
Cc: <stable@vger.kernel.org> [3.13]
Signed-off-by: Oren Givon <oren.givon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
This can be useful to be able to spot the firmware version
from the error reports without needing to fetch it from
another place.
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
The iwlwifi scheduled scan implementation doesn't adhere to the
userspace API correctly - the API assumes that any new incoming
'incompatible' request (like scan or remain-on-channel for this
driver) will just cancel the scheduled scan. Instead our driver
relies on userspace cancelling it, thus breaking existing wpa_s
versions.
Cc: stable@vger.kernel.org [3.13]
Fixes: 35a000b7c1bb ("iwlwifi: mvm: support sched scan if supported by the fw")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
The address pointer used in the function shouldn't be static
since it's local data only. Having it static causes races if
a single machine has two devices, as the pointer would be
shared between instances.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
|
|
Atm we setup the HW panel power sequencer logic both for eDP and DP
ports. On eDP we then go on and start the power on sequence and commence
with link training when it's ready. On DP we don't do the power on
sequencing but do the link training immediately. At this point the DP
PHY block gets stuck, since - supposedly - it is waiting for the power
on sequence to finish. The actual register write that seems to hold off
the PHY is PIPEX_PP_ON_DELAYS[Panel Control Port Select]. Writing here
a non-0 value eventually sets PIPEX_PP_STATUS[Require Asset Status] to
1 and blocks the PHY until the panel power on is ready.
Fix this by not doing any PP sequencing setup for DP ports.
Thanks to Ville Syrjälä, Jesse Barnes and Todd Previte for the help in
tracking this down.
Note that on older gmch platforms (where we have lvds instead of edp)
we've hacked around this by writing the magic ABCD unlock key to PP
registers, which disables the hw sanity checks.
For edp all platforms thus far had the pch split, with the edp port in
the north display complex and the PP registers on the pch the hw
sanity checks (expressed through the "Require Asset Status" bit) was
never functional, hence never a real issue.
This regression has been introduce in
commit bf13e81b904a37d94d83dd6c3b53a147719a3ead
Author: Jani Nikula <jani.nikula@intel.com>
Date: Fri Sep 6 07:40:05 2013 +0300
drm/i915: add support for per-pipe power sequencing on vlv
Signed-off-by: Imre Deak <imre.deak@intel.com>
[danvet: Add note about the bigger story here.]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Signed-off-by: Peter Rosin <peda@lysator.liu.se>
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
Both clang 3.5 and GCC 4.9 will support this (as of r199754 and r207196
respectively). Both have been tested to produce booting kernels when the
16-bit code is built with -m16. (Modulo LLVM PR3997, at least.)
[ hpa: folded test for -m16 into M16_CFLAGS ]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1390997807.20153.133.camel@i7.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Commit dd78b97367bd575918204cc89107c1479d3fc1a7 ("x86, boot: Move CPU
flags out of cpucheck") introduced ambiguous inline asm in the
has_eflag() function. In 16-bit mode want the instruction to be
'pushfl', but we just say 'pushf' and hope the compiler does what we
wanted.
When building with 'clang -m16', it won't, because clang doesn't use
the horrid '.code16gcc' hack that even 'gcc -m16' uses internally.
Say what we mean and don't make the compiler make assumptions.
[ hpa: ideally we would be able to use the gcc %zN construct here, but
that is broken for 64-bit integers in gcc < 4.5.
The code with plain "pushf/popf" is fine for 32- or 64-bit mode, but
not for 16-bit mode; in 16-bit mode those are 16-bit instructions in
.code16 mode, and 32-bit instructions in .code16gcc mode. ]
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1391079628.26079.82.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|