Age | Commit message (Collapse) | Author |
|
Provide a function that does not just delete an entry at a given index,
but also allows passing in an expected item. Delete only if that item
is still located at the specified index.
This is handy when lockless tree traversals want to delete entries as
well because they don't have to do an second, locked lookup to verify
the slot has not changed under them before deleting the entry.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Summary:
The VM maintains cached filesystem pages on two types of lists. One
list holds the pages recently faulted into the cache, the other list
holds pages that have been referenced repeatedly on that first list.
The idea is to prefer reclaiming young pages over those that have shown
to benefit from caching in the past. We call the recently used list
"inactive list" and the frequently used list "active list".
Currently, the VM aims for a 1:1 ratio between the lists, which is the
"perfect" trade-off between the ability to *protect* frequently used
pages and the ability to *detect* frequently used pages. This means
that working set changes bigger than half of cache memory go undetected
and thrash indefinitely, whereas working sets bigger than half of cache
memory are unprotected against used-once streams that don't even need
caching.
This happens on file servers and media streaming servers, where the
popular files and file sections change over time. Even though the
individual files might be smaller than half of memory, concurrent access
to many of them may still result in their inter-reference distance being
greater than half of memory. It's also been reported as a problem on
database workloads that switch back and forth between tables that are
bigger than half of memory. In these cases the VM never recognizes the
new working set and will for the remainder of the workload thrash disk
data which could easily live in memory.
Historically, every reclaim scan of the inactive list also took a
smaller number of pages from the tail of the active list and moved them
to the head of the inactive list. This model gave established working
sets more gracetime in the face of temporary use-once streams, but
ultimately was not significantly better than a FIFO policy and still
thrashed cache based on eviction speed, rather than actual demand for
cache.
This series solves the problem by maintaining a history of pages evicted
from the inactive list, enabling the VM to detect frequently used pages
regardless of inactive list size and facilitate working set transitions.
Tests:
The reported database workload is easily demonstrated on a 8G machine
with two filesets a 6G. This fio workload operates on one set first,
then switches to the other. The VM should obviously always cache the
set that the workload is currently using.
This test is based on a problem encountered by Citus Data customers:
http://citusdata.com/blog/72-linux-memory-manager-and-your-big-data
unpatched:
db1: READ: io=98304MB, aggrb=885559KB/s, minb=885559KB/s, maxb=885559KB/s, mint= 113672msec, maxt= 113672msec
db2: READ: io=98304MB, aggrb= 66169KB/s, minb= 66169KB/s, maxb= 66169KB/s, mint=1521302msec, maxt=1521302msec
sdb: ios=835750/4, merge=2/1, ticks=4659739/60016, in_queue=4719203, util=98.92%
real 27m15.541s
user 0m19.059s
sys 0m51.459s
patched:
db1: READ: io=98304MB, aggrb=877783KB/s, minb=877783KB/s, maxb=877783KB/s, mint=114679msec, maxt=114679msec
db2: READ: io=98304MB, aggrb=397449KB/s, minb=397449KB/s, maxb=397449KB/s, mint=253273msec, maxt=253273msec
sdb: ios=170587/4, merge=2/1, ticks=954910/61123, in_queue=1015923, util=90.40%
real 6m8.630s
user 0m14.714s
sys 0m31.233s
As can be seen, the unpatched kernel simply never adapts to the
workingset change and db2 is stuck indefinitely with secondary storage
speed. The patched kernel needs 2-3 iterations over db2 before it
replaces db1 and reaches full memory speed. Given the unbounded
negative affect of the existing VM behavior, these patches should be
considered correctness fixes rather than performance optimizations.
Another test resembles a fileserver or streaming server workload, where
data in excess of memory size is accessed at different frequencies.
There is very hot data accessed at a high frequency. Machines should be
fitted so that the hot set of such a workload can be fully cached or all
bets are off. Then there is a very big (compared to available memory)
set of data that is used-once or at a very low frequency; this is what
drives the inactive list and does not really benefit from caching.
Lastly, there is a big set of warm data in between that is accessed at
medium frequencies and benefits from caching the pages between the first
and last streamer of each burst.
unpatched:
hot: READ: io=128000MB, aggrb=160693KB/s, minb=160693KB/s, maxb=160693KB/s, mint=815665msec, maxt=815665msec
warm: READ: io= 81920MB, aggrb=109853KB/s, minb= 27463KB/s, maxb= 29244KB/s, mint=717110msec, maxt=763617msec
cold: READ: io= 30720MB, aggrb= 35245KB/s, minb= 35245KB/s, maxb= 35245KB/s, mint=892530msec, maxt=892530msec
sdb: ios=797960/4, merge=11763/1, ticks=4307910/796, in_queue=4308380, util=100.00%
patched:
hot: READ: io=128000MB, aggrb=160678KB/s, minb=160678KB/s, maxb=160678KB/s, mint=815740msec, maxt=815740msec
warm: READ: io= 81920MB, aggrb=147747KB/s, minb= 36936KB/s, maxb= 40960KB/s, mint=512000msec, maxt=567767msec
cold: READ: io= 30720MB, aggrb= 40960KB/s, minb= 40960KB/s, maxb= 40960KB/s, mint=768000msec, maxt=768000msec
sdb: ios=596514/4, merge=9341/1, ticks=2395362/997, in_queue=2396484, util=79.18%
In both kernels, the hot set is propagated to the active list and then
served from cache.
In both kernels, the beginning of the warm set is propagated to the
active list as well, but in the unpatched case the active list
eventually takes up half of memory and no new pages from the warm set
get activated, despite repeated access, and despite most of the active
list soon being stale. The patched kernel on the other hand detects the
thrashing and manages to keep this cache window rolling through the data
set. This frees up enough IO bandwidth that the cold set is served at
full speed as well and disk utilization even drops by 20%.
For reference, this same test was performed with the traditional
demotion mechanism, where deactivation is coupled to inactive list
reclaim. However, this had the same outcome as the unpatched kernel:
while the warm set does indeed get activated continuously, it is forced
out of the active list by inactive list pressure, which is dictated
primarily by the unrelated cold set. The warm set is evicted before
subsequent streamers can benefit from it, even though there would be
enough space available to cache the pages of interest.
Costs:
Page reclaim used to shrink the radix trees but now the tree nodes are
reused for shadow entries, where the cost depends heavily on the page
cache access patterns. However, with workloads that maintain spatial or
temporal locality, the shadow entries are either refaulted quickly or
reclaimed along with the inode object itself. Workloads that will
experience a memory cost increase are those that don't really benefit
from caching in the first place.
A more predictable alternative would be a fixed-cost separate pool of
shadow entries, but this would incur relatively higher memory cost for
well-behaved workloads at the benefit of cornercases. It would also
make the shadow entry lookup more costly compared to storing them
directly in the cache structure.
Future:
To simplify the merging process, this patch set is implementing thrash
detection on a global per-zone level only for now, but the design is
such that it can be extended to memory cgroups as well. All we need to
do is store the unique cgroup ID along the node and zone identifier
inside the eviction cookie to identify the lruvec.
Right now we have a fixed ratio (50:50) between inactive and active list
but we already have complaints about working sets exceeding half of
memory being pushed out of the cache by simple streaming in the
background. Ultimately, we want to adjust this ratio and allow for a
much smaller inactive list. These patches are an essential step in this
direction because they decouple the VMs ability to detect working set
changes from the inactive list size. This would allow us to base the
inactive list size on the combined readahead window size for example and
potentially protect a much bigger working set.
It's also a big step towards activating pages with a reuse distance
larger than memory, as long as they are the most frequently used pages
in the workload. This will require knowing more about the access
frequency of active pages than what we measure right now, so it's also
deferred in this series.
Another possibility of having thrashing information would be to revisit
the idea of local reclaim in the form of zero-config memory control
groups. Instead of having allocating tasks go straight to global
reclaim, they could try to reclaim the pages in the memcg they are part
of first as long as the group is not thrashing. This would allow a user
to drop e.g. a back-up job in an otherwise unconfigured memcg and it
would only inflate (and possibly do global reclaim) until it has enough
memory to do proper readahead. But once it reaches that point and stops
thrashing it would just recycle its own used-once pages without kicking
out the cache of any other tasks in the system more than necessary.
This patch (of 10):
Fengguang Wu's build testing spotted problems with inc_zone_state() and
dec_zone_state() on UP configurations in out-of-tree patches.
inc_zone_state() is declared but not defined, dec_zone_state() is
missing entirely.
Just like with *_zone_page_state(), they can be defined like their
preemption-unsafe counterparts on UP.
[akpm@linux-foundation.org: make it build]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is a race condition if we map a same file on different processes.
Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex.
When we do mmap, we don't grab a hugetlb_instantiation_mutex, but only
mmap_sem (exclusively). This doesn't prevent other tasks from modifying
the region structure, so it can be modified by two processes
concurrently.
To solve this, introduce a spinlock to resv_map and make region
manipulation function grab it before they do actual work.
[davidlohr@hp.com: updated changelog]
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, to track reserved and allocated regions, we use two different
ways, depending on the mapping. For MAP_SHARED, we use
address_mapping's private_list and, while for MAP_PRIVATE, we use a
resv_map.
Now, we are preparing to change a coarse grained lock which protect a
region structure to fine grained lock, and this difference hinder it.
So, before changing it, unify region structure handling, consistently
using a resv_map regardless of the kind of mapping.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since put_mems_allowed() is strictly optional, its a seqcount retry, we
don't need to evaluate the function if the allocation was in fact
successful, saving a smp_rmb some loads and comparisons on some relative
fast-paths.
Since the naming, get/put_mems_allowed() does suggest a mandatory
pairing, rename the interface, as suggested by Mel, to resemble the
seqcount interface.
This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
where it is important to note that the return value of the latter call
is inverted from its previous incarnation.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Provide dqgrab() function to get quota structure reference when we are
sure it already has at least one active reference. Make use of this
function inside quota code.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
access_mutex is used only to guard operations on access_list. There's
no need for sleeping within this lock so just make a spinlock out of it.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove kmemleak_padding() and kmemleak_release().
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After commit 839a8e8660b6 ("writeback: replace custom worker pool
implementation with unbound workqueue") when device is removed while we
are writing to it we crash in bdi_writeback_workfn() ->
set_worker_desc() because bdi->dev is NULL.
This can happen because even though bdi_unregister() cancels all pending
flushing work, nothing really prevents new ones from being queued from
balance_dirty_pages() or other places.
Fix the problem by clearing BDI_registered bit in bdi_unregister() and
checking it before scheduling of any flushing work.
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Derek Basehore <dbasehore@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
drm-next
* 'msm-next' of git://people.freedesktop.org/~robclark/linux:
drm/omap: Don't dereference list head when the connectors list is empty
drm/msm/mdp: add timeout for irq wait
drm/msm: validate flags, etc
drm/msm: use componentised device support
drm/msm: add chip-id param
drm/msm: crank down gpu when inactive
drm/msm: spin helper
drm/msm: add hang_debug module param
drm/msm: hdmi audio support
|
|
"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.
nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8
I have seen "len" up to 280 and my host has crashes w/o this patch.
The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.
Fixes: 5b423f6a40a0 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull VFIO updates from Alex Williamson:
"VFIO updates for v3.15 include:
- Allow the vfio-type1 IOMMU to support multiple domains within a
container
- Plumb path to query whether all domains are cache-coherent
- Wire query into kvm-vfio device to avoid KVM x86 WBINVD emulation
- Always select CONFIG_ANON_INODES, vfio depends on it (Arnd)
The first patch also makes the vfio-type1 IOMMU driver completely
independent of the bus_type of the devices it's handling, which
enables it to be used for both vfio-pci and a future vfio-platform
(and hopefully combinations involving both simultaneously)"
* tag 'vfio-v3.15-rc1' of git://github.com/awilliam/linux-vfio:
vfio: always select ANON_INODES
kvm/vfio: Support for DMA coherent IOMMUs
vfio: Add external user check extension interface
vfio/type1: Add extension to test DMA cache coherence of IOMMU
vfio/iommu_type1: Multi-IOMMU domain support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen features and fixes from David Vrabel:
"Support PCI devices with multiple MSIs, performance improvement for
kernel-based backends (by not populated m2p overrides when mapping),
and assorted minor bug fixes and cleanups"
* tag 'stable/for-linus-3.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/acpi-processor: fix enabling interrupts on syscore_resume
xen/grant-table: Refactor gnttab_[un]map_refs to avoid m2p_override
xen: remove XEN_PRIVILEGED_GUEST
xen: add support for MSI message groups
xen-pciback: Use pci_enable_msix_exact() instead of pci_enable_msix()
xen/xenbus: remove unused xenbus_bind_evtchn()
xen/events: remove unnecessary call to bind_evtchn_to_cpu()
xen/events: remove the unused resend_irq_on_evtchn()
drivers:xen-selfballoon:reset 'frontswap_inertia_counter' after frontswap_shrink
drivers: xen: Include appropriate header file in pcpu.c
drivers: xen: Mark function as static in platform-pci.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"A lot updates for cgroup:
- The biggest one is cgroup's conversion to kernfs. cgroup took
after the long abandoned vfs-entangled sysfs implementation and
made it even more convoluted over time. cgroup's internal objects
were fused with vfs objects which also brought in vfs locking and
object lifetime rules. Naturally, there are places where vfs rules
don't fit and nasty hacks, such as credential switching or lock
dance interleaving inode mutex and cgroup_mutex with object serial
number comparison thrown in to decide whether the operation is
actually necessary, needed to be employed.
After conversion to kernfs, internal object lifetime and locking
rules are mostly isolated from vfs interactions allowing shedding
of several nasty hacks and overall simplification. This will also
allow implmentation of operations which may affect multiple cgroups
which weren't possible before as it would have required nesting
i_mutexes.
- Various simplifications including dropping of module support,
easier cgroup name/path handling, simplified cgroup file type
handling and task_cg_lists optimization.
- Prepatory changes for the planned unified hierarchy, which is still
a patchset away from being actually operational. The dummy
hierarchy is updated to serve as the default unified hierarchy.
Controllers which aren't claimed by other hierarchies are
associated with it, which BTW was what the dummy hierarchy was for
anyway.
- Various fixes from Li and others. This pull request includes some
patches to add missing slab.h to various subsystems. This was
triggered xattr.h include removal from cgroup.h. cgroup.h
indirectly got included a lot of files which brought in xattr.h
which brought in slab.h.
There are several merge commits - one to pull in kernfs updates
necessary for converting cgroup (already in upstream through
driver-core), others for interfering changes in the fixes branch"
* 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
cgroup: remove useless argument from cgroup_exit()
cgroup: fix spurious lockdep warning in cgroup_exit()
cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
cgroup: break kernfs active_ref protection in cgroup directory operations
cgroup: fix cgroup_taskset walking order
cgroup: implement CFTYPE_ONLY_ON_DFL
cgroup: make cgrp_dfl_root mountable
cgroup: drop const from @buffer of cftype->write_string()
cgroup: rename cgroup_dummy_root and related names
cgroup: move ->subsys_mask from cgroupfs_root to cgroup
cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
cgroup: reorganize cgroup bootstrapping
cgroup: relocate setting of CGRP_DEAD
cpuset: use rcu_read_lock() to protect task_cs()
cgroup_freezer: document freezer_fork() subtleties
cgroup: update cgroup_transfer_tasks() to either succeed or fail
cgroup: drop task_lock() protection around task->cgroups
cgroup: update how a newly forked task gets associated with css_set
...
|
|
Currently there is no way how to find out if a device supports busy
polling. So add a feature and make it dependent on ndo_busy_poll
existence.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, in packet_direct_xmit() we test the assigned netdevice queue
for netif_xmit_frozen_or_stopped() before doing an ndo_start_xmit().
This can have the side-effect that BQL enabled drivers which make use
of netdev_tx_sent_queue() internally, set __QUEUE_STATE_STACK_XOFF from
within the stack and would not fully fill the device's TX ring from
packet sockets with PACKET_QDISC_BYPASS enabled.
Instead, use a test without BQL bit so that bursts can be absorbed
into the NICs TX ring. Fix and code suggested by Eric Dumazet, thanks!
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Most of the changes were largely clean ups, and some documentation.
But there were a few features that were added:
Uprobes now work with event triggers and multi buffers and have
support under ftrace and perf.
The big feature is that the function tracer can now be used within the
multi buffer instances. That is, you can now trace some functions in
one buffer, others in another buffer, all functions in a third buffer
and so on. They are basically agnostic from each other. This only
works for the function tracer and not for the function graph trace,
although you can have the function graph tracer running in the top
level buffer (or any tracer for that matter) and have different
function tracing going on in the sub buffers"
* tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
tracing: Add BUG_ON when stack end location is over written
tracepoint: Remove unused API functions
Revert "tracing: Move event storage for array from macro to standalone function"
ftrace: Constify ftrace_text_reserved
tracepoints: API doc update to tracepoint_probe_register() return value
tracepoints: API doc update to data argument
ftrace: Fix compilation warning about control_ops_free
ftrace/x86: BUG when ftrace recovery fails
ftrace: Warn on error when modifying ftrace function
ftrace: Remove freelist from struct dyn_ftrace
ftrace: Do not pass data to ftrace_dyn_arch_init
ftrace: Pass retval through return in ftrace_dyn_arch_init()
ftrace: Inline the code from ftrace_dyn_table_alloc()
ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt
tracing: Evaluate len expression only once in __dynamic_array macro
tracing: Correctly expand len expressions from __dynamic_array macro
tracing/module: Replace include of tracepoint.h with jump_label.h in module.h
tracing: Fix event header migrate.h to include tracepoint.h
tracing: Fix event header writeback.h to include tracepoint.h
tracing: Warn if a tracepoint is not set via debugfs
...
|
|
Pull crypto updates from Herbert Xu:
"Here is the crypto update for 3.15:
- Added 3DES driver for OMAP4/AM43xx
- Added AVX2 acceleration for SHA
- Added hash-only AEAD algorithms in caam
- Removed tegra driver as it is not functioning and the hardware is
too slow
- Allow blkcipher walks over AEAD (needed for ARM)
- Fixed unprotected FPU/SSE access in ghash-clmulni-intel
- Fixed highmem crash in omap-sham
- Add (zero entropy) randomness when initialising hardware RNGs
- Fixed unaligned ahash comletion functions
- Added soft module depedency for crc32c for initrds that use crc32c"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (60 commits)
crypto: ghash-clmulni-intel - use C implementation for setkey()
crypto: x86/sha1 - reduce size of the AVX2 asm implementation
crypto: x86/sha1 - fix stack alignment of AVX2 variant
crypto: x86/sha1 - re-enable the AVX variant
crypto: sha - SHA1 transform x86_64 AVX2
crypto: crypto_wq - Fix late crypto work queue initialization
crypto: caam - add missing key_dma unmap
crypto: caam - add support for aead null encryption
crypto: testmgr - add aead null encryption test vectors
crypto: export NULL algorithms defines
crypto: caam - remove error propagation handling
crypto: hash - Simplify the ahash_finup implementation
crypto: hash - Pull out the functions to save/restore request
crypto: hash - Fix the pointer voodoo in unaligned ahash
crypto: caam - Fix first parameter to caam_init_rng
crypto: omap-sham - Map SG pages if they are HIGHMEM before accessing
crypto: caam - Dynamic memory allocation for caam_rng_ctx object
crypto: allow blkcipher walks over AEAD data
crypto: remove direct blkcipher_walk dependency on transform
hwrng: add randomness to system from rng sources
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
"Apart from reordering the SELinux mmap code to ensure DAC is called
before MAC, these are minor maintenance updates"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
selinux: correctly label /proc inodes in use before the policy is loaded
selinux: put the mmap() DAC controls before the MAC controls
selinux: fix the output of ./scripts/get_maintainer.pl for SELinux
evm: enable key retention service automatically
ima: skip memory allocation for empty files
evm: EVM does not use MD5
ima: return d_name.name if d_path fails
integrity: fix checkpatch errors
ima: fix erroneous removal of security.ima xattr
security: integrity: Use a more current logging style
MAINTAINERS: email updates and other misc. changes
ima: reduce memory usage when a template containing the n field is used
ima: restore the original behavior for sending data with ima template
Integrity: Pass commname via get_task_comm()
fs: move i_readcount
ima: use static const char array definitions
security: have cap_dentry_init_security return error
ima: new helper: file_inode(file)
kernel: Mark function as static in kernel/seccomp.c
capability: Use current logging styles
...
|
|
'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next
|
|
Separate HW initialization from device creation.
This is needed for suspend/resume support.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Pull networking updates from David Miller:
"Here is my initial pull request for the networking subsystem during
this merge window:
1) Support for ESN in AH (RFC 4302) from Fan Du.
2) Add full kernel doc for ethtool command structures, from Ben
Hutchings.
3) Add BCM7xxx PHY driver, from Florian Fainelli.
4) Export computed TCP rate information in netlink socket dumps, from
Eric Dumazet.
5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
Dichtel.
6) Convert many drivers to pci_enable_msix_range(), from Alexander
Gordeev.
7) Record SKB timestamps more efficiently, from Eric Dumazet.
8) Switch to microsecond resolution for TCP round trip times, also
from Eric Dumazet.
9) Clean up and fix 6lowpan fragmentation handling by making use of
the existing inet_frag api for it's implementation.
10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.
11) Auto size SKB lengths when composing netlink messages based upon
past message sizes used, from Eric Dumazet.
12) qdisc dumps can take a long time, add a cond_resched(), From Eric
Dumazet.
13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
Get rid of never-used-in-tree netpoll RX handling. From Eric W
Biederman.
14) Support inter-address-family and namespace changing in VTI tunnel
driver(s). From Steffen Klassert.
15) Add Altera TSE driver, from Vince Bridgers.
16) Optimizing csum_replace2() so that it doesn't adjust the checksum
by checksumming the entire header, from Eric Dumazet.
17) Expand BPF internal implementation for faster interpreting, more
direct translations into JIT'd code, and much cleaner uses of BPF
filtering in non-socket ocntexts. From Daniel Borkmann and Alexei
Starovoitov"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
net: Add a test to see if a skb is freeable in irq context
qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
net: ptp: move PTP classifier in its own file
net: sxgbe: make "core_ops" static
net: sxgbe: fix logical vs bitwise operation
net: sxgbe: sxgbe_mdio_register() frees the bus
Call efx_set_channels() before efx->type->dimension_resources()
xen-netback: disable rogue vif in kthread context
net/mlx4: Set proper build dependancy with vxlan
be2net: fix build dependency on VxLAN
mac802154: make csma/cca parameters per-wpan
mac802154: allow only one WPAN to be up at any given time
net: filter: minor: fix kdoc in __sk_run_filter
netlink: don't compare the nul-termination in nla_strcmp
can: c_can: Avoid led toggling for every packet.
can: c_can: Simplify TX interrupt cleanup
can: c_can: Store dlc private
can: c_can: Reduce register access
can: c_can: Make the code readable
...
|
|
Use the newly introduced LOOKUPNAME MDS request to connect child
inode to its parent directory.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Our longest osd request now contains 3 ops: copyup+hint+write.
Also, CEPH_OSD_MAX_OP value in a BUG_ON in rbd_osd_req_callback() was
hard-coded to 2. Fix it, and switch to rbd_assert while at it.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
This is primarily for rbd's benefit and is supposed to combat
fragmentation:
"... knowing that rbd images have a 4m size, librbd can pass a hint
that will let the osd do the xfs allocation size ioctl on new files so
that they are allocated in 1m or 4m chunks. We've seen cases where
users with rbd workloads have very high levels of fragmentation in xfs
and this would mitigate that and probably have a pretty nice
performance benefit."
SETALLOCHINT is considered advisory, so our backwards compatibility
mechanism here is to set FAILOK flag for all SETALLOCHINT ops.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
Encode ceph_osd_op::flags field so that it gets sent over the wire.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
|
|
With the addition of erasure coding support in the future, scratch
variable-length array in crush_do_rule_ary() is going to grow to at
least 200 bytes on average, on top of another 128 bytes consumed by
rawosd/osd arrays in the call chain. Replace it with a buffer inside
struct osdmap and a mutex. This shouldn't result in any contention,
because all osd requests were already serialized by request_mutex at
that point; the only unlocked caller was ceph_ioctl_get_dataloc().
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
- substantial cleanup of the generic and transport layers, in the
direction of an ultimate goal of making struct hid_device completely
transport independent, by Benjamin Tissoires
- cp2112 driver from David Barksdale
- a lot of fixes and new hardware support (Dualshock 4) to hid-sony
driver, by Frank Praznik
- support for Win 8.1 multitouch protocol by Andrew Duggan
- other smaller fixes / device ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (75 commits)
HID: sony: fix force feedback mismerge
HID: sony: Set the quriks flag for Bluetooth controllers
HID: sony: Fix Sixaxis cable state detection
HID: uhid: Add UHID_CREATE2 + UHID_INPUT2
HID: hyperv: fix _raw_request() prototype
HID: hyperv: Implement a stub raw_request() entry point
HID: hid-sensor-hub: fix sleeping function called from invalid context
HID: multitouch: add support for Win 8.1 multitouch touchpads
HID: remove hid_output_raw_report transport implementations
HID: sony: do not rely on hid_output_raw_report
HID: cp2112: remove the last hid_output_raw_report() call
HID: cp2112: remove various hid_out_raw_report calls
HID: multitouch: add support of other generic collections in hid-mt
HID: multitouch: remove pen special handling
HID: multitouch: remove registered devices with default behavior
HID: hidp: Add a comment that some devices depend on the current behavior of uniq
HID: sony: Prevent duplicate controller connections.
HID: sony: Perform a boundry check on the sixaxis battery level index.
HID: sony: Fix work queue issues
HID: sony: Fix multi-line comment styling
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
"Usual rocket science -- mostly documentation and comment updates"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
sparse: fix comment
doc: fix double words
isdn: capi: fix "CAPI_VERSION" comment
doc: DocBook: Fix typos in xml and template file
Bluetooth: add module name for btwilink
driver core: unexport static function create_syslog_header
mmc: core: typo fix in printk specifier
ARM: spear: clean up editing mistake
net-sysfs: fix comment typo 'CONFIG_SYFS'
doc: Insert MODULE_ in module-signing macros
Documentation: update URL to hfsplus Technote 1150
gpio: update path to documentation
ixgbe: Fix format string in ixgbe_fcoe.
Kconfig: Remove useless "default N" lines
user_namespace.c: Remove duplicated word in comment
CREDITS: fix formatting
treewide: Fix typo in Documentation/DocBook
mm: Fix warning on make htmldocs caused by slab.c
ata: ata-samsung_cf: cleanup in header file
idr: remove unused prototype of idr_free()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched/idle changes from Ingo Molnar:
"More idle code reorganization, to prepare for more integration.
(Sent separately because it depended on pending timer work, which is
now upstream)"
* 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/idle: Add more comments to the code
sched/idle: Move idle conditions in cpuidle_idle main function
sched/idle: Reorganize the idle loop
cpuidle/idle: Move the cpuidle_idle_call function to idle.c
idle/cpuidle: Split cpuidle_idle_call main function into smaller functions
|
|
git://anongit.freedesktop.org/drm-intel into drm-next
- Inherit/reuse firmwar framebuffers (for real this time) from Jesse, less
flicker for fastbooting.
- More flexible cloning for hdmi (Ville).
- Some PPGTT fixes from Ben.
- Ring init fixes from Naresh Kumar.
- set_cache_level regression fixes for the vma conversion from Ville&Chris.
- Conversion to the new dp aux helpers (Jani).
- Unification of runtime pm with pc8 support from Paulo, prep work for runtime
pm on other platforms than HSW.
- Larger cursor sizes (Sagar Kamble).
- Piles of improvements and fixes all over, as usual.
* tag 'drm-intel-next-2014-03-21' of git://anongit.freedesktop.org/drm-intel: (75 commits)
drm/i915: Include a note about the dangers of I915_READ64/I915_WRITE64
drm/i915/sdvo: fix questionable return value check
drm/i915: Fix unsafe loop iteration over vma whilst unbinding them
drm/i915: Enabling 128x128 and 256x256 ARGB Cursor Support
drm/i915: Print how many objects are shared in per-process stats
drm/i915: Per-process stats work better when evaluated per-process
drm/i915: remove rps local variables
drm/i915: Remove extraneous MMIO for RPS
drm/i915: Rename and comment all the RPS *stuff*
drm/i915: Store the HW min frequency as min_freq
drm/i915: Fix coding style for RPS
drm/i915: Reorganize the overclock code
drm/i915: init pm.suspended earlier
drm/i915: update the PC8 and runtime PM documentation
drm/i915: rename __hsw_do_{en, dis}able_pc8
drm/i915: kill struct i915_package_c8
drm/i915: move pc8.irqs_disabled to pm.irqs_disabled
drm/i915: remove dev_priv->pc8.enabled
drm/i915: don't get/put PC8 when getting/putting power wells
drm/i915: make intel_aux_display_runtime_get get runtime PM, not PC8
...
Conflicts:
drivers/gpu/drm/i915/intel_display.c
drivers/gpu/drm/i915/intel_dp.c
|
|
Pull kvm updates from Paolo Bonzini:
"PPC and ARM do not have much going on this time. Most of the cool
stuff, instead, is in s390 and (after a few releases) x86.
ARM has some caching fixes and PPC has transactional memory support in
guests. MIPS has some fixes, with more probably coming in 3.16 as
QEMU will soon get support for MIPS KVM.
For x86 there are optimizations for debug registers, which trigger on
some Windows games, and other important fixes for Windows guests. We
now expose to the guest Broadwell instruction set extensions and also
Intel MPX. There's also a fix/workaround for OS X guests, nested
virtualization features (preemption timer), and a couple kvmclock
refinements.
For s390, the main news is asynchronous page faults, together with
improvements to IRQs (floating irqs and adapter irqs) that speed up
virtio devices"
* tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
KVM: PPC: Book3S HV: Add transactional memory support
KVM: Specify byte order for KVM_EXIT_MMIO
KVM: vmx: fix MPX detection
KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
KVM: s390: clear local interrupts at cpu initial reset
KVM: s390: Fix possible memory leak in SIGP functions
KVM: s390: fix calculation of idle_mask array size
KVM: s390: randomize sca address
KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
KVM: Bump KVM_MAX_IRQ_ROUTES for s390
KVM: s390: irq routing for adapter interrupts.
KVM: s390: adapter interrupt sources
...
|
|
Pull devicetree changes from Grant Likely:
"Updates to devicetree core code. This branch contains the following
notable changes:
- add reserved memory binding
- make struct device_node a kobject and remove legacy
/proc/device-tree
- ePAPR conformance fixes
- update in-kernel DTC copy to version v1.4.0
- preparatory changes for dynamic device tree overlays
- minor bug fixes and documentation changes
The most significant change in this branch is the conversion of struct
device_node to be a kobject that is exposed via sysfs and removal of
the old /proc/device-tree code. This simplifies the device tree
handling code and tightens up the lifecycle on device tree nodes.
[updated: added fix for dangling select PROC_DEVICETREE]"
* tag 'dt-for-linus' of git://git.secretlab.ca/git/linux: (29 commits)
dt: Remove dangling "select PROC_DEVICETREE"
of: Add support for ePAPR "stdout-path" property
of: device_node kobject lifecycle fixes
of: only scan for reserved mem when fdt present
powerpc: add support for reserved memory defined by device tree
arm64: add support for reserved memory defined by device tree
of: add missing major vendors
of: add vendor prefix for SMSC
of: remove /proc/device-tree
of/selftest: Add self tests for manipulation of properties
of: Make device nodes kobjects so they show up in sysfs
arm: add support for reserved memory defined by device tree
drivers: of: add support for custom reserved memory drivers
drivers: of: add initialization code for dynamic reserved memory
drivers: of: add initialization code for static reserved memory
of: document bindings for reserved-memory nodes
Revert "of: fix of_update_property()"
kbuild: dtbs_install: new make target
ARM: mvebu: Allows to get the SoC ID even without PCI enabled
of: Allows to use the PCI translator without the PCI core
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI and power management updates from Rafael Wysocki:
"These are commits that were not quite ready when I sent the original
pull request for 3.15-rc1 several days ago, but they have spent some
time in linux-next since then and appear to be good to go. All of
them are fixes and cleanups.
Specifics:
- Remaining changes from upstream ACPICA release 20140214 that
introduce code to automatically serialize the execution of methods
creating any named objects which really cannot be executed in
parallel with each other anyway (previously ACPICA attempted to
address that by aborting methods upon conflict detection, but that
wasn't reliable enough and led to other issues). From Bob Moore
and Lv Zheng.
- intel_pstate fix to use del_timer_sync() instead of del_timer() in
the exit path before freeing the timer structure from Dirk
Brandewie (original patch from Thomas Gleixner).
- cpufreq fix related to system resume from Viresh Kumar.
- Serialization of frequency transitions in cpufreq that involve
PRECHANGE and POSTCHANGE notifications to avoid ordering issues
resulting from race conditions. From Srivatsa S Bhat and Viresh
Kumar.
- Revert of an ACPI processor driver change that was based on a
specific interpretation of the ACPI spec which may not be correct
(the relevant part of the spec appears to be incomplete). From
Hanjun Guo.
- Runtime PM core cleanups and documentation updates from Geert
Uytterhoeven.
- PNP core cleanup from Michael Opdenacker"
* tag 'pm+acpi-3.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Make cpufreq_notify_transition & cpufreq_notify_post_transition static
cpufreq: Convert existing drivers to use cpufreq_freq_transition_{begin|end}
cpufreq: Make sure frequency transitions are serialized
intel_pstate: Use del_timer_sync in intel_pstate_cpu_stop
cpufreq: resume drivers before enabling governors
PM / Runtime: Spelling s/competing/completing/
PM / Runtime: s/foo_process_requests/foo_process_next_request/
PM / Runtime: GENERIC_SUBSYS_PM_OPS is gone
PM / Runtime: Correct documented return values for generic PM callbacks
PM / Runtime: Split line longer than 80 characters
PM / Runtime: dev_pm_info.runtime_error is signed
Revert "ACPI / processor: Make it possible to get APIC ID via GIC"
ACPICA: Enable auto-serialization as a default kernel behavior.
ACPICA: Ignore sync_level for methods that have been auto-serialized.
ACPICA: Add additional named objects for the auto-serialize method scan.
ACPICA: Add auto-serialization support for ill-behaved control methods.
ACPICA: Remove global option to serialize all control methods.
PNP: remove deprecated IRQF_DISABLED
|
|
1000-1099 is for configuring things. So auditd ignored such messages.
This is about actually logging what was configured. Move it into the
range for such types of messages.
Reported-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 old platform removal from Peter Anvin:
"This patchset removes support for several completely obsolete
platforms, where the maintainers either have completely vanished or
acked the removal. For some of them it is questionable if there even
exists functional specimens of the hardware"
Geert Uytterhoeven apparently thought this was a April Fool's pull request ;)
* 'x86-nuke-platforms-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, platforms: Remove NUMAQ
x86, platforms: Remove SGI Visual Workstation
x86, apic: Remove support for IBM Summit/EXA chipset
x86, apic: Remove support for ia32-based Unisys ES7000
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull compat time conversion changes from Peter Anvin:
"Despite the branch name this is really neither an x86 nor an
x32-specific patchset, although it the implementation of the
discussions that followed the x32 security hole a few months ago.
This removes get/put_compat_timespec/val() and replaces them with
compat_get/put_timespec/val() which are savvy as to the current status
of COMPAT_USE_64BIT_TIME.
It removes several unused and/or incorrect/misleading functions (like
compat_put_timeval_convert which doesn't in fact do any conversion)
and also replaces several open-coded implementations what is now
called compat_convert_timespec() with that function"
* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
compat: Fix sparse address space warnings
compat: Get rid of (get|put)_compat_time(val|spec)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso changes from Peter Anvin:
"This is the revamp of the 32-bit vdso and the associated cleanups.
This adds timekeeping support to the 32-bit vdso that we already have
in the 64-bit vdso. Although 32-bit x86 is legacy, it is likely to
remain in the embedded space for a very long time to come.
This removes the traditional COMPAT_VDSO support; the configuration
variable is reused for simply removing the 32-bit vdso, which will
produce correct results but obviously suffer a performance penalty.
Only one beta version of glibc was affected, but that version was
unfortunately included in one OpenSUSE release.
This is not the end of the vdso cleanups. Stefani and Andy have
agreed to continue work for the next kernel cycle; in fact Andy has
already produced another set of cleanups that came too late for this
cycle.
An incidental, but arguably important, change is that this ensures
that unused space in the VVAR page is properly zeroed. It wasn't
before, and would contain whatever garbage was left in memory by BIOS
or the bootloader. Since the VVAR page is accessible to user space
this had the potential of information leaks"
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86, vdso: Fix the symbol versions on the 32-bit vDSO
x86, vdso, build: Don't rebuild 32-bit vdsos on every make
x86, vdso: Actually discard the .discard sections
x86, vdso: Fix size of get_unmapped_area()
x86, vdso: Finish removing VDSO32_PRELINK
x86, vdso: Move more vdso definitions into vdso.h
x86: Load the 32-bit vdso in place, just like the 64-bit vdsos
x86, vdso32: handle 32 bit vDSO larger one page
x86, vdso32: Disable stack protector, adjust optimizations
x86, vdso: Zero-pad the VVAR page
x86, vdso: Add 32 bit VDSO time support for 64 bit kernel
x86, vdso: Add 32 bit VDSO time support for 32 bit kernel
x86, vdso: Patch alternatives in the 32-bit VDSO
x86, vdso: Introduce VVAR marco for vdso32
x86, vdso: Cleanup __vdso_gettimeofday()
x86, vdso: Replace VVAR(vsyscall_gtod_data) by gtod macro
x86, vdso: __vdso_clock_gettime() cleanup
x86, vdso: Revamp vclock_gettime.c
mm: Add new func _install_special_mapping() to mmap.c
x86, vdso: Make vsyscall_gtod_data handling x86 generic
...
|
|
'arm/shmobile' and 'x86/vt-d' into next
|
|
Introduce a bit kernel and userspace exchange between each-other on
the init stage and turn writeback on if the userspace want this and
mount option 'allow_wbcache' is present (controlled by fusermount).
Also add each writable file into per-inode write list and call the
generic_file_aio_write to make use of the Linux page cache engine.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
always equal to &iocb->ki_pos.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
same story - it's &iocb->ki_pos in all cases
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
It's always equal to &iocb->ki_pos, where iocb is the value of the 1st
argument.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
sg_iovec array passed to it can be const
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
generic_file_aio_read() was looping over the target iovec, with loop over
(source) pages nested inside that. Just set an iov_iter up and pass *that*
to do_generic_file_aio_read(). With copy_page_to_iter() doing all work
of mapping and copying a page to iovec and advancing iov_iter.
Switch shmem_file_aio_read() to the same and kill file_read_actor(), while
we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
all pipe_buffer_operations have the same instances of those...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|