summaryrefslogtreecommitdiffstats
path: root/security
AgeCommit message (Collapse)Author
2014-07-17Merge branch 'stable-3.16' of git://git.infradead.org/users/pcmoore/selinux ↵James Morris
into next
2014-07-16sched: Remove proliferation of wait_on_bit() action functionsNeilBrown
The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-15cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypesTejun Heo
Currently, cgroup_subsys->base_cftypes is used for both the unified default hierarchy and legacy ones and subsystems can mark each file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear only on one of them. This is quite hairy and error-prone. Also, we may end up exposing interface files to the default hierarchy without thinking it through. cgroup_subsys will grow two separate cftype arrays and apply each only on the hierarchies of the matching type. This will allow organizing cftypes in a lot clearer way and encourage subsystems to scrutinize the interface which is being exposed in the new default hierarchy. In preparation, this patch renames cgroup_subsys->base_cftypes to cgroup_subsys->legacy_cftypes. This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2014-07-10selinux: fix the default socket labeling in sock_graft()Paul Moore
The sock_graft() hook has special handling for AF_INET, AF_INET, and AF_UNIX sockets as those address families have special hooks which label the sock before it is attached its associated socket. Unfortunately, the sock_graft() hook was missing a default approach to labeling sockets which meant that any other address family which made use of connections or the accept() syscall would find the returned socket to be in an "unlabeled" state. This was recently demonstrated by the kcrypto/AF_ALG subsystem and the newly released cryptsetup package (cryptsetup v1.6.5 and later). This patch preserves the special handling in selinux_sock_graft(), but adds a default behavior - setting the sock's label equal to the associated socket - which resolves the problem with AF_ALG and presumably any other address family which makes use of accept(). Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Milan Broz <gmazyland@gmail.com>
2014-06-26selinux: reduce the number of calls to synchronize_net() when flushing cachesPaul Moore
When flushing the AVC, such as during a policy load, the various network caches are also flushed, with each making a call to synchronize_net() which has shown to be expensive in some cases. This patch consolidates the network cache flushes into a single AVC callback which only calls synchronize_net() once for each AVC cache flush. Reported-by: Jaejyn Shin <flagon22bass@gmail.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-23selinux: no recursive read_lock of policy_rwlock in security_genfs_sid()Waiman Long
With the introduction of fair queued rwlock, recursive read_lock() may hang the offending process if there is a write_lock() somewhere in between. With recursive read_lock checking enabled, the following error was reported: ============================================= [ INFO: possible recursive locking detected ] 3.16.0-rc1 #2 Tainted: G E --------------------------------------------- load_policy/708 is trying to acquire lock: (policy_rwlock){.+.+..}, at: [<ffffffff8125b32a>] security_genfs_sid+0x3a/0x170 but task is already holding lock: (policy_rwlock){.+.+..}, at: [<ffffffff8125b48c>] security_fs_use+0x2c/0x110 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(policy_rwlock); lock(policy_rwlock); This patch fixes the occurrence of recursive read_lock() of policy_rwlock by adding a helper function __security_genfs_sid() which requires caller to take the lock before calling it. The security_fs_use() was then modified to call the new helper function. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-19selinux: fix a possible memory leak in cond_read_node()Namhyung Kim
The cond_read_node() should free the given node on error path as it's not linked to p->cond_list yet. This is done via cond_node_destroy() but it's not called when next_entry() fails before the expr loop. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-19selinux: simple cleanup for cond_read_node()Namhyung Kim
The node->cur_state and len can be read in a single call of next_entry(). And setting len before reading is a dead write so can be eliminated. Signed-off-by: Namhyung Kim <namhyung@kernel.org> (Minor tweak to the length parameter in the call to next_entry()) Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-18security: Used macros from compiler.h instead of __attribute__((...))Gideon Israel Dsouza
To increase compiler portability there is <linux/compiler.h> which provides convenience macros for various gcc constructs. Eg: __packed for __attribute__((packed)). This patch is part of a large task I've taken to clean the gcc specific attributes and use the the macros instead. Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-18selinux: introduce str_read() helperNamhyung Kim
There're some code duplication for reading a string value during policydb_read(). Add str_read() helper to fix it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-17SELinux: use ARRAY_SIZEHimangi Saraogi
ARRAY_SIZE is more concise to use when the size of an array is divided by the size of its type or the size of its first element. The Coccinelle semantic patch that makes this change is as follows: // <smpl> @@ type T; T[] E; @@ - (sizeof(E)/sizeof(E[...])) + ARRAY_SIZE(E) // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-17Merge tag 'v3.15' into nextPaul Moore
Linux 3.15
2014-06-13Merge branch 'serge-next-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security Pull more security layer updates from Serge Hallyn: "A few more commits had previously failed to make it through security-next into linux-next but this week made it into linux-next. At least commit "ima: introduce ima_kernel_read()" was deemed critical by Mimi to make this merge window. This is a temporary tree just for this request. Mimi has pointed me to some previous threads about keeping maintainer trees at the previous release, which I'll certainly do for anything long-term, after talking with James" * 'serge-next-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security: ima: introduce ima_kernel_read() evm: prohibit userspace writing 'security.evm' HMAC value ima: check inode integrity cache in violation check ima: prevent unnecessary policy checking evm: provide option to protect additional SMACK xattrs evm: replace HMAC version with attribute mask ima: prevent new digsig xattr from being replaced
2014-06-12ima: introduce ima_kernel_read()Dmitry Kasatkin
Commit 8aac62706 "move exit_task_namespaces() outside of exit_notify" introduced the kernel opps since the kernel v3.10, which happens when Apparmor and IMA-appraisal are enabled at the same time. ---------------------------------------------------------------------- [ 106.750167] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30 [ 106.750241] PGD 0 [ 106.750254] Oops: 0000 [#1] SMP [ 106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci pps_core [ 106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15 [ 106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08 09/19/2012 [ 106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti: ffff880400fca000 [ 106.750704] RIP: 0010:[<ffffffff811ec7da>] [<ffffffff811ec7da>] our_mnt+0x1a/0x30 [ 106.750725] RSP: 0018:ffff880400fcba60 EFLAGS: 00010286 [ 106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX: ffff8800d51523e7 [ 106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI: ffff880402d20020 [ 106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09: 0000000000000001 [ 106.750817] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800d5152300 [ 106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15: ffff8800d51523e7 [ 106.750871] FS: 0000000000000000(0000) GS:ffff88040d200000(0000) knlGS:0000000000000000 [ 106.750910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4: 00000000001407e0 [ 106.750962] Stack: [ 106.750981] ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18 0000000000000000 [ 106.751037] ffff8800de804920 ffffffff8101b9b9 0001800000000000 0000000000000100 [ 106.751093] 0000010000000000 0000000000000002 000000000000000e ffff8803eb8df500 [ 106.751149] Call Trace: [ 106.751172] [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430 [ 106.751199] [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10 [ 106.751225] [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170 [ 106.751250] [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80 [ 106.751276] [<ffffffff8134aa73>] aa_file_perm+0x33/0x40 [ 106.751301] [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0 [ 106.751327] [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20 [ 106.751355] [<ffffffff8130c853>] security_file_permission+0x23/0xa0 [ 106.751382] [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0 [ 106.751407] [<ffffffff811c789d>] vfs_read+0x6d/0x170 [ 106.751432] [<ffffffff811cda31>] kernel_read+0x41/0x60 [ 106.751457] [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280 [ 106.751483] [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280 [ 106.751509] [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160 [ 106.751536] [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10 [ 106.751562] [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0 [ 106.751587] [<ffffffff81352824>] ima_update_xattr+0x34/0x60 [ 106.751612] [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0 [ 106.751637] [<ffffffff811c9635>] __fput+0xd5/0x300 [ 106.751662] [<ffffffff811c98ae>] ____fput+0xe/0x10 [ 106.751687] [<ffffffff81086774>] task_work_run+0xc4/0xe0 [ 106.751712] [<ffffffff81066fad>] do_exit+0x2bd/0xa90 [ 106.751738] [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b [ 106.751763] [<ffffffff8106780c>] do_group_exit+0x4c/0xc0 [ 106.751788] [<ffffffff81067894>] SyS_exit_group+0x14/0x20 [ 106.751814] [<ffffffff8174522d>] system_call_fastpath+0x1a/0x1f [ 106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3 0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89 e5 5d <48> 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00 [ 106.752185] RIP [<ffffffff811ec7da>] our_mnt+0x1a/0x30 [ 106.752214] RSP <ffff880400fcba60> [ 106.752236] CR2: 0000000000000018 [ 106.752258] ---[ end trace 3c520748b4732721 ]--- ---------------------------------------------------------------------- The reason for the oops is that IMA-appraisal uses "kernel_read()" when file is closed. kernel_read() honors LSM security hook which calls Apparmor handler, which uses current->nsproxy->mnt_ns. The 'guilty' commit changed the order of cleanup code so that nsproxy->mnt_ns was not already available for Apparmor. Discussion about the issue with Al Viro and Eric W. Biederman suggested that kernel_read() is too high-level for IMA. Another issue, except security checking, that was identified is mandatory locking. kernel_read honors it as well and it might prevent IMA from calculating necessary hash. It was suggested to use simplified version of the function without security and locking checks. This patch introduces special version ima_kernel_read(), which skips security and mandatory locking checking. It prevents the kernel oops to happen. Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org>
2014-06-12evm: prohibit userspace writing 'security.evm' HMAC valueMimi Zohar
Calculating the 'security.evm' HMAC value requires access to the EVM encrypted key. Only the kernel should have access to it. This patch prevents userspace tools(eg. setfattr, cp --preserve=xattr) from setting/modifying the 'security.evm' HMAC value directly. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org>
2014-06-12ima: check inode integrity cache in violation checkDmitry Kasatkin
When IMA did not support ima-appraisal, existance of the S_IMA flag clearly indicated that the file was measured. With IMA appraisal S_IMA flag indicates that file was measured and/or appraised. Because of this, when measurement is not enabled by the policy, violations are still reported. To differentiate between measurement and appraisal policies this patch checks the inode integrity cache flags. The IMA_MEASURED flag indicates whether the file was actually measured, while the IMA_MEASURE flag indicates whether the file should be measured. Unfortunately, the IMA_MEASURED flag is reset to indicate the file needs to be re-measured. Thus, this patch checks the IMA_MEASURE flag. This patch limits the false positive violation reports, but does not fix it entirely. The IMA_MEASURE/IMA_MEASURED flags are indications that, at some point in time, the file opened for read was in policy, but might not be in policy now (eg. different uid). Other changes would be needed to further limit false positive violation reports. Changelog: - expanded patch description based on conversation with Roberto (Mimi) Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-06-12ima: prevent unnecessary policy checkingDmitry Kasatkin
ima_rdwr_violation_check is called for every file openning. The function checks the policy even when violation condition is not met. It causes unnecessary policy checking. This patch does policy checking only if violation condition is met. Changelog: - check writecount is greater than zero (Mimi) Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-06-12evm: provide option to protect additional SMACK xattrsDmitry Kasatkin
Newer versions of SMACK introduced following security xattrs: SMACK64EXEC, SMACK64TRANSMUTE and SMACK64MMAP. To protect these xattrs, this patch includes them in the HMAC calculation. However, for backwards compatibility with existing labeled filesystems, including these xattrs needs to be configurable. Changelog: - Add SMACK dependency on new option (Mimi) Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-06-12evm: replace HMAC version with attribute maskDmitry Kasatkin
Using HMAC version limits the posibility to arbitrarily add new attributes such as SMACK64EXEC to the hmac calculation. This patch replaces hmac version with attribute mask. Desired attributes can be enabled with configuration parameter. It allows to build kernels which works with previously labeled filesystems. Currently supported attribute is 'fsuuid' which is equivalent of the former version 2. Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-06-12ima: prevent new digsig xattr from being replacedMimi Zohar
Even though a new xattr will only be appraised on the next access, set the DIGSIG flag to prevent a signature from being replaced with a hash on file close. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
2014-06-12tomoyo: Use sensible time interfaceThomas Gleixner
There is no point in calling gettimeofday if only the seconds part of the timespec is used. Use get_seconds() instead. It's not only the proper interface it's also faster. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: linux-security-module@vger.kernel.org Link: http://lkml.kernel.org/r/20140611234607.775273584@linutronix.de
2014-06-10Merge branch 'serge-next-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security Pull security layer updates from Serge Hallyn: "This is a merge of James Morris' security-next tree from 3.14 to yesterday's master, plus four patches from Paul Moore which are in linux-next, plus one patch from Mimi" * 'serge-next-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security: ima: audit log files opened with O_DIRECT flag selinux: conditionally reschedule in hashtab_insert while loading selinux policy selinux: conditionally reschedule in mls_convert_context while loading selinux policy selinux: reject setexeccon() on MNT_NOSUID applications with -EACCES selinux: Report permissive mode in avc: denied messages. Warning in scanf string typing Smack: Label cgroup files for systemd Smack: Verify read access on file open - v3 security: Convert use of typedef ctl_table to struct ctl_table Smack: bidirectional UDS connect check Smack: Correctly remove SMACK64TRANSMUTE attribute SMACK: Fix handling value==NULL in post setxattr bugfix patch for SMACK Smack: adds smackfs/ptrace interface Smack: unify all ptrace accesses in the smack Smack: fix the subject/object order in smack_ptrace_traceme() Minor improvement of 'smack_sb_kern_mount' smack: fix key permission verification KEYS: Move the flags representing required permission to linux/key.h
2014-06-09Merge branch 'for-3.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot of activities on cgroup side. Heavy restructuring including locking simplification took place to improve the code base and enable implementation of the unified hierarchy, which currently exists behind a __DEVEL__ mount option. The core support is mostly complete but individual controllers need further work. To explain the design and rationales of the the unified hierarchy Documentation/cgroups/unified-hierarchy.txt is added. Another notable change is css (cgroup_subsys_state - what each controller uses to identify and interact with a cgroup) iteration update. This is part of continuing updates on css object lifetime and visibility. cgroup started with reference count draining on removal way back and is now reaching a point where csses behave and are iterated like normal refcnted objects albeit with some complexities to allow distinguishing the state where they're being deleted. The css iteration update isn't taken advantage of yet but is planned to be used to simplify memcg significantly" * 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (77 commits) cgroup: disallow disabled controllers on the default hierarchy cgroup: don't destroy the default root cgroup: disallow debug controller on the default hierarchy cgroup: clean up MAINTAINERS entries cgroup: implement css_tryget() device_cgroup: use css_has_online_children() instead of has_children() cgroup: convert cgroup_has_live_children() into css_has_online_children() cgroup: use CSS_ONLINE instead of CGRP_DEAD cgroup: iterate cgroup_subsys_states directly cgroup: introduce CSS_RELEASED and reduce css iteration fallback window cgroup: move cgroup->serial_nr into cgroup_subsys_state cgroup: link all cgroup_subsys_states in their sibling lists cgroup: move cgroup->sibling and ->children into cgroup_subsys_state cgroup: remove cgroup->parent device_cgroup: remove direct access to cgroup->children memcg: update memcg_has_children() to use css_next_child() memcg: remove tasks/children test from mem_cgroup_force_empty() cgroup: remove css_parent() cgroup: skip refcnting on normal root csses and cgrp_dfl_root self css cgroup: use cgroup->self.refcnt for cgroup refcnting ...
2014-06-03ima: audit log files opened with O_DIRECT flagMimi Zohar
Files are measured or appraised based on the IMA policy. When a file, in policy, is opened with the O_DIRECT flag, a deadlock occurs. The first attempt at resolving this lockdep temporarily removed the O_DIRECT flag and restored it, after calculating the hash. The second attempt introduced the O_DIRECT_HAVELOCK flag. Based on this flag, do_blockdev_direct_IO() would skip taking the i_mutex a second time. The third attempt, by Dmitry Kasatkin, resolves the i_mutex locking issue, by re-introducing the IMA mutex, but uncovered another problem. Reading a file with O_DIRECT flag set, writes directly to userspace pages. A second patch allocates a user-space like memory. This works for all IMA hooks, except ima_file_free(), which is called on __fput() to recalculate the file hash. Until this last issue is addressed, do not 'collect' the measurement for measuring, appraising, or auditing files opened with the O_DIRECT flag set. Based on policy, permit or deny file access. This patch defines a new IMA policy rule option named 'permit_directio'. Policy rules could be defined, based on LSM or other criteria, to permit specific applications to open files with the O_DIRECT flag set. Changelog v1: - permit or deny file access based IMA policy rules Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Cc: <stable@vger.kernel.org>
2014-06-03selinux: conditionally reschedule in hashtab_insert while loading selinux policyDave Jones
After silencing the sleeping warning in mls_convert_context() I started seeing similar traces from hashtab_insert. Do a cond_resched there too. Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-03selinux: conditionally reschedule in mls_convert_context while loading ↵Dave Jones
selinux policy On a slow machine (with debugging enabled), upgrading selinux policy may take a considerable amount of time. Long enough that the softlockup detector gets triggered. The backtrace looks like this.. > BUG: soft lockup - CPU#2 stuck for 23s! [load_policy:19045] > Call Trace: > [<ffffffff81221ddf>] symcmp+0xf/0x20 > [<ffffffff81221c27>] hashtab_search+0x47/0x80 > [<ffffffff8122e96c>] mls_convert_context+0xdc/0x1c0 > [<ffffffff812294e8>] convert_context+0x378/0x460 > [<ffffffff81229170>] ? security_context_to_sid_core+0x240/0x240 > [<ffffffff812221b5>] sidtab_map+0x45/0x80 > [<ffffffff8122bb9f>] security_load_policy+0x3ff/0x580 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50 > [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe > [<ffffffff8109c82d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [<ffffffff81279a2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f > [<ffffffff810d28a8>] ? rcu_irq_exit+0x68/0xb0 > [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe > [<ffffffff8121e947>] sel_write_load+0xa7/0x770 > [<ffffffff81139633>] ? vfs_write+0x1c3/0x200 > [<ffffffff81210e8e>] ? security_file_permission+0x1e/0xa0 > [<ffffffff8113952b>] vfs_write+0xbb/0x200 > [<ffffffff811581c7>] ? fget_light+0x397/0x4b0 > [<ffffffff81139c27>] SyS_write+0x47/0xa0 > [<ffffffff8153bde4>] tracesys+0xdd/0xe2 Stephen Smalley suggested: > Maybe put a cond_resched() within the ebitmap_for_each_positive_bit() > loop in mls_convert_context()? That seems to do the trick. Tested by downgrading and re-upgrading selinux-policy-targeted. Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-03selinux: reject setexeccon() on MNT_NOSUID applications with -EACCESPaul Moore
We presently prevent processes from using setexecon() to set the security label of exec()'d processes when NO_NEW_PRIVS is enabled by returning an error; however, we silently ignore setexeccon() when exec()'ing from a nosuid mounted filesystem. This patch makes things a bit more consistent by returning an error in the setexeccon()/nosuid case. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
2014-06-03selinux: Report permissive mode in avc: denied messages.Stephen Smalley
We cannot presently tell from an avc: denied message whether access was in fact denied or was allowed due to global or per-domain permissive mode. Add a permissive= field to the avc message to reflect this information. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/bonding/bond_alb.c drivers/net/ethernet/altera/altera_msgdma.c drivers/net/ethernet/altera/altera_sgdma.c net/ipv6/xfrm6_output.c Several cases of overlapping changes. The xfrm6_output.c has a bug fix which overlaps the renaming of skb->local_df to skb->ignore_df. In the Altera TSE driver cases, the register access cleanups in net-next overlapped with bug fixes done in net. Similarly a bug fix to send ALB packets in the bonding driver using the right source address overlaps with cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-20Merge branch 'smack-for-3.16' of git://git.gitorious.org/smack-next/kernel ↵James Morris
into next
2014-05-16device_cgroup: use css_has_online_children() instead of has_children()Tejun Heo
devcgroup_update_access() wants to know whether there are child cgroups which are online and visible to userland and has_children() may return false positive. Replace it with css_has_online_children(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-16device_cgroup: remove direct access to cgroup->childrenTejun Heo
Currently, devcg::has_children() directly tests cgroup->children for list emptiness. The field is not a published field and scheduled to go away. In addition, the test isn't strictly correct as devcg should only care about children which are visible to userland. This patch converts has_children() to use css_next_child() instead. The subtle incorrectness is noted and will be dealt with later. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-16cgroup: remove css_parent()Tejun Heo
cgroup in general is moving towards using cgroup_subsys_state as the fundamental structural component and css_parent() was introduced to convert from using cgroup->parent to css->parent. It was quite some time ago and we're moving forward with making css more prominent. This patch drops the trivial wrapper css_parent() and let the users dereference css->parent. While at it, explicitly mark fields of css which are public and immutable. v2: New usage from device_cgroup.c converted. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: "David S. Miller" <davem@davemloft.net> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Johannes Weiner <hannes@cmpxchg.org>
2014-05-15selinux: conditionally reschedule in hashtab_insert while loading selinux policyDave Jones
After silencing the sleeping warning in mls_convert_context() I started seeing similar traces from hashtab_insert. Do a cond_resched there too. Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-05-15selinux: conditionally reschedule in mls_convert_context while loading ↵Dave Jones
selinux policy On a slow machine (with debugging enabled), upgrading selinux policy may take a considerable amount of time. Long enough that the softlockup detector gets triggered. The backtrace looks like this.. > BUG: soft lockup - CPU#2 stuck for 23s! [load_policy:19045] > Call Trace: > [<ffffffff81221ddf>] symcmp+0xf/0x20 > [<ffffffff81221c27>] hashtab_search+0x47/0x80 > [<ffffffff8122e96c>] mls_convert_context+0xdc/0x1c0 > [<ffffffff812294e8>] convert_context+0x378/0x460 > [<ffffffff81229170>] ? security_context_to_sid_core+0x240/0x240 > [<ffffffff812221b5>] sidtab_map+0x45/0x80 > [<ffffffff8122bb9f>] security_load_policy+0x3ff/0x580 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50 > [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe > [<ffffffff8109c82d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [<ffffffff81279a2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f > [<ffffffff810d28a8>] ? rcu_irq_exit+0x68/0xb0 > [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe > [<ffffffff8121e947>] sel_write_load+0xa7/0x770 > [<ffffffff81139633>] ? vfs_write+0x1c3/0x200 > [<ffffffff81210e8e>] ? security_file_permission+0x1e/0xa0 > [<ffffffff8113952b>] vfs_write+0xbb/0x200 > [<ffffffff811581c7>] ? fget_light+0x397/0x4b0 > [<ffffffff81139c27>] SyS_write+0x47/0xa0 > [<ffffffff8153bde4>] tracesys+0xdd/0xe2 Stephen Smalley suggested: > Maybe put a cond_resched() within the ebitmap_for_each_positive_bit() > loop in mls_convert_context()? That seems to do the trick. Tested by downgrading and re-upgrading selinux-policy-targeted. Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-05-15selinux: reject setexeccon() on MNT_NOSUID applications with -EACCESPaul Moore
We presently prevent processes from using setexecon() to set the security label of exec()'d processes when NO_NEW_PRIVS is enabled by returning an error; however, we silently ignore setexeccon() when exec()'ing from a nosuid mounted filesystem. This patch makes things a bit more consistent by returning an error in the setexeccon()/nosuid case. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
2014-05-13cgroup: replace cftype->write_string() with cftype->write()Tejun Heo
Convert all cftype->write_string() users to the new cftype->write() which maps directly to kernfs write operation and has full access to kernfs and cgroup contexts. The conversions are mostly mechanical. * @css and @cft are accessed using of_css() and of_cft() accessors respectively instead of being specified as arguments. * Should return @nbytes on success instead of 0. * @buf is not trimmed automatically. Trim if necessary. Note that blkcg and netprio don't need this as the parsers already handle whitespaces. cftype->write_string() has no user left after the conversions and removed. While at it, remove unnecessary local variable @p in cgroup_subtree_control_write() and stale comment about CGROUP_LOCAL_BUFFER_SIZE in cgroup_freezer.c. This patch doesn't introduce any visible behavior changes. v2: netprio was missing from conversion. Converted. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net>
2014-05-13Merge branch 'for-3.15-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "During recent restructuring, device_cgroup unified config input check and enforcement logic; unfortunately, it turned out to share too much. Aristeu's patches fix the breakage and marked for -stable backport. The other two patches are fallouts from kernfs conversion. The blkcg change is temporary and will go away once kernfs internal locking gets simplified (patches pending)" * 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats() device_cgroup: check if exception removal is allowed device_cgroup: fix the comment format for recently added functions device_cgroup: rework device access check and exception checking cgroup: fix the retry path of cgroup_mount()
2014-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-06Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "dcache fixes + kvfree() (uninlined, exported by mm/util.c) + posix_acl bugfix from hch" The dcache fixes are for a subtle LRU list corruption bug reported by Miklos Szeredi, where people inside IBM saw list corruptions with the LTP/host01 test. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nick kvfree() from apparmor posix_acl: handle NULL ACL in posix_acl_equiv_mode dcache: don't need rcu in shrink_dentry_list() more graceful recovery in umount_collect() don't remove from shrink list in select_collect() dentry_kill(): don't try to remove from shrink list expand the call of dentry_lru_del() in dentry_kill() new helper: dentry_free() fold try_prune_one_dentry() fold d_kill() and d_free() fix races between __d_instantiate() and checks of dentry flags
2014-05-06Warning in scanf string typingToralf Förster
This fixes a warning about the mismatch of types between the declared unsigned and integer. Signed-off-by: Toralf Förster <toralf.foerster@gmx.de>
2014-05-06nick kvfree() from apparmorAl Viro
too many places open-code it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-05device_cgroup: check if exception removal is allowedAristeu Rozanski
[PATCH v3 1/2] device_cgroup: check if exception removal is allowed When the device cgroup hierarchy was introduced in bd2953ebbb53 - devcg: propagate local changes down the hierarchy a specific case was overlooked. Consider the hierarchy bellow: A default policy: ALLOW, exceptions will deny access \ B default policy: ALLOW, exceptions will deny access There's no need to verify when an new exception is added to B because in this case exceptions will deny access to further devices, which is always fine. Hierarchy in device cgroup only makes sure B won't have more access than A. But when an exception is removed (by writing devices.allow), it isn't checked if the user is in fact removing an inherited exception from A, thus giving more access to B. Example: # echo 'a' >A/devices.allow # echo 'c 1:3 rw' >A/devices.deny # echo $$ >A/B/tasks # echo >/dev/null -bash: /dev/null: Operation not permitted # echo 'c 1:3 w' >A/B/devices.allow # echo >/dev/null # This shouldn't be allowed and this patch fixes it by making sure to never allow exceptions in this case to be removed if the exception is partially or fully present on the parent. v3: missing '*' in function description v2: improved log message and formatting fixes Cc: cgroups@vger.kernel.org Cc: Li Zefan <lizefan@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-05-04device_cgroup: fix the comment format for recently added functionsAristeu Rozanski
Moving more extensive explanations to the end of the comment. Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-05-01selinux: Report permissive mode in avc: denied messages.Stephen Smalley
We cannot presently tell from an avc: denied message whether access was in fact denied or was allowed due to global or per-domain permissive mode. Add a permissive= field to the avc message to reflect this information. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-04-30Smack: Label cgroup files for systemdCasey Schaufler
The cgroup filesystem isn't ready for an LSM to properly use extented attributes. This patch makes files created in the cgroup filesystem usable by a system running Smack and systemd. Targeted for git://git.gitorious.org/smack-next/kernel.git Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2014-04-23Smack: Verify read access on file open - v3Casey Schaufler
Smack believes that many of the operatons that can be performed on an open file descriptor are read operations. The fstat and lseek system calls are examples. An implication of this is that files shouldn't be open if the task doesn't have read access even if it has write access and the file is being opened write only. Targeted for git://git.gitorious.org/smack-next/kernel.git Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2014-04-22audit: add netlink audit protocol bind to check capabilities on multicast joinRichard Guy Briggs
Register a netlink per-protocol bind fuction for audit to check userspace process capabilities before allowing a multicast group connection. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-22locks: rename file-private locks to "open file description locks"Jeff Layton
File-private locks have been merged into Linux for v3.15, and *now* people are commenting that the name and macro definitions for the new file-private locks suck. ...and I can't even disagree. The names and command macros do suck. We're going to have to live with these for a long time, so it's important that we be happy with the names before we're stuck with them. The consensus on the lists so far is that they should be rechristened as "open file description locks". The name isn't a big deal for the kernel, but the command macros are not visually distinct enough from the traditional POSIX lock macros. The glibc and documentation folks are recommending that we change them to look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a programmer will typo one of the commands wrong, and also makes it easier to spot this difference when reading code. This patch makes the following changes that I think are necessary before v3.15 ships: 1) rename the command macros to their new names. These end up in the uapi headers and so are part of the external-facing API. It turns out that glibc doesn't actually use the fcntl.h uapi header, but it's hard to be sure that something else won't. Changing it now is safest. 2) make the the /proc/locks output display these as type "OFDLCK" Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Stefan Metzmacher <metze@samba.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Frank Filz <ffilzlnx@mindspring.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jeff Layton <jlayton@redhat.com>