asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2013-11-21	Merge branch 'for-linus2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "In this patchset, we finally get an SELinux update, with Paul Moore taking over as maintainer of that code. Also a significant update for the Keys subsystem, as well as maintenance updates to Smack, IMA, TPM, and Apparmor" and since I wanted to know more about the updates to key handling, here's the explanation from David Howells on that: "Okay. There are a number of separate bits. I'll go over the big bits and the odd important other bit, most of the smaller bits are just fixes and cleanups. If you want the small bits accounting for, I can do that too. (1) Keyring capacity expansion. KEYS: Consolidate the concept of an 'index key' for key access KEYS: Introduce a search context structure KEYS: Search for auth-key by name rather than target key ID Add a generic associative array implementation. KEYS: Expand the capacity of a keyring Several of the patches are providing an expansion of the capacity of a keyring. Currently, the maximum size of a keyring payload is one page. Subtract a small header and then divide up into pointers, that only gives you ~500 pointers on an x86_64 box. However, since the NFS idmapper uses a keyring to store ID mapping data, that has proven to be insufficient to the cause. Whatever data structure I use to handle the keyring payload, it can only store pointers to keys, not the keys themselves because several keyrings may point to a single key. This precludes inserting, say, and rb_node struct into the key struct for this purpose. I could make an rbtree of records such that each record has an rb_node and a key pointer, but that would use four words of space per key stored in the keyring. It would, however, be able to use much existing code. I selected instead a non-rebalancing radix-tree type approach as that could have a better space-used/key-pointer ratio. I could have used the radix tree implementation that we already have and insert keys into it by their serial numbers, but that means any sort of search must iterate over the whole radix tree. Further, its nodes are a bit on the capacious side for what I want - especially given that key serial numbers are randomly allocated, thus leaving a lot of empty space in the tree. So what I have is an associative array that internally is a radix-tree with 16 pointers per node where the index key is constructed from the key type pointer and the key description. This means that an exact lookup by type+description is very fast as this tells us how to navigate directly to the target key. I made the data structure general in lib/assoc_array.c as far as it is concerned, its index key is just a sequence of bits that leads to a pointer. It's possible that someone else will be able to make use of it also. FS-Cache might, for example. (2) Mark keys as 'trusted' and keyrings as 'trusted only'. KEYS: verify a certificate is signed by a 'trusted' key KEYS: Make the system 'trusted' keyring viewable by userspace KEYS: Add a 'trusted' flag and a 'trusted only' flag KEYS: Separate the kernel signature checking keyring from module signing These patches allow keys carrying asymmetric public keys to be marked as being 'trusted' and allow keyrings to be marked as only permitting the addition or linkage of trusted keys. Keys loaded from hardware during kernel boot or compiled into the kernel during build are marked as being trusted automatically. New keys can be loaded at runtime with add_key(). They are checked against the system keyring contents and if their signatures can be validated with keys that are already marked trusted, then they are marked trusted also and can thus be added into the master keyring. Patches from Mimi Zohar make this usable with the IMA keyrings also. (3) Remove the date checks on the key used to validate a module signature. X.509: Remove certificate date checks It's not reasonable to reject a signature just because the key that it was generated with is no longer valid datewise - especially if the kernel hasn't yet managed to set the system clock when the first module is loaded - so just remove those checks. (4) Make it simpler to deal with additional X.509 being loaded into the kernel. KEYS: Load .x509 files into kernel keyring KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate The builder of the kernel now just places files with the extension ".x509" into the kernel source or build trees and they're concatenated by the kernel build and stuffed into the appropriate section. (5) Add support for userspace kerberos to use keyrings. KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches KEYS: Implement a big key type that can save to tmpfs Fedora went to, by default, storing kerberos tickets and tokens in tmpfs. We looked at storing it in keyrings instead as that confers certain advantages such as tickets being automatically deleted after a certain amount of time and the ability for the kernel to get at these tokens more easily. To make this work, two things were needed: (a) A way for the tickets to persist beyond the lifetime of all a user's sessions so that cron-driven processes can still use them. The problem is that a user's session keyrings are deleted when the session that spawned them logs out and the user's user keyring is deleted when the UID is deleted (typically when the last log out happens), so neither of these places is suitable. I've added a system keyring into which a 'persistent' keyring is created for each UID on request. Each time a user requests their persistent keyring, the expiry time on it is set anew. If the user doesn't ask for it for, say, three days, the keyring is automatically expired and garbage collected using the existing gc. All the kerberos tokens it held are then also gc'd. (b) A key type that can hold really big tickets (up to 1MB in size). The problem is that Active Directory can return huge tickets with lots of auxiliary data attached. We don't, however, want to eat up huge tracts of unswappable kernel space for this, so if the ticket is greater than a certain size, we create a swappable shmem file and dump the contents in there and just live with the fact we then have an inode and a dentry overhead. If the ticket is smaller than that, we slap it in a kmalloc()'d buffer" 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits) KEYS: Fix keyring content gc scanner KEYS: Fix error handling in big_key instantiation KEYS: Fix UID check in keyctl_get_persistent() KEYS: The RSA public key algorithm needs to select MPILIB ima: define '_ima' as a builtin 'trusted' keyring ima: extend the measurement list to include the file signature kernel/system_certificate.S: use real contents instead of macro GLOBAL() KEYS: fix error return code in big_key_instantiate() KEYS: Fix keyring quota misaccounting on key replacement and unlink KEYS: Fix a race between negating a key and reading the error set KEYS: Make BIG_KEYS boolean apparmor: remove the "task" arg from may_change_ptraced_domain() apparmor: remove parent task info from audit logging apparmor: remove tsk field from the apparmor_audit_struct apparmor: fix capability to not use the current task, during reporting Smack: Ptrace access check mode ima: provide hash algo info in the xattr ima: enable support for larger default filedata hash algorithms ima: define kernel parameter 'ima_template=' to change configured default ima: add Kconfig default measurement list template ...
2013-11-21	Merge git://git.infradead.org/users/eparis/audit	Linus Torvalds
	Pull audit updates from Eric Paris: "Nothing amazing. Formatting, small bug fixes, couple of fixes where we didn't get records due to some old VFS changes, and a change to how we collect execve info..." Fixed conflict in fs/exec.c as per Eric and linux-next. * git://git.infradead.org/users/eparis/audit: (28 commits) audit: fix type of sessionid in audit_set_loginuid() audit: call audit_bprm() only once to add AUDIT_EXECVE information audit: move audit_aux_data_execve contents into audit_context union audit: remove unused envc member of audit_aux_data_execve audit: Kill the unused struct audit_aux_data_capset audit: do not reject all AUDIT_INODE filter types audit: suppress stock memalloc failure warnings since already managed audit: log the audit_names record type audit: add child record before the create to handle case where create fails audit: use given values in tty_audit enable api audit: use nlmsg_len() to get message payload length audit: use memset instead of trying to initialize field by field audit: fix info leak in AUDIT_GET requests audit: update AUDIT_INODE filter rule to comparator function audit: audit feature to set loginuid immutable audit: audit feature to only allow unsetting the loginuid audit: allow unsetting the loginuid (with priv) audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE audit: loginuid functions coding style selinux: apply selinux checks on new audit message types ...
2013-11-05	selinux: apply selinux checks on new audit message types	Eric Paris
	We use the read check to get the feature set (like AUDIT_GET) and the write check to set the features (like AUDIT_SET). Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-10-23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	Conflicts: drivers/net/usb/qmi_wwan.c include/net/dst.h Trivial merge conflicts, both were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-22	Merge branch 'master' of git://git.infradead.org/users/pcmoore/selinux into ↵	James Morris
	ra-next
2013-10-14	netfilter: pass hook ops to hookfn	Patrick McHardy
	Pass the hook ops to the hookfn to allow for generic hook functions. This change is required by nf_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-04	selinux: remove 'flags' parameter from avc_audit()	Linus Torvalds
	Now avc_audit() has no more users with that parameter. Remove it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-04	selinux: avc_has_perm_flags has no more users	Linus Torvalds
	.. so get rid of it. The only indirect users were all the avc_has_perm() callers which just expanded to have a zero flags argument. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-10-04	selinux: remove 'flags' parameter from inode_has_perm	Linus Torvalds
	Every single user passes in '0'. I think we had non-zero users back in some stone age when selinux_inode_permission() was implemented in terms of inode_has_perm(), but that complicated case got split up into a totally separate code-path so that we could optimize the much simpler special cases. See commit 2e33405785d3 ("SELinux: delay initialization of audit data in selinux_inode_permission") for example. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-30	net ipv4: Convert ipv4.ip_local_port_range to be per netns v3	Eric W. Biederman
	- Move sysctl_local_ports from a global variable into struct netns_ipv4. - Modify inet_get_local_port_range to take a struct net, and update all of the callers. - Move the initialization of sysctl_local_ports into sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c v2: - Ensure indentation used tabs - Fixed ip.h so it applies cleanly to todays net-next v3: - Compile fixes of strange callers of inet_get_local_port_range. This patch now successfully passes an allmodconfig build. Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range Originally-by: Samya <samya@twitter.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26	selinux: correct locking in selinux_netlbl_socket_connect)	Paul Moore
	The SELinux/NetLabel glue code has a locking bug that affects systems with NetLabel enabled, see the kernel error message below. This patch corrects this problem by converting the bottom half socket lock to a more conventional, and correct for this call-path, lock_sock() call. =============================== [ INFO: suspicious RCU usage. ] 3.11.0-rc3+ #19 Not tainted ------------------------------- net/ipv4/cipso_ipv4.c:1928 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by ping/731: #0: (slock-AF_INET/1){+.-...}, at: [...] selinux_netlbl_socket_connect #1: (rcu_read_lock){.+.+..}, at: [<...>] netlbl_conn_setattr stack backtrace: CPU: 1 PID: 731 Comm: ping Not tainted 3.11.0-rc3+ #19 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 0000000000000001 ffff88006f659d28 ffffffff81726b6a ffff88003732c500 ffff88006f659d58 ffffffff810e4457 ffff88006b845a00 0000000000000000 000000000000000c ffff880075aa2f50 ffff88006f659d90 ffffffff8169bec7 Call Trace: [<ffffffff81726b6a>] dump_stack+0x54/0x74 [<ffffffff810e4457>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffff8169bec7>] cipso_v4_sock_setattr+0x187/0x1a0 [<ffffffff8170f317>] netlbl_conn_setattr+0x187/0x190 [<ffffffff8170f195>] ? netlbl_conn_setattr+0x5/0x190 [<ffffffff8131ac9e>] selinux_netlbl_socket_connect+0xae/0xc0 [<ffffffff81303025>] selinux_socket_connect+0x135/0x170 [<ffffffff8119d127>] ? might_fault+0x57/0xb0 [<ffffffff812fb146>] security_socket_connect+0x16/0x20 [<ffffffff815d3ad3>] SYSC_connect+0x73/0x130 [<ffffffff81739a85>] ? sysret_check+0x22/0x5d [<ffffffff810e5e2d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [<ffffffff81373d4e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff815d52be>] SyS_connect+0xe/0x10 [<ffffffff81739a59>] system_call_fastpath+0x16/0x1b Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-09-26	selinux: Use kmemdup instead of kmalloc + memcpy	Duan Jiong
	Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2013-09-18	Merge git://git.infradead.org/users/eparis/selinux	Paul Moore
	Conflicts: security/selinux/hooks.c Pull Eric's existing SELinux tree as there are a number of patches in there that are not yet upstream. There was some minor fixup needed to resolve a conflict in security/selinux/hooks.c:selinux_set_mnt_opts() between the labeled NFS patches and Eric's security_fs_use() simplification patch.
2013-09-07	Merge branch 'next' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Nothing major for this kernel, just maintenance updates" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits) apparmor: add the ability to report a sha1 hash of loaded policy apparmor: export set of capabilities supported by the apparmor module apparmor: add the profile introspection file to interface apparmor: add an optional profile attachment string for profiles apparmor: add interface files for profiles and namespaces apparmor: allow setting any profile into the unconfined state apparmor: make free_profile available outside of policy.c apparmor: rework namespace free path apparmor: update how unconfined is handled apparmor: change how profile replacement update is done apparmor: convert profile lists to RCU based locking apparmor: provide base for multiple profiles to be replaced at once apparmor: add a features/policy dir to interface apparmor: enable users to query whether apparmor is enabled apparmor: remove minimum size check for vmalloc() Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes Smack: network label match fix security: smack: add a hash table to quicken smk_find_entry() security: smack: fix memleak in smk_write_rules_list() xattr: Constify ->name member of "struct xattr". ...
2013-08-28	Revert "SELinux: do not handle seclabel as a special flag"	Eric Paris
	This reverts commit 308ab70c465d97cf7e3168961dfd365535de21a6. It breaks my FC6 test box. /dev/pts is not mounted. dmesg says SELinux: mount invalid. Same superblock, different security settings for (dev devpts, type devpts) Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-08-28	selinux: consider filesystem subtype in policies	Anand Avati
	Not considering sub filesystem has the following limitation. Support for SELinux in FUSE is dependent on the particular userspace filesystem, which is identified by the subtype. For e.g, GlusterFS, a FUSE based filesystem supports SELinux (by mounting and processing FUSE requests in different threads, avoiding the mount time deadlock), whereas other FUSE based filesystems (identified by a different subtype) have the mount time deadlock. By considering the subtype of the filesytem in the SELinux policies, allows us to specify a filesystem subtype, in the following way: fs_use_xattr fuse.glusterfs gen_context(system_u:object_r:fs_t,s0); This way not all FUSE filesystems are put in the same bucket and subjected to the limitations of the other subtypes. Signed-off-by: Anand Avati <avati@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-31	net: split rt_genid for ipv4 and ipv6	fan.du
	Current net name space has only one genid for both IPv4 and IPv6, it has below drawbacks: - Add/delete an IPv4 address will invalidate all IPv6 routing table entries. - Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table entries even when the policy is only applied for one address family. Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6 separately in a fine granularity. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-25	Add SELinux policy capability for always checking packet and peer classes.	Chris PeBenito
	Currently the packet class in SELinux is not checked if there are no SECMARK rules in the security or mangle netfilter tables. Some systems prefer that packets are always checked, for example, to protect the system should the netfilter rules fail to load or if the nefilter rules were maliciously flushed. Add the always_check_network policy capability which, when enabled, treats SECMARK as enabled, even if there are no netfilter SECMARK rules and treats peer labeling as enabled, even if there is no Netlabel or labeled IPSEC configuration. Includes definition of "redhat1" SELinux policy capability, which exists in the SELinux userpace library, to keep ordering correct. The SELinux userpace portion of this was merged last year, but this kernel change fell on the floor. Signed-off-by: Chris PeBenito <cpebenito@tresys.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	selinux: fix problems in netnode when BUG() is compiled out	Paul Moore
	When the BUG() macro is disabled at compile time it can cause some problems in the SELinux netnode code: invalid return codes and uninitialized variables. This patch fixes this by making sure we take some corrective action after the BUG() macro. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: use a helper function to determine seclabel	Eric Paris
	Use a helper to determine if a superblock should have the seclabel flag rather than doing it in the function. I'm going to use this in the security server as well. Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: pass a superblock to security_fs_use	Eric Paris
	Rather than passing pointers to memory locations, strings, and other stuff just give up on the separation and give security_fs_use the superblock. It just makes the code easier to read (even if not easier to reuse on some other OS) Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: do not handle seclabel as a special flag	Eric Paris
	Instead of having special code around the 'non-mount' seclabel mount option just handle it like the mount options. Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: change sbsec->behavior to short	Eric Paris
	We only have 6 options, so char is good enough, but use a short as that packs nicely. This shrinks the superblock_security_struct just a little bit. Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: renumber the superblock options	Eric Paris
	Just to make it clear that we have mount time options and flags, separate them. Since I decided to move the non-mount options above above 0x10, we need a short instead of a char. (x86 padding says this takes up no additional space as we have a 3byte whole in the structure) Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: do all flags twiddling in one place	Eric Paris
	Currently we set the initialize and seclabel flag in one place. Do some unrelated printk then we unset the seclabel flag. Eww. Instead do the flag twiddling in one place in the code not seperated by unrelated printk. Also don't set and unset the seclabel flag. Only set it if we need to. Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: rename SE_SBLABELSUPP to SBLABEL_MNT	Eric Paris
	Just a flag rename as we prepare to make it not so special. Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: use define for number of bits in the mnt flags mask	Eric Paris
	We had this random hard coded value of '8' in the code (I put it there) for the number of bits to check for mount options. This is stupid. Instead use the #define we already have which tells us the number of mount options. Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: make it harder to get the number of mnt opts wrong	Eric Paris
	Instead of just hard coding a value, use the enum to out benefit. Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: remove crazy contortions around proc	Eric Paris
	We check if the fsname is proc and if so set the proc superblock security struct flag. We then check if the flag is set and use the string 'proc' for the fsname instead of just using the fsname. What's the point? It's always proc... Get rid of the useless conditional. Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: fix selinuxfs policy file on big endian systems	Eric Paris
	The /sys/fs/selinux/policy file is not valid on big endian systems like ppc64 or s390. Let's see why: static int hashtab_cnt(void key, void data, void ptr) { int cnt = ptr; cnt = cnt + 1; return 0; } static int range_write(struct policydb p, void fp) { size_t nel; [...] /* count the number of entries in the hashtab */ nel = 0; rc = hashtab_map(p->range_tr, hashtab_cnt, &nel); if (rc) return rc; buf[0] = cpu_to_le32(nel); rc = put_entry(buf, sizeof(u32), 1, fp); So size_t is 64 bits. But then we pass a pointer to it as we do to hashtab_cnt. hashtab_cnt thinks it is a 32 bit int and only deals with the first 4 bytes. On x86_64 which is little endian, those first 4 bytes and the least significant, so this works out fine. On ppc64/s390 those first 4 bytes of memory are the high order bits. So at the end of the call to hashtab_map nel has a HUGE number. But the least significant 32 bits are all 0's. We then pass that 64 bit number to cpu_to_le32() which happily truncates it to a 32 bit number and does endian swapping. But the low 32 bits are all 0's. So no matter how many entries are in the hashtab, big endian systems always say there are 0 entries because I screwed up the counting. The fix is easy. Use a 32 bit int, as the hashtab_cnt expects, for nel. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-07-25	SELinux: Enable setting security contexts on rootfs inodes.	Stephen Smalley
	rootfs (ramfs) can support setting of security contexts by userspace due to the vfs fallback behavior of calling the security module to set the in-core inode state for security.* attributes when the filesystem does not provide an xattr handler. No xattr handler required as the inodes are pinned in memory and have no backing store. This is useful in allowing early userspace to label individual files within a rootfs while still providing a policy-defined default via genfs. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: Increase ebitmap_node size for 64-bit configuration	Waiman Long
	Currently, the ebitmap_node structure has a fixed size of 32 bytes. On a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being used as bitmaps. The overhead ratio is 1/4. On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes are left for bitmap purpose and the overhead ratio is 1/2. With a 3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit() function to be called about 9 million times. The average number of ebitmap_node traversal is about 3.7. This patch increases the size of the ebitmap_node structure to 64 bytes for 64-bit system to keep the overhead ratio at 1/4. This may also improve performance a little bit by making node to node traversal less frequent (< 2) as more bits are available in each node. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	SELinux: Reduce overhead of mls_level_isvalid() function call	Waiman Long
	While running the high_systime workload of the AIM7 benchmark on a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel (with HT on), it was found that a pretty sizable amount of time was spent in the SELinux code. Below was the perf trace of the "perf record -a -s" of a test run at 1500 users: 5.04% ls [kernel.kallsyms] [k] ebitmap_get_bit 1.96% ls [kernel.kallsyms] [k] mls_level_isvalid 1.95% ls [kernel.kallsyms] [k] find_next_bit The ebitmap_get_bit() was the hottest function in the perf-report output. Both the ebitmap_get_bit() and find_next_bit() functions were, in fact, called by mls_level_isvalid(). As a result, the mls_level_isvalid() call consumed 8.95% of the total CPU time of all the 24 virtual CPUs which is quite a lot. The majority of the mls_level_isvalid() function invocations come from the socket creation system call. Looking at the mls_level_isvalid() function, it is checking to see if all the bits set in one of the ebitmap structure are also set in another one as well as the highest set bit is no bigger than the one specified by the given policydb data structure. It is doing it in a bit-by-bit manner. So if the ebitmap structure has many bits set, the iteration loop will be done many times. The current code can be rewritten to use a similar algorithm as the ebitmap_contains() function with an additional check for the highest set bit. The ebitmap_contains() function was extended to cover an optional additional check for the highest set bit, and the mls_level_isvalid() function was modified to call ebitmap_contains(). With that change, the perf trace showed that the used CPU time drop down to just 0.08% (ebitmap_contains + mls_level_isvalid) of the total which is about 100X less than before. 0.07% ls [kernel.kallsyms] [k] ebitmap_contains 0.05% ls [kernel.kallsyms] [k] ebitmap_get_bit 0.01% ls [kernel.kallsyms] [k] mls_level_isvalid 0.01% ls [kernel.kallsyms] [k] find_next_bit The remaining ebitmap_get_bit() and find_next_bit() functions calls are made by other kernel routines as the new mls_level_isvalid() function will not call them anymore. This patch also improves the high_systime AIM7 benchmark result, though the improvement is not as impressive as is suggested by the reduction in CPU time spent in the ebitmap functions. The table below shows the performance change on the 2-socket x86-64 system (with HT on) mentioned above. +--------------+---------------+----------------+-----------------+ \| Workload \| mean % change \| mean % change \| mean % change \| \| \| 10-100 users \| 200-1000 users \| 1100-2000 users \| +--------------+---------------+----------------+-----------------+ \| high_systime \| +0.1% \| +0.9% \| +2.6% \| +--------------+---------------+----------------+-----------------+ Signed-off-by: Waiman Long <Waiman.Long@hp.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	selinux: remove the BUG_ON() from selinux_skb_xfrm_sid()	Paul Moore
	Remove the BUG_ON() from selinux_skb_xfrm_sid() and propogate the error code up to the caller. Also check the return values in the only caller function, selinux_skb_peerlbl_sid(). Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	selinux: cleanup the XFRM header	Paul Moore
	Remove the unused get_sock_isec() function and do some formatting fixes. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	selinux: cleanup selinux_xfrm_decode_session()	Paul Moore
	Some basic simplification. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	selinux: cleanup some comment and whitespace issues in the XFRM code	Paul Moore
	Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	selinux: cleanup selinux_xfrm_sock_rcv_skb() and selinux_xfrm_postroute_last()	Paul Moore
	Some basic simplification and comment reformatting. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	selinux: cleanup selinux_xfrm_policy_lookup() and ↵	Paul Moore
	selinux_xfrm_state_pol_flow_match() Do some basic simplification and comment reformatting. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	selinux: cleanup and consolidate the XFRM alloc/clone/delete/free code	Paul Moore
	The SELinux labeled IPsec code state management functions have been long neglected and could use some cleanup and consolidation. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	lsm: split the xfrm_state_alloc_security() hook implementation	Paul Moore
	The xfrm_state_alloc_security() LSM hook implementation is really a multiplexed hook with two different behaviors depending on the arguments passed to it by the caller. This patch splits the LSM hook implementation into two new hook implementations, which match the LSM hooks in the rest of the kernel: * xfrm_state_alloc * xfrm_state_alloc_acquire Also included in this patch are the necessary changes to the SELinux code; no other LSMs are affected. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2013-07-25	xattr: Constify ->name member of "struct xattr".	Tetsuo Handa
	Since everybody sets kstrdup()ed constant string to "struct xattr"->name but nobody modifies "struct xattr"->name , we can omit kstrdup() and its failure checking by constifying ->name member of "struct xattr". Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Joel Becker <jlbec@evilplan.org> [ocfs2] Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Reviewed-by: Paul Moore <paul@paul-moore.com> Tested-by: Paul Moore <paul@paul-moore.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-07-09	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	Linus Torvalds
	Pull networking updates from David Miller: "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickeled in. Highlights: 1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit 0a4db187a999 ("Merge branch 'll_poll'") From Eliezer Tamir. 2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet. 3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan. 4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov. 5) Allow controlling network device VF link state using netlink, from Rony Efraim. 6) Support GRE tunneling in openvswitch, from Pravin B Shelar. 7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet. 8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang. 9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports. 10) Major cleanups, particular in the area of debugging and cookie lifetime handline, to the SCTP protocol code. From Daniel Borkmann. 11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel. 12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann. 13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg. 14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet. 15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng. 16) Add support for GSO segmentation of MPLS packets, from Simon Horman. 17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs. 18) Convert several drivers over to module_pci_driver(), from Peter Huewe. 19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a O(1) calculation instead. From Eric Dumazet. 20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel. 21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet. 22) Prevent a single high rate flow from overruning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn. 23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet. 24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet. 25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich. 26) Support IPV6 in ping sockets. From Lorenzo Colitti. 27) Receive flow steering targets should be updated at poll() time too, from David Majnemer. 28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs. 29) We have to be mindful of ipv4 mapped ipv6 sockets in upd_v6_push_pending_frames(). From Hannes Frederic Sowa. 30) Fix L2TP sequence number handling bugs, from James Chapman." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits) drivers/net: caif: fix wrong rtnl_is_locked() usage drivers/net: enic: release rtnl_lock on error-path vhost-net: fix use-after-free in vhost_net_flush net: mv643xx_eth: do not use port number as platform device id net: sctp: confirm route during forward progress virtio_net: fix race in RX VQ processing virtio: support unlocked queue poll net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit Documentation: Fix references to defunct linux-net@vger.kernel.org net/fs: change busy poll time accounting net: rename low latency sockets functions to busy poll bridge: fix some kernel warning in multicast timer sfc: Fix memory leak when discarding scattered packets sit: fix tunnel update via netlink dt:net:stmmac: Add dt specific phy reset callback support. dt:net:stmmac: Add support to dwmac version 3.610 and 3.710 dt:net:stmmac: Allocate platform data only if its NULL. net:stmmac: fix memleak in the open method ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available net: ipv6: fix wrong ping_v6_sendmsg return value ...
2013-07-09	Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs	Linus Torvalds
	Pull NFS client updates from Trond Myklebust: "Feature highlights include: - Add basic client support for NFSv4.2 - Add basic client support for Labeled NFS (selinux for NFSv4.2) - Fix the use of credentials in NFSv4.1 stateful operations, and add support for NFSv4.1 state protection. Bugfix highlights: - Fix another NFSv4 open state recovery race - Fix an NFSv4.1 back channel session regression - Various rpc_pipefs races - Fix another issue with NFSv3 auth negotiation Please note that Labeled NFS does require some additional support from the security subsystem. The relevant changesets have all been reviewed and acked by James Morris." * tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits) NFS: Set NFS_CS_MIGRATION for NFSv4 mounts NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs nfs: have NFSv3 try server-specified auth flavors in turn nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it nfs: move server_authlist into nfs_try_mount_request nfs: refactor "need_mount" code out of nfs_try_mount SUNRPC: PipeFS MOUNT notification optimization for dying clients SUNRPC: split client creation routine into setup and registration SUNRPC: fix races on PipeFS UMOUNT notifications SUNRPC: fix races on PipeFS MOUNT notifications NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize NFS: Improve legacy idmapping fallback NFSv4.1 end back channel session draining NFS: Apply v4.1 capabilities to v4.2 NFSv4.1: Clean up layout segment comparison helper names NFSv4.1: layout segment comparison helpers should take 'const' parameters NFSv4: Move the DNS resolver into the NFSv4 module rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set ...
2013-06-29	SELinux: Institute file_path_has_perm()	David Howells
	Create a file_path_has_perm() function that is like path_has_perm() but instead takes a file struct that is the source of both the path and the inode (rather than getting the inode from the dentry in the path). This is then used where appropriate. This will be useful for situations like unionmount where it will be possible to have an apparently-negative dentry (eg. a fallthrough) that is open with the file struct pointing to an inode on the lower fs. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-08	NFS: Client implementation of Labeled-NFS	David Quigley
	This patch implements the client transport and handling support for labeled NFS. The patch adds two functions to encode and decode the security label recommended attribute which makes use of the LSM hooks added earlier. It also adds code to grab the label from the file attribute structures and encode the label to be sent back to the server. Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08	SELinux: Add new labeling type native labels	David Quigley
	There currently doesn't exist a labeling type that is adequate for use with labeled NFS. Since NFS doesn't really support xattrs we can't use the use xattr labeling behavior. For this we developed a new labeling type. The native labeling type is used solely by NFS to ensure NFS inodes are labeled at runtime by the NFS code instead of relying on the SELinux security server on the client end. Acked-by: Eric Paris <eparis@redhat.com> Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08	LSM: Add flags field to security_sb_set_mnt_opts for in kernel mount data.	David Quigley
	There is no way to differentiate if a text mount option is passed from user space or the kernel. A flags field is being added to the security_sb_set_mnt_opts hook to allow for in kernel security flags to be sent to the LSM for processing in addition to the text options received from mount. This patch also updated existing code to fix compilation errors. Acked-by: Eric Paris <eparis@redhat.com> Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08	Security: Add Hook to test if the particular xattr is part of a MAC model.	David Quigley
	The interface to request security labels from user space is the xattr interface. When requesting the security label from an NFS server it is important to make sure the requested xattr actually is a MAC label. This allows us to make sure that we get the desired semantics from the attribute instead of something else such as capabilities or a time based LSM. Acked-by: Eric Paris <eparis@redhat.com> Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08	Security: Add hook to calculate context based on a negative dentry.	David Quigley
	There is a time where we need to calculate a context without the inode having been created yet. To do this we take the negative dentry and calculate a context based on the process and the parent directory contexts. Acked-by: Eric Paris <eparis@redhat.com> Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com> Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg> Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg> Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg> Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>