summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2008-02-06mlx4_core: For 64-bit systems, vmap() kernel queue buffersJack Morgenstein
Since kernel virtual memory is not a problem on 64-bit systems, there is no reason to use our own 2-layer page mapping scheme for large kernel queue buffers on such systems. Instead, map the page list to a single virtually contiguous buffer with vmap(), so that can we access buffer memory via direct indexing. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-06IB/mlx4: Consolidate code to get an entry from a struct mlx4_bufRoland Dreier
We use struct mlx4_buf for kernel QP, CQ and SRQ buffers, and the code to look up an entry is duplicated in get_cqe_from_buf() and the QP and SRQ versions of get_wqe(). Factor this out into mlx4_buf_offset(). This will also make it easier to switch over to using vmap() for buffers. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-05Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] make pfm_get_task work with virtual pids [IA64] honor notify_die() returning NOTIFY_STOP [IA64] remove dead code: __cpu_{down,die} from !HOTPLUG_CPU [IA64] Appoint kvm/ia64 Maintainers [IA64] ia64_set_psr should use srlz.i [IA64] Export three symbols for module use [IA64] mca style cleanup [IA64] sn_hwperf semaphore to mutex [IA64] generalize attribute of fsyscall_gtod_data [IA64] efi.c Add /* never reached */ annotation [IA64] efi.c Spelling/punctuation fixes [IA64] Make efi.c mostly fit in 80 columns [IA64] aliasing-test: fix gcc warnings on non-ia64 [IA64] Slim-down __clear_bit_unlock [IA64] Fix the order of atomic operations in restore_previous_kprobes on ia64 [IA64] constify function pointer tables [IA64] fix userspace compile error in gcc_intrin.h
2008-02-05Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [S390] dcss: Initialize workqueue before using it. [S390] Remove BUILD_BUG_ON() in vmem code. [S390] sclp_tty/sclp_vt220: Fix scheduling while atomic [S390] dasd: fix panic caused by alias device offline [S390] dasd: add ifcc handling [S390] latencytop s390 support. [S390] Implement ext2_find_next_bit. [S390] Cleanup & optimize bitops. [S390] Define GENERIC_LOCKBREAK. [S390] console: allow vt220 console to be the only console [S390] Fix couple of section mismatches. [S390] Fix smp_call_function_mask semantics. [S390] Fix linker script. [S390] DEBUG_PAGEALLOC support for s390. [S390] cio: Add shutdown callback for ccwgroup. [S390] cio: Update documentation. [S390] cio: Clean up chsc response code handling. [S390] cio: make sense id procedure work with partial hardware response
2008-02-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits) [PKT_SCHED]: vlan tag match [NET]: Add if_addrlabel.h to sanitized headers. [NET] rtnetlink.c: remove no longer used functions [ICMP]: Restore pskb_pull calls in receive function [INET]: Fix accidentally broken inet(6)_hash_connect's port offset calculations. [NET]: Remove further references to net-modules.txt bluetooth rfcomm tty: destroy before tty_close() bluetooth: blacklist another Broadcom BCM2035 device drivers/bluetooth/btsdio.c: fix double-free drivers/bluetooth/bpa10x.c: fix memleak bluetooth: uninlining bluetooth: hidp_process_hid_control remove unnecessary parameter dealing tun: impossible to deassert IFF_ONE_QUEUE or IFF_NO_PI hamradio: fix dmascc section mismatch [SCTP]: Fix kernel panic while received AUTH chunk with BAD shared key identifier [SCTP]: Fix kernel panic while received AUTH chunk while enabled auth [IPV4]: Formatting fix for /proc/net/fib_trie. [IPV6]: Fix sysctl compilation error. [NET_SCHED]: Add #ifdef CONFIG_NET_EMATCH in net/sched/cls_flow.c (latest git broken build) [IPV4]: Fix compile error building without CONFIG_FS_PROC ...
2008-02-05Merge branch 'agp-patches' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6 * 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6: agp: remove flush_agp_mappings calls from new flush handling code intel-agp: introduce IS_I915 and do some cleanups.. [intel_agp] fix name for G35 chipset intel-agp: fixup resource handling in flush code. intel-agp: add new chipset ID agp: remove unnecessary pci_dev_put agp: remove uid comparison as security check fix AGP warning agp/intel: Add chipset flushing support for i8xx chipsets. intel-agp: add chipset flushing support agp: add chipset flushing support to AGP interface
2008-02-05uml: LDT mutex conversionDaniel Walker
The ldt.semaphore conforms to the new struct mutex requirments, so I converted it to use the new API and changed the name. Signed-off-by: Daniel Walker <dwalker@mvista.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: add back CONFIG_HZJeff Dike
avoid-overflows-in-kernel-timec.patch makes CONFIG_HZ necessary for a successful build. UML lacks a definition, so this patch adds one. It also changes the hard-wired definition of HZ to CONFIG_HZ. Note: this patch is a good idea even in the absence of hpa's time fixes. Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: get rid of syscall countersJeff Dike
Get rid of some syscall counters which haven't been useful in ages. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: customize tlb.hJeff Dike
Customize the hooks in tlb.h to optimize TLB flushing some more. Add start and end fields to tlb_gather_mmu, which are used to limit the address space range scanned when a region is unmapped. The interfaces which just free page tables, without actually changing mappings, don't need to cause a TLB flush. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: 64-bit tlb fixesJeff Dike
Some 64-bit tlb fixes - moved pmd_page_vaddr to pgtable.h since it's the same for both 2-level and 3-level page tables fixed a bogus cast on pud_page_vaddr made the address checking in update_*_range more careful Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: cover stubs with a VMAJeff Dike
Give the stubs a VMA. This allows the removal of a truly nasty kludge to make sure that mm->nr_ptes was correct in exit_mmap. The underlying problem was always that the stubs, which have ptes, and thus allocated a page table, weren't covered by a VMA. This patch fixes that by using install_special_mapping in arch_dup_mmap and activate_context to create the VMA. The stubs have to be moved, since shift_arg_pages seems to assume that the stack is the only VMA present at that point during exec, and uses vma_adjust to fiddle its VMA. However, that extends the stub VMA by the amount removed from the stack VMA. To avoid this problem, the stubs were moved to a different fixed location at the start of the address space. The init_stub_pte calls were moved from init_new_context to arch_dup_mmap because I was occasionally seeing arch_dup_mmap not being called, causing exit_mmap to die. Rather than figure out what was really happening, I decided it was cleaner to just move the calls so that there's no doubt that both the pte and VMA creation happen, no matter what. arch_exit_mmap is used to clear the stub ptes at exit time. The STUB_* constants in as-layout.h no longer depend on UM_TASK_SIZE, that that definition is removed, along with the comments complaining about gcc. Because the stubs are no longer at the top of the address space, some care is needed while flushing TLBs. update_pte_range checks for addresses in the stub range and skips them. flush_thread now issues two unmaps, one for the range before STUB_START and one for the range after STUB_END. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: clean up TASK_SIZE usageJeff Dike
Clean up the calculation and use of the usable address space size on the host. task_size is gone, replaced with TASK_SIZE, which is calculated from CONFIG_TOP_ADDR. get_kmem_end and set_task_sizes_skas are also gone. host_task_size, which refers to the entire address space usable by the UML kernel and which may be larger than the address space usable by a UML process, since that has to end on a pgdir boundary, is replaced by CONFIG_TOP_ADDR. STACK_TOP is now TASK_SIZE minus the two stub pages. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: add virt_to_pteJeff Dike
Turn um_virt_to_phys into virt_to_pte, cleaning up a horrid interface. It's also made non-static and declared in pgtable.h because it'll be needed when the stubs get a vma. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: fix page table data sizesJeff Dike
Get the sizes of various pieces of data right when using three-level page tables. pgd and pmd entries remain at 32 bits in a 32-bit compilation because page tables will remain in low memory. So, PGDIR_SHIFT, the PTRS_PER_* values, set_pud, set_pmd are conditional on 64BIT. More use of phys_t is made when there are physical memory addresses floating around. ObCheckpatchViolationJustification - the new typedef is an alternate definition of pmd_t, which I can't really live without. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: current.h cleanupJeff Dike
Tidy current-related stuff. There was a comment in current.h saying that current_thread was obsolete, so this patch turns all instances of current_thread into current_thread_info(). There's some simplifying of the result in arch/um/sys-i386/signal.c. current.h and thread_info also get style cleanups. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: style cleanupJeff Dike
Style fixes in elf-i386.h and arch/um/kernel/mem.c. update the copyright get rid of an emacs formatting comment some formatting fixes inclusion trimming whitespace fixes Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: header untanglingJeff Dike
Untangle UML headers somewhat and add some includes where they were needed explicitly, but gotten accidentally via some other header. arch/um/include/um_uaccess.h loses asm/fixmap.h because it uses no fixmap stuff and gains elf.h, because it needs FIXADDR_USER_*, and archsetjmp.h, because it needs jmp_buf. pmd_alloc_one is uninlined because it needs mm_struct, and that's inconvenient to provide in asm-um/pgtable-3level.h. elf_core_copy_fpregs is also uninlined from elf-i386.h and elf-x86_64.h, which duplicated the code anyway, to arch/um/kernel/process.c, so that the reference to current_thread doesn't pull sched.h or anything related into asm/elf.h. arch/um/sys-i386/ldt.c, arch/um/kernel/tlb.c and arch/um/kernel/skas/uaccess.c got sched.h because they dereference task_structs. Its includes of linux and asm headers got turned from "" to <>. arch/um/sys-i386/bug.c gets asm/errno.h because it needs errno constants. asm/elf-i386 gets asm/user.h because it needs user_regs_struct. asm/fixmap.h gets page.h because it needs PAGE_SIZE and PAGE_MASK and system.h for BUG_ON. asm/pgtable doesn't need sched.h. asm/processor-generic.h defined mm_segment_t, but didn't use it. So, that definition is moved to uaccess.h, which defines a bunch of mm_segment_t-related stuff. thread_info.h uses mm_segment_t, and includes uaccess.h, which causes a recursion. So, the definition is placed above the include of thread_info. in uaccess.h. thread_info.h also gets page.h because it needs PAGE_SIZE. ObCheckpatchViolationJustification - I'm not adding a typedef; I'm moving mm_segment_t from one place to another. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: move um_virt_to_physJeff Dike
This patchset makes UML build and run with three-level page tables on 32-bit hosts. This is an uncommon use case, but the code here needed fixing and cleaning up, so 32-bit three-level pages tables were tested to make sure the changes are good. Patch 1 - code movement Patch 2 - header untangling Patch 3 - style fixups in files affected so far Patch 4 - clean up use of current.h Patch 5 - fix sizes of types that are different between 2 and 3-level page tables - three-level page table support should build at this point Patch 6 - tidy (i.e. eliminate much of) the code that figures out how big the address space is Patch 7 - change um_virt_to_phys into virt_to_pte, clean its interface, and clean its (so far) one caller Patch 8 - the stub pages are covered with a VMA, allowing some nasty code to be thrown out - three-level page tables now work This patch: um_virt_to_phys only has one user, so it can be moved to the same file and made static. Its declarations in pgtable.h and ksyms.c are also gone. current_cmd was another apparent user, but it itself isn't used, so it is deleted. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05UML - Fix build in 2.6.24-rc2-mm1Jeff Dike
The earlier pgtable.h tidying patch made things a bit too tidy. Add back a header which is needed in VMALLOC_START and friend. Also add back a definition of pmd_page_vaddr, which is needed on x86_64. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: tidy pgtable.hJeff Dike
Large pieces of include/asm/pgtable.h were unused cruft. This uncovered arch/um/kernel/trap.c needing skas.h in order to get ptrace_faultinfo. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: get rid of asmlinkageJeff Dike
Get rid of asmlinkage and remove some old cruft from asm/linkage.h. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: implement get_wchanJeff Dike
Implement get_wchan - the algorithm is similar to x86. It starts with the stack pointer of the process in question and looks above that for addresses that are kernel text. The second one which isn't in the scheduler is the one that's returned. The first one is ignored because that will be UML's own context switching routine. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05uml: remove xmm checking on x86Karol Swietlicki
This patch removes some code which ran at every boot, but does not seem to do anything anymore. Please test. It works for me but mistakes can happen. Signed-off-by: Karol Swietlicki <magotari@gmail.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05cris: remove unused __dummy, CONST_ADDR and ADDR from bitops.hJesper Nilsson
This is very old code, it hasn't changed since 2001 and it is not used anywhere. Noticed by Clemens Koller. Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05mac68k: add nubus card definitions and a typo fixFinn Thain
Add some new card definitions and fix a typo (from Eugen Paiuc). Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05mac68k: remove dead codeFinn Thain
Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05leds: add possibility to remove leds classdevs during suspend/resumeRafael J. Wysocki
Make it possible to unregister a led classdev object in a safe way during a suspend/resume cycle. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Michael Buesch <mb@bu3sch.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05HWRNG: add possibility to remove hwrng devices during suspend/resumeRafael J. Wysocki
Make it possible to unregister a Hardware Random Number Generator device object in a safe way during a suspend/resume cycle. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Michael Buesch <mb@bu3sch.de> Cc: Michael Buesch <mb@bu3sch.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05Misc: Add possibility to remove misc devices during suspend/resumeRafael J. Wysocki
Make it possible to unregister a misc device object in a safe way during a suspend/resume cycle. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Michael Buesch <mb@bu3sch.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05latency.c: use QoS infrastructureMark Gross
Replace latency.c use with pm_qos_params use. Signed-off-by: mark gross <mgross@linux.intel.com> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Len Brown <lenb@kernel.org> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05pm qos infrastructure and interfaceMark Gross
The following patch is a generalization of the latency.c implementation done by Arjan last year. It provides infrastructure for more than one parameter, and exposes a user mode interface for processes to register pm_qos expectations of processes. This interface provides a kernel and user mode interface for registering performance expectations by drivers, subsystems and user space applications on one of the parameters. Currently we have {cpu_dma_latency, network_latency, network_throughput} as the initial set of pm_qos parameters. The infrastructure exposes multiple misc device nodes one per implemented parameter. The set of parameters implement is defined by pm_qos_power_init() and pm_qos_params.h. This is done because having the available parameters being runtime configurable or changeable from a driver was seen as too easy to abuse. For each parameter a list of performance requirements is maintained along with an aggregated target value. The aggregated target value is updated with changes to the requirement list or elements of the list. Typically the aggregated target value is simply the max or min of the requirement values held in the parameter list elements. >From kernel mode the use of this interface is simple: pm_qos_add_requirement(param_id, name, target_value): Will insert a named element in the list for that identified PM_QOS parameter with the target value. Upon change to this list the new target is recomputed and any registered notifiers are called only if the target value is now different. pm_qos_update_requirement(param_id, name, new_target_value): Will search the list identified by the param_id for the named list element and then update its target value, calling the notification tree if the aggregated target is changed. with that name is already registered. pm_qos_remove_requirement(param_id, name): Will search the identified list for the named element and remove it, after removal it will update the aggregate target and call the notification tree if the target was changed as a result of removing the named requirement. >From user mode: Only processes can register a pm_qos requirement. To provide for automatic cleanup for process the interface requires the process to register its parameter requirements in the following way: To register the default pm_qos target for the specific parameter, the process must open one of /dev/[cpu_dma_latency, network_latency, network_throughput] As long as the device node is held open that process has a registered requirement on the parameter. The name of the requirement is "process_<PID>" derived from the current->pid from within the open system call. To change the requested target value the process needs to write a s32 value to the open device node. This translates to a pm_qos_update_requirement call. To remove the user mode request for a target value simply close the device node. [akpm@linux-foundation.org: fix warnings] [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix build again] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: mark gross <mgross@linux.intel.com> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Len Brown <lenb@kernel.org> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com> Cc: Adam Belay <abelay@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05make kernel_shutdown_prepare() staticAdrian Bunk
kernel_shutdown_prepare() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05alpha: fix warning by fixing flush_tlb_kernel_range()Andrew Morton
mm/vmalloc.c: In function 'unmap_kernel_range': mm/vmalloc.c:75: warning: unused variable 'start' Macros are so horrid. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05Alpha doesn't use socketcallSamuel Thibault
Alpha doesn't use socketcall and doesn't provide __NR_socketcall. Signed-off-by: Samuel Thibault <samuel.thibault@citrix.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05alpha: atomic_add_return() should return intAndrew Morton
Prevents stuff like drivers/crypto/hifn_795x.c:2443: warning: format '%d' expects type 'int', but argument 4 has type 'long int' drivers/crypto/hifn_795x.c:2443: warning: format '%d' expects type 'int', but argument 4 has type 'long int' (at least). Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05m68knomu: remove dead config symbols from m68knomu codeJiri Olsa
remove dead config symbols from m68knommu code Signed-off-by: Jiri Olsa <olsajiri@gmail.com> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05frv: remove dead config symbol from FRV codeJiri Olsa
Remove dead config symbol from FRV code. Signed-off-by: Jiri Olsa <olsajiri@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05FRV: move DMA macros to scatterlist.h for consistency.Robert P. J. Day
To be consistent with other architectures, these two DMA macros should be defined in scatterlist.h as opposed to dma-mapping.h Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05Smack: Simplified Mandatory Access Control KernelCasey Schaufler
Smack is the Simplified Mandatory Access Control Kernel. Smack implements mandatory access control (MAC) using labels attached to tasks and data containers, including files, SVIPC, and other tasks. Smack is a kernel based scheme that requires an absolute minimum of application support and a very small amount of configuration data. Smack uses extended attributes and provides a set of general mount options, borrowing technics used elsewhere. Smack uses netlabel for CIPSO labeling. Smack provides a pseudo-filesystem smackfs that is used for manipulation of system Smack attributes. The patch, patches for ls and sshd, a README, a startup script, and x86 binaries for ls and sshd are also available on http://www.schaufler-ca.com Development has been done using Fedora Core 7 in a virtual machine environment and on an old Sony laptop. Smack provides mandatory access controls based on the label attached to a task and the label attached to the object it is attempting to access. Smack labels are deliberately short (1-23 characters) text strings. Single character labels using special characters are reserved for system use. The only operation applied to Smack labels is equality comparison. No wildcards or expressions, regular or otherwise, are used. Smack labels are composed of printable characters and may not include "/". A file always gets the Smack label of the task that created it. Smack defines and uses these labels: "*" - pronounced "star" "_" - pronounced "floor" "^" - pronounced "hat" "?" - pronounced "huh" The access rules enforced by Smack are, in order: 1. Any access requested by a task labeled "*" is denied. 2. A read or execute access requested by a task labeled "^" is permitted. 3. A read or execute access requested on an object labeled "_" is permitted. 4. Any access requested on an object labeled "*" is permitted. 5. Any access requested by a task on an object with the same label is permitted. 6. Any access requested that is explicitly defined in the loaded rule set is permitted. 7. Any other access is denied. Rules may be explicitly defined by writing subject,object,access triples to /smack/load. Smack rule sets can be easily defined that describe Bell&LaPadula sensitivity, Biba integrity, and a variety of interesting configurations. Smack rule sets can be modified on the fly to accommodate changes in the operating environment or even the time of day. Some practical use cases: Hierarchical levels. The less common of the two usual uses for MLS systems is to define hierarchical levels, often unclassified, confidential, secret, and so on. To set up smack to support this, these rules could be defined: C Unclass rx S C rx S Unclass rx TS S rx TS C rx TS Unclass rx A TS process can read S, C, and Unclass data, but cannot write it. An S process can read C and Unclass. Note that specifying that TS can read S and S can read C does not imply TS can read C, it has to be explicitly stated. Non-hierarchical categories. This is the more common of the usual uses for an MLS system. Since the default rule is that a subject cannot access an object with a different label no access rules are required to implement compartmentalization. A case that the Bell & LaPadula policy does not allow is demonstrated with this Smack access rule: A case that Bell&LaPadula does not allow that Smack does: ESPN ABC r ABC ESPN r On my portable video device I have two applications, one that shows ABC programming and the other ESPN programming. ESPN wants to show me sport stories that show up as news, and ABC will only provide minimal information about a sports story if ESPN is covering it. Each side can look at the other's info, neither can change the other. Neither can see what FOX is up to, which is just as well all things considered. Another case that I especially like: SatData Guard w Guard Publish w A program running with the Guard label opens a UDP socket and accepts messages sent by a program running with a SatData label. The Guard program inspects the message to ensure it is wholesome and if it is sends it to a program running with the Publish label. This program then puts the information passed in an appropriate place. Note that the Guard program cannot write to a Publish file system object because file system semanitic require read as well as write. The four cases (categories, levels, mutual read, guardbox) here are all quite real, and problems I've been asked to solve over the years. The first two are easy to do with traditonal MLS systems while the last two you can't without invoking privilege, at least for a while. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Cc: Joshua Brindle <method@manicmethod.com> Cc: Paul Moore <paul.moore@hp.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: "Ahmed S. Darwish" <darwish.07@gmail.com> Cc: Andrew G. Morgan <morgan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05NetLabel: introduce a new kernel configuration API for NetLabelPaul Moore
Add a new set of configuration functions to the NetLabel/LSM API so that LSMs can perform their own configuration of the NetLabel subsystem without relying on assistance from userspace. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05capabilities: introduce per-process capability bounding setSerge E. Hallyn
The capability bounding set is a set beyond which capabilities cannot grow. Currently cap_bset is per-system. It can be manipulated through sysctl, but only init can add capabilities. Root can remove capabilities. By default it includes all caps except CAP_SETPCAP. This patch makes the bounding set per-process when file capabilities are enabled. It is inherited at fork from parent. Noone can add elements, CAP_SETPCAP is required to remove them. One example use of this is to start a safer container. For instance, until device namespaces or per-container device whitelists are introduced, it is best to take CAP_MKNOD away from a container. The bounding set will not affect pP and pE immediately. It will only affect pP' and pE' after subsequent exec()s. It also does not affect pI, and exec() does not constrain pI'. So to really start a shell with no way of regain CAP_MKNOD, you would do prctl(PR_CAPBSET_DROP, CAP_MKNOD); cap_t cap = cap_get_proc(); cap_value_t caparray[1]; caparray[0] = CAP_MKNOD; cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP); cap_set_proc(cap); cap_free(cap); The following test program will get and set the bounding set (but not pI). For instance ./bset get (lists capabilities in bset) ./bset drop cap_net_raw (starts shell with new bset) (use capset, setuid binary, or binary with file capabilities to try to increase caps) ************************************************************ cap_bound.c ************************************************************ #include <sys/prctl.h> #include <linux/capability.h> #include <sys/types.h> #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #ifndef PR_CAPBSET_READ #define PR_CAPBSET_READ 23 #endif #ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif int usage(char *me) { printf("Usage: %s get\n", me); printf(" %s drop <capability>\n", me); return 1; } #define numcaps 32 char *captable[numcaps] = { "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner", "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap", "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast", "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner", "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace", "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice", "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod", "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap" }; int getbcap(void) { int comma=0; unsigned long i; int ret; printf("i know of %d capabilities\n", numcaps); printf("capability bounding set:"); for (i=0; i<numcaps; i++) { ret = prctl(PR_CAPBSET_READ, i); if (ret < 0) perror("prctl"); else if (ret==1) printf("%s%s", (comma++) ? ", " : " ", captable[i]); } printf("\n"); return 0; } int capdrop(char *str) { unsigned long i; int found=0; for (i=0; i<numcaps; i++) { if (strcmp(captable[i], str) == 0) { found=1; break; } } if (!found) return 1; if (prctl(PR_CAPBSET_DROP, i)) { perror("prctl"); return 1; } return 0; } int main(int argc, char *argv[]) { if (argc<2) return usage(argv[0]); if (strcmp(argv[1], "get")==0) return getbcap(); if (strcmp(argv[1], "drop")!=0 || argc<3) return usage(argv[0]); if (capdrop(argv[2])) { printf("unknown capability\n"); return 1; } return execl("/bin/bash", "/bin/bash", NULL); } ************************************************************ [serue@us.ibm.com: fix typo] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Casey Schaufler <casey@schaufler-ca.com>a Signed-off-by: "Serge E. Hallyn" <serue@us.ibm.com> Tested-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05Remove unnecessary include from include/linux/capability.hAndrew Morgan
KaiGai Kohei observed that this line in the linux header is not needed. Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Cc: KaiGai Kohei <kaigai@kaigai.gr.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05Add 64-bit capability support to the kernelAndrew Morgan
The patch supports legacy (32-bit) capability userspace, and where possible translates 32-bit capabilities to/from userspace and the VFS to 64-bit kernel space capabilities. If a capability set cannot be compressed into 32-bits for consumption by user space, the system call fails, with -ERANGE. FWIW libcap-2.00 supports this change (and earlier capability formats) http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/ [akpm@linux-foundation.org: coding-syle fixes] [akpm@linux-foundation.org: use get_task_comm()] [ezk@cs.sunysb.edu: build fix] [akpm@linux-foundation.org: do not initialise statics to 0 or NULL] [akpm@linux-foundation.org: unused var] [serue@us.ibm.com: export __cap_ symbols] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05revert "capabilities: clean up file capability reading"Andrew Morton
Revert b68680e4731abbd78863063aaa0dca2a6d8cc723 to make way for the next patch: "Add 64-bit capability support to the kernel". We want to keep the vfs_cap_data.data[] structure, using two 'data's for 64-bit caps (and later three for 96-bit caps), whereas b68680e4731abbd78863063aaa0dca2a6d8cc723 had gotten rid of the 'data' struct made its members inline. The 64-bit caps patch keeps the stack abuse fix at get_file_caps(), which was the more important part of that patch. [akpm@linux-foundation.org: coding-style fixes] Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Andrew Morgan <morgan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05VFS/Security: Rework inode_getsecurity and callers to return resulting bufferDavid P. Quigley
This patch modifies the interface to inode_getsecurity to have the function return a buffer containing the security blob and its length via parameters instead of relying on the calling function to give it an appropriately sized buffer. Security blobs obtained with this function should be freed using the release_secctx LSM hook. This alleviates the problem of the caller having to guess a length and preallocate a buffer for this function allowing it to be used elsewhere for Labeled NFS. The patch also removed the unused err parameter. The conversion is similar to the one performed by Al Viro for the security_getprocattr hook. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Chris Wright <chrisw@sous-sol.org> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05writeback: speed up writeback of big dirty filesFengguang Wu
After making dirty a 100M file, the normal behavior is to start the writeback for all data after 30s delays. But sometimes the following happens instead: - after 30s: ~4M - after 5s: ~4M - after 5s: all remaining 92M Some analyze shows that the internal io dispatch queues goes like this: s_io s_more_io ------------------------- 1) 100M,1K 0 2) 1K 96M 3) 0 96M 1) initial state with a 100M file and a 1K file 2) 4M written, nr_to_write <= 0, so write more 3) 1K written, nr_to_write > 0, no more writes(BUG) nr_to_write > 0 in (3) fools the upper layer to think that data have all been written out. The big dirty file is actually still sitting in s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io becomes empty, and let the loop in generic_sync_sb_inodes() continue: this may starve newly expired inodes in s_dirty. It is also not an option to draw inodes from both s_more_io and s_dirty, an let the loop go on: this might lead to live locks, and might also starve other superblocks in sync time(well kupdate may still starve some superblocks, that's another bug). We have to return when a full scan of s_io completes. So nr_to_write > 0 does not necessarily mean that "all data are written". This patch introduces a flag writeback_control.more_io to indicate that more io should be done. With it the big dirty file no longer has to wait for the next kupdate invokation 5s later. In sync_sb_inodes() we only set more_io on super_blocks we actually visited. This avoids the interaction between two pdflush deamons. Also in __sync_single_inode() we don't blindly keep requeuing the io if the filesystem cannot progress. Failing to do so may lead to 100% iowait. Tested-by: Mike Snitzer <snitzer@gmail.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Michael Rubin <mrubin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05mm: fix PageUptodate data raceNick Piggin
After running SetPageUptodate, preceeding stores to the page contents to actually bring it uptodate may not be ordered with the store to set the page uptodate. Therefore, another CPU which checks PageUptodate is true, then reads the page contents can get stale data. Fix this by having an smp_wmb before SetPageUptodate, and smp_rmb after PageUptodate. Many places that test PageUptodate, do so with the page locked, and this would be enough to ensure memory ordering in those places if SetPageUptodate were only called while the page is locked. Unfortunately that is not always the case for some filesystems, but it could be an idea for the future. Also bring the handling of anonymous page uptodateness in line with that of file backed page management, by marking anon pages as uptodate when they _are_ uptodate, rather than when our implementation requires that they be marked as such. Doing allows us to get rid of the smp_wmb's in the page copying functions, which were especially added for anonymous pages for an analogous memory ordering problem. Both file and anonymous pages are handled with the same barriers. FAQ: Q. Why not do this in flush_dcache_page? A. Firstly, flush_dcache_page handles only one side (the smb side) of the ordering protocol; we'd still need smp_rmb somewhere. Secondly, hiding away memory barriers in a completely unrelated function is nasty; at least in the PageUptodate macros, they are located together with (half) the operations involved in the ordering. Thirdly, the smp_wmb is only required when first bringing the page uptodate, wheras flush_dcache_page should be called each time it is written to through the kernel mapping. It is logically the wrong place to put it. Q. Why does this increase my text size / reduce my performance / etc. A. Because it is adding the necessary instructions to eliminate the data-race. Q. Can it be improved? A. Yes, eg. if you were to create a rule that all SetPageUptodate operations run under the page lock, we could avoid the smp_rmb places where PageUptodate is queried under the page lock. Requires audit of all filesystems and at least some would need reworking. That's great you're interested, I'm eagerly awaiting your patches. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05mm/page-writeback: highmem_is_dirtyable optionBron Gondwana
Add vm.highmem_is_dirtyable toggle A 32 bit machine with HIGHMEM64 enabled running DCC has an MMAPed file of approximately 2Gb size which contains a hash format that is written randomly by the dbclean process. On 2.6.16 this process took a few minutes. With lowmem only accounting of dirty ratios, this takes about 12 hours of 100% disk IO, all random writes. Include a toggle in /proc/sys/vm/highmem_is_dirtyable which can be set to 1 to add the highmem back to the total available memory count. [akpm@linux-foundation.org: Fix the CONFIG_DETECT_SOFTLOCKUP=y build] Signed-off-by: Bron Gondwana <brong@fastmail.fm> Cc: Ethan Solomita <solo@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: WU Fengguang <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05Page allocator: get rid of the list of cold pagesChristoph Lameter
We have repeatedly discussed if the cold pages still have a point. There is one way to join the two lists: Use a single list and put the cold pages at the end and the hot pages at the beginning. That way a single list can serve for both types of allocations. The discussion of the RFC for this and Mel's measurements indicate that there may not be too much of a point left to having separate lists for hot and cold pages (see http://marc.info/?t=119492914200001&r=1&w=2). Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Martin Bligh <mbligh@mbligh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>