asmadeus/linux.git - The linux kernel

Age	Commit message (Collapse)	Author
2007-07-19	define new percpu interface for shared data	Fenghua Yu
	per cpu data section contains two types of data. One set which is exclusively accessed by the local cpu and the other set which is per cpu, but also shared by remote cpus. In the current kernel, these two sets are not clearely separated out. This can potentially cause the same data cacheline shared between the two sets of data, which will result in unnecessary bouncing of the cacheline between cpus. One way to fix the problem is to cacheline align the remotely accessed per cpu data, both at the beginning and at the end. Because of the padding at both ends, this will likely cause some memory wastage and also the interface to achieve this is not clean. This patch: Moves the remotely accessed per cpu data (which is currently marked as ____cacheline_aligned_in_smp) into a different section, where all the data elements are cacheline aligned. And as such, this differentiates the local only data and remotely accessed data cleanly. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: <linux-arch@vger.kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19	unregister_chrdev(): ignore the return value	Akinobu Mita
	unregister_chrdev() always returns 0. There is no need to check the return value. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19	mm: fault feedback #2	Nick Piggin
	This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-18	[SPARC64]: Set vio->desc_buf to NULL after freeing.	David S. Miller
	Otherwise we trigger assertions on the next link-up. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-18	[SPARC]: Mark sparc and sparc64 as not having virt_to_bus	Stephen Rothwell
	Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-18	[SPARC64]: Handle reset events in vio_link_state_change().	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-18	[SPARC64]: Handle LDC resets properly in domain-services driver.	David S. Miller
	Reset the handshake and per-capability state so that when the link comes back up we'll renegotiate the DS version and then reregister all of the services. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-18	[SPARC64]: Massively simplify VIO device layer and support hot add/remove.	David S. Miller
	Create and destroy VIO devices in response to MD update events. These run synchronously inside of the MD update mutex so the VIO layer doesn't need to do internal locking of any sort. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-18	[SPARC64]: Add basic infrastructure for MD add/remove notification.	David S. Miller
	And add dummy handlers for the VIO device layer. These will be filled in with real code after the vdc, vnet, and ds drivers are reworked to have simpler dependencies on the VIO device tree. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-17	[SPARC64]: Kill bogus set_fs(KERNEL_DS) in do_rt_sigreturn().	Oleg Nesterov
	From: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-17	[SPARC64]: Update defconfig.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-17	[SPARC64]: Kill explicit %gl register reference.	David S. Miller
	Older binutils can't handle it. Use SET_GL() instead, which is explicitly for this purpose. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-17	Report that kernel is tainted if there was an OOPS	Pavel Emelianov
	If the kernel OOPSed or BUGed then it probably should be considered as tainted. Thus, all subsequent OOPSes and SysRq dumps will report the tainted kernel. This saves a lot of time explaining oddities in the calltraces. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Added parisc patch from Matthew Wilson -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16	[SPARC64]: Fix race between MD update and dr-cpu add.	David S. Miller
	We need to make sure the MD update occurs before we try to process dr-cpu configure requests. MD update and dr-cpu were being processed by seperate threads so that did not happen occaisionally. Fix this by executing all domain services data packets from a single thread, in order. This will help simplify some other things as well. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: SMP build fix.	Fabio Massimo Di Nitto
	The UP build fix had some unintended consequences. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Fix UP build.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: dr-cpu unconfigure support.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Give more accurate errors in dr_cpu_configure().	David S. Miller
	When cpu_up() fails, we can discern the most likely cause. If cpu_present() is false, this means the cpu did not appear in the MD. If -ENODEV is the error return value, then the processor did not boot properly into the kernel. Pass this information back in the dr-cpu response packet. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Clear cpu_{core,sibling}_map[] in smp_fill_in_sib_core_maps()	David S. Miller
	When we hot-plug in new cpus, the core_id and proc_id of existing cpus can change. So in order to set the cpu groups correctly we need to clear the maps out completely first. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Fix leak when DR added cpu does not bootup.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Add ->set_affinity IRQ handlers.	David S. Miller
	dr-cpu unconfigure requests will walk throught he enabled IRQs and trigger ->set_affinity so that the going-down cpu no longer has INOs targetted to it. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Process dr-cpu events in a kthread instead of workqueue.	David S. Miller
	This will be necessary to handle unconfigure requests properly. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: More sensible udelay implementation.	David S. Miller
	Take a page from the powerpc folks and just calculate the delay factor directly. Since frequency scaling chips use a system-tick register, the value is going to be the same system-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: SMP build fixes.	David S. Miller
	With the move of ldom_startcpu_cpuid() into smp.c some other things need to follow along: 1) smp.c is not a driver so we can't use "PFX" macro in the printk calls. 2) smp.c now needs asm/io.h and asm/hvtramp.h, ds.c no longer does 3) kimage_addr_to_ra() also needs to move into smp.c While we're here, update copyright info and my email address in smp.c Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: mdesc.c needs linux/mm.h	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Fix build regressions added by dr-cpu changes.	David S. Miller
	Do not select HOTPLUG_CPU from SUN_LDOMS, that causes HOTPLUG_CPU to be selected even on non-SMP which is illegal. Only build hvtramp.o when SMP, just like trampoline.o Protect dr-cpu code in ds.c with HOTPLUG_CPU. Likewise move ldom_startcpu_cpuid() to smp.c and protect it and the call site with SUN_LDOMS && HOTPLUG_CPU. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Unconditionally register vio_bus_type.	David S. Miller
	The VIO drivers register themselves unconditionally just like those of any other bus type, so to avoid crashes on non-VIO systems we need to always register vio_bus_type. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Initial LDOM cpu hotplug support.	David S. Miller
	Only adding cpus is supports at the moment, removal will come next. When new cpus are configured, the machine description is updated. When we get the configure request we pass in a cpu mask of to-be-added cpus to the mdesc CPU node parser so it only fetches information for those cpus. That code also proceeds to update the SMT/multi-core scheduling bitmaps. cpu_up() does all the work and we return the status back over the DS channel. CPUs via dr-cpu need to be booted straight out of the hypervisor, and this requires: 1) A new trampoline mechanism. CPUs are booted straight out of the hypervisor with MMU disabled and running in physical addresses with no mappings installed in the TLB. The new hvtramp.S code sets up the critical cpu state, installs the locked TLB mappings for the kernel, and turns the MMU on. It then proceeds to follow the logic of the existing trampoline.S SMP cpu bringup code. 2) All calls into OBP have to be disallowed when domaining is enabled. Since cpus boot straight into the kernel from the hypervisor, OBP has no state about that cpu and therefore cannot handle being invoked on that cpu. Luckily it's only a handful of interfaces which can be called after the OBP device tree is obtained. For example, rebooting, halting, powering-off, and setting options node variables. CPU removal support will require some infrastructure changes here. Namely we'll have to process the requests via a true kernel thread instead of in a workqueue. workqueues run on a per-cpu thread, but when unconfiguring we might need to force the thread to execute on another cpu if the current cpu is the one being removed. Removal of a cpu also causes the kernel to destroy that cpu's workqueue running thread. Another issue on removal is that we may have interrupts still pointing to the cpu-to-be-removed. So new code will be needed to walk the active INO list and retarget those cpus as-needed. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Fix setting of variables in LDOM guest.	David S. Miller
	There is a special domain services capability for setting variables in the OBP options node. Guests don't have permanent store for the OBP variables like a normal system, so they are instead maintained in the LDOM control node or in the SC. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Fix MD property lifetime bugs.	David S. Miller
	Property values cannot be referenced outside of mdesc_grab()/mdesc_release() pairs. The only major offender was the VIO bus layer, easily fixed. Add some commentary to mdesc.h describing these rules. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Abstract out mdesc accesses for better MD update handling.	David S. Miller
	Since we have to be able to handle MD updates, having an in-tree set of data structures representing the MD objects actually makes things more painful. The MD itself is easy to parse, and we can implement the existing interfaces using direct parsing of the MD binary image. The MD is now reference counted, so accesses have to now take the form: handle = mdesc_grab(); ... operations on MD ... mdesc_release(handle); The only remaining issue are cases where code holds on to references to MD property values. mdesc_get_property() returns a direct pointer to the property value, most cases just pull in the information they need and discard the pointer, but there are few that use the pointer directly over a long lifetime. Those will be fixed up in a subsequent changeset. A preliminary handler for MD update events from domain services is there, it is rudimentry but it works and handles all of the reference counting. It does not check the generation number of the MDs, and it does not generate a "add/delete" list for notification to interesting parties about MD changes but that will be forthcoming. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Use more mearningful names for IRQ registry.	David S. Miller
	All of the interrupts say "LDX RX" and "LDX TX" currently which is next to useless. Put a device specific prefix before "RX" and "TX" instead which makes it much more useful. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Initial domain-services driver.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Export powerd facilities for external entities.	David S. Miller
	Besides the existing usage for power-button interrupts, we'll want to make use of this code for domain-services where the LDOM manager can send reboot requests to the guest node. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Add domain-services nodes to VIO device tree.	David S. Miller
	They sit under the root of the MD tree unlike the rest of the LDC channel based virtual devices. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Assorted LDC bug cures.	David S. Miller
	1) LDC_MODE_RELIABLE is deprecated an unused by anything, plus it and LDC_MODE_STREAM were mis-numbered. 2) read_stream() should try to read as much as possible into the per-LDC stream buffer area, so do not trim the read_nonraw() length by the caller's size parameter. 3) Send data ACKs when necessary in read_nonraw(). 4) In read_nonraw() when we get a pure ACK, advance the RX head unconditionally past it. 5) Provide the ACKID field in the ldcdgb() packet dump in read_nonraw(). This helps debugging stream mode LDC channel problems. 6) Decrease verbosity of rx_data_wait() so that it is more useful. A debugging message each loop iteration is too much. 7) In process_data_ack() stop the loop checking when we hit lp->tx_tail not lp->tx_head. 8) Set the seqid field properly in send_data_nack(). Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Do not ACK an INO if it is disabled or inprogress.	David S. Miller
	This is also a partial workaround for a bug in the LDOM firmware which double-transmits RX inos during high load. Without this, such an event causes the kernel to loop forever in the interrupt call chain ACK'ing but never actually running the IRQ handler (and thus clearing the interrupt condition in the device). There is still a bad potential effect when double INOs occur, not covered by this changeset. Namely, if the INO is already on the per-cpu INO vector list, we still blindly re-insert it and thus we can end up losing interrupts already linked in after it. We could deal with that by traversing the list before insertion, but that's too expensive for this edge case. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16	[SPARC64]: Add LDOM virtual channel driver and VIO device layer.	David S. Miller
	Virtual devices on Sun Logical Domains are built on top of a virtual channel framework. This, with help of hypervisor interfaces, provides a link layer protocol with basic handshaking over which virtual device clients and servers communicate. Built on top of this is a VIO device protocol which has it's own handshaking and message types. At this layer attributes are exchanged (disk size, network device addresses, etc.) descriptor rings are registered, and data transfers are triggers and replied to. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-11	PCI: Only build PCI syscalls on architectures that want them	Matthew Wilcox
	The PCI syscalls are built on every architecture except X86, but only a few have ever hooked them up. Use a new Kconfig symbol to save a couple of kB on the architectures that have never used the syscalls. Tested on x86 and ia64 only. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11	PCI: read revision ID by default	Auke Kok
	Currently there are 97 occurrences where drivers need the pci revision ID. We can do this once for all devices. Even the pci subsystem needs the revision several times for quirks. The extra u8 member pads out nicely in the pci_dev struct. Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-09	sched: zap the migration init / cache-hot balancing code	Ingo Molnar
	the SMP load-balancer uses the boot-time migration-cost estimation code to attempt to improve the quality of balancing. The reason for this code is that the discrete priority queues do not preserve the order of scheduling accurately, so the load-balancer skips tasks that were running on a CPU 'recently'. this code is fundamental fragile: the boot-time migration cost detector doesnt really work on systems that had large L3 caches, it caused boot delays on large systems and the whole cache-hot concept made the balancing code pretty undeterministic as well. (and hey, i wrote most of it, so i can say it out loud that it sucks ;-) under CFS the same purpose of cache affinity can be achieved without any special cache-hot special-case: tasks are sorted in the 'timeline' tree and the SMP balancer picks tasks from the left side of the tree, thus the most cache-cold task is balanced automatically. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-06-26	[SPARC64]: Need to set state to IDLE during sun4v IRQ enable.	David S. Miller
	This fixes hypervisor console interrupts on LDOM guests. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-26	[SPARC64]: Fix VIRQ enabling.	David S. Miller
	We were doing the wrong call to turn them on, and also when enabling we need to forcefully set the state to IDLE. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-13	[SPARC64]: Fix args to sun4v_ldc_revoke().	David S. Miller
	First argument is LDC channel ID, then mapping cookie, then the MTE revoke cookie. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-13	[SPARC64]: Fix IO/MEM space sizing for PCI.	David S. Miller
	In pci_determine_mem_io_space(), do not hard code the region sizes. Instead, use the values given to us in the ranges property. Thanks goes to Mikael Petterson for the original Xorg failure bug repoert, and strace dumps from Mikael and Dmitry Artamonow. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-13	[SPARC64]: Wire up cookie based sun4v interrupt registry.	David S. Miller
	This will be used for logical domain channel interrupts. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07	[SPARC64]: Handle PCI bridges without 'ranges' property.	David S. Miller
	This fixes the IDE controller not showing up on Netra-T1 systems. Just like Simba bridges, some PCI bridges can lack the 'ranges' OBP property. So we handle this similarly to the existing Simba code: 1) In of_device register address resolving, we push the translation to the parent. 2) In PCI device scanning, we interrogate the PCI config space registers of the PCI bus device in order to resolve the resources, just like the generic Linux PCI probing code does. With much help and testing from Fabio, who also reported the initial problem. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
2007-06-07	[SPARC64]: Include <linux/rwsem.h> instead of <asm/rwsem.h>.	Robert P. J. Day
	To be consistent with other architectures, include the generic version of rwsem.h. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07	[SPARC64]: Fix SBUS IRQ regression caused by PCI-E driver.	David S. Miller
	We used to access the 64-bit IRQ IMAP and ICLR registers of bus controllers 4-bytes in and as a 32-bit register word, since only the low 32-bits were relevant. This seemed like a good idea at the time. But the PCI-E controller requires full 8-byte 64-bit access to these registers, so we switched over to accessing them fully. SBUS was not adjusted properly, which broke interrupts completely. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07	[SPARC64]: Fix 2 bugs in PCI Sabre bus scanning.	David S. Miller
	If we are on hummingbird, bus runs at 66MHZ. pbm->pci_bus should be setup with the result of pci_scan_one_pbm() or else we deref NULL pointers in the error interrupt handlers. Signed-off-by: David S. Miller <davem@davemloft.net>