summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2010-08-01of/address: Clean up function declarationsGrant Likely
This patch moves the declaration of of_get_address(), of_get_pci_address(), and of_pci_address_to_resource() out of arch code and into the common linux/of_address header file. This patch also fixes some of the asm/prom.h ordering issues. It still includes some header files that it ideally shouldn't be, but at least the ordering is consistent now so that of_* overrides work. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-08-01KVM: PPC: elide struct thread_struct instances from stackAndreas Schwab
Instead of instantiating a whole thread_struct on the stack use only the required parts of it. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Tested-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-07-31powerpc/kexec: Fix orphaned offline CPUs across kexecMatt Evans
When CPU hotplug is used, some CPUs may be offline at the time a kexec is performed. The subsequent kernel may expect these CPUs to be already running, and will declare them stuck. On pseries, there's also a soft-offline (cede) state that CPUs may be in; this can also cause problems as the kexeced kernel may ask RTAS if they're online -- and RTAS would say they are. The CPU will either appear stuck, or will cause a crash as we replace its cede loop beneath it. This patch kicks each present offline CPU awake before the kexec, so that none are forever lost to these assumptions in the subsequent kernel. Now, the behaviour is that all available CPUs that were offlined are now online & usable after the kexec. This mimics the behaviour of a full reboot (on which all CPUs will be restarted). Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-31powerpc/kexec: Add to and tidy debug/comments in machine_kexec64.cMatt Evans
Tidies some typos, KERN_INFO-ise an info msg, and add a debug msg showing when the final sequence starts. Also adds a comment to kexec_prepare_cpus_wait() to make note of a possible problem; the need for kexec to deal with CPUs that failed to originally start up. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-31powerpc: Print decimal values in prom_init.cMichael Neuling
Currently we look pretty stupid when printing out a bunch of things in prom_init.c. eg. Max number of cores passed to firmware: 0x0000000000000080 So I've change this to print in decimal: Max number of cores passed to firmware: 128 (NR_CPUS = 256) This required adding a prom_print_dec() function and changing some prom_printk() calls from %x to %lu. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-31powerpc/smp: remove the incorrect decrementer initial codes for APTiejun Chen
We already defined start_cpu_decrementer() to invoke decrementer for AP as the following path: start_secondary() -> secondary_cpu_time_init() -> start_cpu_decrementer() So remove these incorrect codes introduced from commit: e7f75ad0 powerpc/47x: Base ppc476 support And actually we really should not enable decrementer before calling set_dec(). Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-31powerpc: Add vmcoreinfo symbols to allow makdumpfile to filter core files ↵Neil Horman
properly Signed-off-by: Neil Horman <nhorman@tuxdriver.com> machine_kexec.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-31powerpc/crashdump: Fix issues with kexec and 36bit physical addrMatthew McClintock
Fix sizes of variables so correct values are exported via /proc. Cast variable in comparison to avoid compiler error. Signed-off-by: Matthew McClintock <msm@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-31powerpc/kexec: Switch to a static PACA on the way outMatt Evans
With dynamic PACAs, the kexecing CPU's PACA won't lie within the kernel static data and there is a chance that something may stomp it when preparing to kexec. This patch switches this final CPU to a static PACA just before we pull the switch. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-30Merge commit 'jwb/next' into nextBenjamin Herrenschmidt
2010-07-28Merge branch 'powerpc.cherry-picks' into timers/clocksourceThomas Gleixner
Conflicts: arch/powerpc/kernel/time.c Reason: The powerpc next tree contains two commits which conflict with the timekeeping changes: 8fd63a9e powerpc: Rework VDSO gettimeofday to prevent time going backwards c1aa687d powerpc: Clean up obsolete code relating to decrementer and timebase John Stultz identified them and provided the conflict resolution. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-28powerpc: Clean up obsolete code relating to decrementer and timebasePaul Mackerras
Since the decrementer and timekeeping code was moved over to using the generic clockevents and timekeeping infrastructure, several variables and functions have been obsolete and effectively unused. This deletes them. In particular, wakeup_decrementer() is no longer needed since the generic code reprograms the decrementer as part of the process of resuming the timekeeping code, which happens during sysdev resume. Thus the wakeup_decrementer calls in the suspend_enter methods for 52xx platforms have been removed. The call in the powermac cpu frequency change code has been replaced by set_dec(1), which will cause a timer interrupt as soon as interrupts are enabled, and the generic code will then reprogram the decrementer with the correct value. This also simplifies the generic_suspend_en/disable_irqs functions and makes them static since they are not referenced outside time.c. The preempt_enable/disable calls are removed because the generic code has disabled all but the boot cpu at the point where these functions are called, so we can't be moved to another cpu. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-28powerpc: Rework VDSO gettimeofday to prevent time going backwardsPaul Mackerras
Currently it is possible for userspace to see the result of gettimeofday() going backwards by 1 microsecond, assuming that userspace is using the gettimeofday() in the VDSO. The VDSO gettimeofday() algorithm computes the time in "xsecs", which are units of 2^-20 seconds, or approximately 0.954 microseconds, using the algorithm now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec and then converts the time in xsecs to seconds and microseconds. The kernel updates the tb_orig_stamp and stamp_xsec values every tick in update_vsyscall(). If the length of the tick is not an integer number of xsecs, then some precision is lost in converting the current time to xsecs. For example, with CONFIG_HZ=1000, the tick is 1ms long, which is 1048.576 xsecs. That means that stamp_xsec will advance by either 1048 or 1049 on each tick. With the right conditions, it is possible for userspace to get (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is slightly late in updating the vdso_datapage, and then for stamp_xsec to advance by 1048 when the kernel does update it, and for userspace to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to integer truncation. The result is that time appears to go backwards by 1 microsecond. To fix this we change the VDSO gettimeofday to use a new field in the VDSO datapage which stores the nanoseconds part of the time as a fractional number of seconds in a 0.32 binary fraction format. (Or put another way, as a 32-bit number in units of 0.23283 ns.) This is convenient because we can use the mulhwu instruction to convert it to either microseconds or nanoseconds. Since it turns out that computing the time of day using this new field is simpler than either using stamp_xsec (as gettimeofday does) or stamp_xtime.tv_nsec (as clock_gettime does), this converts both gettimeofday and clock_gettime to use the new field. The existing __do_get_tspec function is converted to use the new field and take a parameter in r7 that indicates the desired resolution, 1,000,000 for microseconds or 1,000,000,000 for nanoseconds. The __do_get_xsec function is then unused and is deleted. The new algorithm is now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs + (stamp_xtime_seconds << 32) + stamp_sec_fraction with 'now' in units of 2^-32 seconds. That is then converted to seconds and either microseconds or nanoseconds with seconds = now >> 32 partseconds = ((now & 0xffffffff) * resolution) >> 32 The 32-bit VDSO code also makes a further simplification: it ignores the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary fraction. Doing so gets rid of 4 multiply instructions. Assuming a timebase frequency of 1GHz or less and an update interval of no more than 10ms, the upper 32 bits of tb_to_xs will be at least 4503599, so the error from ignoring the low 32 bits will be at most 2.2ns, which is more than an order of magnitude less than the time taken to do gettimeofday or clock_gettime on our fastest processors, so there is no possibility of seeing inconsistent values due to this. This also moves update_gtod() down next to its only caller, and makes update_vsyscall use the time passed in via the wall_time argument rather than accessing xtime directly. At present, wall_time always points to xtime, but that could change in future. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-27perf, powerpc: Use perf_sample_data_init() for the FSL codePeter Zijlstra
We should use perf_sample_data_init() to initialize struct perf_sample_data. As explained in the description of commit dc1d628a ("perf: Provide generic perf_sample_data initialization"), it is possible for userspace to get the kernel to dereference data.raw, so if it is not initialized, that means that unprivileged userspace can possibly oops the kernel. Using perf_sample_data_init makes sure it gets initialized to NULL. This conversion should have been included in commit dc1d628a, but it got missed. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2010-07-27timkeeping: Fix update_vsyscall to provide wall_to_monotonic offsetJohn Stultz
update_vsyscall() did not provide the wall_to_monotoinc offset, so arch specific implementations tend to reference wall_to_monotonic directly. This limits future cleanups in the timekeeping core, so this patch fixes the update_vsyscall interface to provide wall_to_monotonic, allowing wall_to_monotonic to be made static as planned in Documentation/feature-removal-schedule.txt Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Tony Luck <tony.luck@intel.com> LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27powerpc: Cleanup xtime usageJohn Stultz
This removes powerpc's direct xtime usage, allowing for further generic timeekeping cleanups Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> LKML-Reference: <1279068988-21864-6-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27powerpc: Simplify update_vsyscallJohn Stultz
Currently powerpc's update_vsyscall calls an inline update_gtod. However, both are straightforward, and there are no other users, so this patch merges update_gtod into update_vsyscall. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Anton Blanchard <anton@samba.org> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1279068988-21864-5-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-26powerpc/40x: Distinguish AMCC PowerPC 405EX and 405EXr correctlyLee Nipper
The recent AMCC 405EX Rev D without Security uses a PVR value that matches the old 405EXr Rev A/B with Security. The 405EX Rev D without Security would be shown incorrectly as an 405EXr. The pvr_mask of 0xffff0004 is no longer sufficient to distinguish the 405EX from 405EXr. This patch replaces 2 entries in the cpu_specs table and adds 8 more, each using pvr_mask of 0xffff000f and appropriate pvr_value to distinguish the AMCC PowerPC 405EX and 405EXr instances. The cpu_name for these entries now includes the Rev, in similar fashion to the 440GX. Signed-off-by: Lee Nipper <lee.nipper@gmail.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2010-07-24of: remove of_default_bus_idsJonas Bonn
This list used was by only two platforms with all other platforms defining an own list of valid bus id's to pass to of_platform_bus_probe. This patch: i) copies the default list to the two platforms that depended on it (powerpc) ii) remove the usage of of_default_bus_ids in of_platform_bus_probe iii) removes the definition of the list from all architectures that defined it Passing a NULL 'matches' parameter to of_platform_bus_probe is still valid; the function returns no error in that case as the NULL value is equivalent to an empty list. Signed-off-by: Jonas Bonn <jonas@southpole.se> [grant.likely@secretlab.ca: added __initdata annotations, warn on and return error on missing match table, and fix whitespace errors] Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-07-24of: make of_find_device_by_node genericJonas Bonn
There's no need for this function to be architecture specific and all four architectures defining it had the same definition. The function has been moved to drivers/of/platform.c. Signed-off-by: Jonas Bonn <jonas@southpole.se> [grant.likely@secretlab.ca: moved to drivers/of/platform.c, simplified code, and added kerneldoc comment] Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: David S. Miller <davem@davemloft.net>
2010-07-24powerpc: remove references to of_device and to_of_deviceGrant Likely
of_device is just a #define alias to platform_device. This patch replaces all references to it with platform_device. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-07-24of/platform: remove all of_bus_type and of_platform_bus_type referencesGrant Likely
Both of_bus_type and of_platform_bus_type are just #define aliases for the platform bus. This patch removes all references to them and switches to the of_register_platform_driver()/of_unregister_platform_driver() API for registering. Subsequent patches will convert each user of of_register_platform_driver() into plain platform_drivers without the of_platform_driver shim. At which point the of_register_platform_driver()/of_unregister_platform_driver() functions can be removed. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: David S. Miller <davem@davemloft.net>
2010-07-24of: Merge of_platform_bus_type with platform_bus_typeGrant Likely
of_platform_bus was being used in the same manner as the platform_bus. The only difference being that of_platform_bus devices are generated from data in the device tree, and platform_bus devices are usually statically allocated in platform code. Having them separate causes the problem of device drivers having to be registered twice if it was possible for the same device to appear on either bus. This patch removes of_platform_bus_type and registers all of_platform bus devices and drivers on the platform bus instead. A previous patch made the of_device structure an alias for the platform_device structure, and a shim is used to adapt of_platform_drivers to the platform bus. After all of of_platform_bus drivers are converted to be normal platform drivers, the shim code can be removed. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: David S. Miller <davem@davemloft.net>
2010-07-24Merge commit 'v2.6.35-rc6' into devicetree/nextGrant Likely
Conflicts: arch/sparc/kernel/prom_64.c
2010-07-23powerpc: Fix erroneous lmb->memblock conversionsBenjamin Herrenschmidt
Oooops... we missed these. We incorrectly converted strings used when parsing the device-tree on pseries, thus breaking access to drconf memory and hotplug memory. While at it, also revert some variable names that represent something the FW calls "lmb" and thus don't need to be converted to "memblock". Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> ---
2010-07-21Merge branch 'linus' into sched/coreIngo Molnar
Merge reason: Move from the -rc3 to the almost-rc6 base. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-21Merge branch 'linus' into perf/coreIngo Molnar
Merge reason: Pick up the latest perf fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-19update email addressPavel Machek
pavel@suse.cz no longer works, replace it with working address. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-07-18of: Remove unused of_find_device_by_phandle()Grant Likely
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14Merge branch 'lmb-to-memblock' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'lmb-to-memblock' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: lmb: rename to memblock
2010-07-14lmb: rename to memblockYinghai Lu
via following scripts FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/lmb/memblock/g' \ -e 's/LMB/MEMBLOCK/g' \ $FILES for N in $(find . -name lmb.[ch]); do M=$(echo $N | sed 's/lmb/memblock/g') mv $N $M done and remove some wrong change like lmbench and dlmb etc. also move memblock.c from lib/ to mm/ Suggested-by: Ingo Molnar <mingo@elte.hu> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14powerpc/book3e: Fix single step when using HW page tablesBenjamin Herrenschmidt
We patch the TLB miss exception vectors to point to alternate functions when using HW page table on BookE. However, we were patching in a new branch in the first instruction of the exception handler instead of the second one, thus overriding the nop that is in the first instruction. This cause problems when single stepping as we rely on that nop for the single step to stop properly within the exception vector range rather than on the target of the branch. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-14powerpc/book3e: Add generic 64-bit idle powersave supportBenjamin Herrenschmidt
We use a similar technique to ppc32: We set a thread local flag to indicate that we are about to enter or have entered the stop state, and have fixup code in the async interrupt entry code that reacts to this flag to make us return to a different location (sets NIP to LINK in our case). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> -- v2. Fix lockdep bug Re-mask interrupts when coming back from idle
2010-07-11powerpc/fsl-booke: Fix address issue when using relocatable kernelsMatthew McClintock
When booting a relocatable kernel it needs to jump to the correct start address, which for BookE parts is usually unchanged regardless of the physical memory offset. Recent changes cause problems with how we calculate the start address, it was always adding the RMO into the start address which is incorrect. This patch only adds in the RMO offset if we are in the kexec code path, as it needs the RMO to work correctly. Instead of adding the RMO offset in in the common code path, we can just set r6 to the RMO offset in the kexec code path instead of to zero, and finally perform the masking in the common code path Signed-off-by: Matthew McClintock <msm@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2010-07-09powerpc/book3e: Resend doorbell exceptions to ourselfMichael Ellerman
If we are soft disabled and receive a doorbell exception we don't process it immediately. This means we need to check on the way out of irq restore if there are any doorbell exceptions to process. The problem is at that point we don't know what our regs are, and that in turn makes xmon unhappy. To workaround the problem, instead of checking for and processing doorbells, we check for any doorbells and if there were any we send ourselves another. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc/book3e: Use set_irq_regs() in the msgsnd/msgrcv IPI pathDavid Gibson
include/asm-generic/irq_regs.h declares per-cpu irq_regs variables and get_irq_regs() and set_irq_regs() helper functions to maintain them. These can be used to access the proper pt_regs structure related to the current interrupt entry (if any). In the powerpc arch code, this is used to maintain irq regs on decrementer and external interrupt exceptions. However, for the doorbell exceptions used by the msgsnd/msgrcv IPI mechanism of newer BookE CPUs, the irq_regs are not kept up to date. In particular this means that xmon will not work properly on SMP, because the secondary xmon instances started by IPI will blow up when they cannot retrieve the irq regs. This patch fixes the problem by adding calls to maintain the irq regs across doorbell exceptions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc/book3e: Hookup doorbells exceptions on 64-bit Book3EBenjamin Herrenschmidt
Note that critical doorbells are an unimplemented stub just like other critical or machine check handlers, since we haven't done support for "levelled" exceptions yet. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc/book3e: Don't re-trigger decrementer on lazy irq restoreBenjamin Herrenschmidt
The decrementer on BookE acts as a level interrupt and doesn't need to be re-triggered when going negative. It doesn't go negative anyways (unless programmed to auto-reload with a negative value) as it stops when reaching 0. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc/book3e: More doorbell cleanups. Sample the PIR registerBenjamin Herrenschmidt
The doorbells use the content of the PIR register to match messages from other CPUs. This may or may not be the same as our linux CPU number, so using that as the "target" is no right. Instead, we sample the PIR register at boot on every processor and use that value subsequently when sending IPIs. We also use a per-cpu message mask rather than a global array which should limit cache line contention. Note: We could use the CPU number in the device-tree instead of the PIR register, as they are supposed to be equivalent. This might prove useful if doorbells are to be used to kick CPUs out of FW at boot time, thus before we can sample the PIR. This is however not the case now and using the PIR just works. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc/book3e: Move doorbell_exception from traps.c to dbell.cBenjamin Herrenschmidt
... where it belongs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc/book3e: Hack to get gdb moving along on Book3E 64-bitBenjamin Herrenschmidt
Our handling of debug interrupts on Book3E 64-bit is not quite the way it should be just yet. This is a workaround to let gdb work at least for now. We ensure that when context switching, we set the appropriate DBCR0 value for the new task. We also make sure that we turn off MSR[DE] within the kernel, and set it as part of the bits that get set when going back to userspace. In the long run, we will probably set the userspace DBCR0 on the exception exit code path and ensure we have some proper kernel value to set on the way into the kernel, a bit like ppc32 does, but that will take more work. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc: Add i8042 keyboard and mouse irq parsingMartyn Welch
Currently the irqs for the i8042, which historically provides keyboard and mouse (aux) support, is hardwired in the driver rather than parsing the dts. This patch modifies the powerpc legacy IO code to attempt to parse the device tree for this information, failing back to the hardcoded values if it fails. Signed-off-by: Martyn Welch <martyn.welch@ge.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc: Optimise per cpu accesses on 64bitAnton Blanchard
Now we dynamically allocate the paca array, it takes an extra load whenever we want to access another cpu's paca. One place we do that a lot is per cpu variables. A simple example: DEFINE_PER_CPU(unsigned long, vara); unsigned long test4(int cpu) { return per_cpu(vara, cpu); } This takes 4 loads, 5 if you include the actual load of the per cpu variable: ld r11,-32760(r30) # load address of paca pointer ld r9,-32768(r30) # load link address of percpu variable sldi r3,r29,9 # get offset into paca (each entry is 512 bytes) ld r0,0(r11) # load paca pointer add r3,r0,r3 # paca + offset ld r11,64(r3) # load paca[cpu].data_offset ldx r3,r9,r11 # load per cpu variable If we remove the ppc64 specific per_cpu_offset(), we get the generic one which indexes into a statically allocated array. This removes one load and one add: ld r11,-32760(r30) # load address of __per_cpu_offset ld r9,-32768(r30) # load link address of percpu variable sldi r3,r29,3 # get offset into __per_cpu_offset (each entry 8 bytes) ldx r11,r11,r3 # load __per_cpu_offset[cpu] ldx r3,r9,r11 # load per cpu variable Having all the offsets in one array also helps when iterating over a per cpu variable across a number of cpus, such as in the scheduler. Before we would need to load one paca cacheline when calculating each per cpu offset. Now we have 16 (128 / sizeof(long)) per cpu offsets in each cacheline. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc/pseries: Migration code reorganization / hibernation prepBrian King
Partition hibernation will use some of the same code as is currently used for Live Partition Migration. This function further abstracts this code such that code outside of rtas.c can utilize it. It also changes the error field in the suspend me data structure to be an atomic type, since it is set and checked on different cpus without any barriers or locking. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc: Clean up obsolete code relating to decrementer and timebasePaul Mackerras
Since the decrementer and timekeeping code was moved over to using the generic clockevents and timekeeping infrastructure, several variables and functions have been obsolete and effectively unused. This deletes them. In particular, wakeup_decrementer() is no longer needed since the generic code reprograms the decrementer as part of the process of resuming the timekeeping code, which happens during sysdev resume. Thus the wakeup_decrementer calls in the suspend_enter methods for 52xx platforms have been removed. The call in the powermac cpu frequency change code has been replaced by set_dec(1), which will cause a timer interrupt as soon as interrupts are enabled, and the generic code will then reprogram the decrementer with the correct value. This also simplifies the generic_suspend_en/disable_irqs functions and makes them static since they are not referenced outside time.c. The preempt_enable/disable calls are removed because the generic code has disabled all but the boot cpu at the point where these functions are called, so we can't be moved to another cpu. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09powerpc: Rework VDSO gettimeofday to prevent time going backwardsPaul Mackerras
Currently it is possible for userspace to see the result of gettimeofday() going backwards by 1 microsecond, assuming that userspace is using the gettimeofday() in the VDSO. The VDSO gettimeofday() algorithm computes the time in "xsecs", which are units of 2^-20 seconds, or approximately 0.954 microseconds, using the algorithm now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec and then converts the time in xsecs to seconds and microseconds. The kernel updates the tb_orig_stamp and stamp_xsec values every tick in update_vsyscall(). If the length of the tick is not an integer number of xsecs, then some precision is lost in converting the current time to xsecs. For example, with CONFIG_HZ=1000, the tick is 1ms long, which is 1048.576 xsecs. That means that stamp_xsec will advance by either 1048 or 1049 on each tick. With the right conditions, it is possible for userspace to get (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is slightly late in updating the vdso_datapage, and then for stamp_xsec to advance by 1048 when the kernel does update it, and for userspace to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to integer truncation. The result is that time appears to go backwards by 1 microsecond. To fix this we change the VDSO gettimeofday to use a new field in the VDSO datapage which stores the nanoseconds part of the time as a fractional number of seconds in a 0.32 binary fraction format. (Or put another way, as a 32-bit number in units of 0.23283 ns.) This is convenient because we can use the mulhwu instruction to convert it to either microseconds or nanoseconds. Since it turns out that computing the time of day using this new field is simpler than either using stamp_xsec (as gettimeofday does) or stamp_xtime.tv_nsec (as clock_gettime does), this converts both gettimeofday and clock_gettime to use the new field. The existing __do_get_tspec function is converted to use the new field and take a parameter in r7 that indicates the desired resolution, 1,000,000 for microseconds or 1,000,000,000 for nanoseconds. The __do_get_xsec function is then unused and is deleted. The new algorithm is now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs + (stamp_xtime_seconds << 32) + stamp_sec_fraction with 'now' in units of 2^-32 seconds. That is then converted to seconds and either microseconds or nanoseconds with seconds = now >> 32 partseconds = ((now & 0xffffffff) * resolution) >> 32 The 32-bit VDSO code also makes a further simplification: it ignores the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary fraction. Doing so gets rid of 4 multiply instructions. Assuming a timebase frequency of 1GHz or less and an update interval of no more than 10ms, the upper 32 bits of tb_to_xs will be at least 4503599, so the error from ignoring the low 32 bits will be at most 2.2ns, which is more than an order of magnitude less than the time taken to do gettimeofday or clock_gettime on our fastest processors, so there is no possibility of seeing inconsistent values due to this. This also moves update_gtod() down next to its only caller, and makes update_vsyscall use the time passed in via the wall_time argument rather than accessing xtime directly. At present, wall_time always points to xtime, but that could change in future. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09Merge commit 'paulus-perf/master' into nextBenjamin Herrenschmidt
2010-07-08powerpc: Fix default_machine_crash_shutdown #ifdef botchPaul E. McKenney
crash_kexec_wait_realmode() is defined only if CONFIG_PPC_STD_MMU_64 and CONFIG_SMP, but is called if CONFIG_PPC_STD_MMU_64 even if !CONFIG_SMP. Fix the conditional compilation around the invocation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-08powerpc: Fix logic error in fixup_irqsJohannes Berg
When SPARSE_IRQ is set, irq_to_desc() can return NULL. While the code here has a check for NULL, it's not really correct. Fix it by separating the check for it. This fixes CPU hot unplug for me. Reported-by: Alastair Bridgewater <alastair.bridgewater@gmail.com> Cc: stable@kernel.org [2.6.32+] Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-08powerpc: Linux cannot run with 0 coresAnton Blanchard
If we configure with CONFIG_SMP=n or set NR_CPUS less than the number of SMT threads we will set the max cores property to 0 in the ibm,client-architecture-support structure. On new versions of firmware that understand this property it obliges and terminates our partition. Use DIV_ROUND_UP so we handle not only the CONFIG_SMP=n case but also the case where NR_CPUS isn't a multiple of the number of SMT threads. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>