summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)Author
2009-09-24truncate: new helpersnpiggin@suse.de
Introduce new truncate helpers truncate_pagecache and inode_newsize_ok. vmtruncate is also consolidated from mm/memory.c and mm/nommu.c and into mm/truncate.c. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24fs: fix overflow in sys_mount() for in-kernel callsVegard Nossum
sys_mount() reads/copies a whole page for its "type" parameter. When do_mount_root() passes a kernel address that points to an object which is smaller than a whole page, copy_mount_options() will happily go past this memory object, possibly dereferencing "wild" pointers that could be in any state (hence the kmemcheck warning, which shows that parts of the next page are not even allocated). (The likelihood of something going wrong here is pretty low -- first of all this only applies to kernel calls to sys_mount(), which are mostly found in the boot code. Secondly, I guess if the page was not mapped, exact_copy_from_user() _would_ in fact handle it correctly because of its access_ok(), etc. checks.) But it is much nicer to avoid the dubious reads altogether, by stopping as soon as we find a NUL byte. Is there a good reason why we can't do something like this, using the already existing strndup_from_user()? [akpm@linux-foundation.org: make copy_mount_string() static] [AV: fix compat mount breakage, which involves undoing akpm's change above] Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: al <al@dizzy.pdmi.ras.ru>
2009-09-24x86: Remove redundant non-NUMA topology functionsRusty Russell
arch/x86/include/asm/topology.h declares inline fns cpu_to_node and cpumask_of_node for !NUMA, even though they are then declared as macros by asm-generic/topology.h, which is #included just below. The macros (which are the same) end up being used; these functions are just confusing. Noticed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: "Greg Kroah-Hartman" <gregkh@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <200909241748.45629.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24perf tools: .gitignore += perf*.htmlKirill Smelkov
I've tried building the docs in tools/perf/Documentation/ , and after that `git status` showed dozen of untracked htmls. Let's ignore them. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> LKML-Reference: <1253790022-10300-1-git-send-email-kirr@mns.spb.ru> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24fs: Make unload_nls() NULL pointer safeThomas Gleixner
Most call sites of unload_nls() do: if (nls) unload_nls(nls); Check the pointer inside unload_nls() like we do in kfree() and simplify the call sites. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steve French <sfrench@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Petr Vandrovec <vandrove@vc.cvut.cz> Cc: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24freeze_bdev: grab active reference to frozen superblocksChristoph Hellwig
Currently we held s_umount while a filesystem is frozen, despite that we might return to userspace and unlock it from a different process. Instead grab an active reference to keep the file system busy and add an explicit check for frozen filesystems in remount and reject the remount instead of blocking on s_umount. Add a new get_active_super helper to super.c for use by freeze_bdev that grabs an active reference to a superblock from a given block device. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24freeze_bdev: kill bd_mount_semChristoph Hellwig
Now that we have the freeze count there is not much reason for bd_mount_sem anymore. The actual freeze/thaw operations are serialized using the bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is get_sb_bdev which tries to prevent mounting a filesystem while the block device is frozen. Instead of add a check for bd_fsfreeze_count and return -EBUSY if a filesystem is frozen. While that is a change in user visible behaviour a failing mount is much better for this case rather than having the mount process stuck uninterruptible for a long time. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24exofs: remove BKL from super operationsBoaz Harrosh
the two places inside exofs that where taking the BKL were: exofs_put_super() - .put_super and exofs_sync_fs() - which is .sync_fs and is also called from .write_super. Now exofs_sync_fs() is protected from itself by also taking the sb_lock. exofs_put_super() directly calls exofs_sync_fs() so there is no danger between these two either. In anyway there is absolutely nothing dangerous been done inside exofs_sync_fs(). Unless there is some subtle race with the actual lifetime of the super_block in regard to .put_super and some other parts of the VFS. Which is highly unlikely. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24fs/romfs: correct error-handling codeJulia Lawall
romfs_fill_super() assumes that romfs_iget() returns NULL when it fails. romfs_iget() actually returns ERR_PTR(-ve) in that case... Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: seq_file: add helpers for data fillingMiklos Szeredi
Add two helpers that allow access to the seq_file's own buffer, but hide the internal details of seq_files. This allows easier implementation of special purpose filling functions. It also cleans up some existing functions which duplicated the seq_file logic. Make these inline functions in seq_file.h, as suggested by Al. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: remove redundant position check in do_sendfileJeff Layton
As Johannes Weiner pointed out, one of the range checks in do_sendfile is redundant and is already checked in rw_verify_area. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Cc: Mandeep Singh Baines <msb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: change sb->s_maxbytes to a loff_tJeff Layton
sb->s_maxbytes is supposed to indicate the maximum size of a file that can exist on the filesystem. It's declared as an unsigned long long. Even if a filesystem has no inherent limit that prevents it from using every bit in that unsigned long long, it's still problematic to set it to anything larger than MAX_LFS_FILESIZE. There are places in the kernel that cast s_maxbytes to a signed value. If it's set too large then this cast makes it a negative number and generally breaks the comparison. Change s_maxbytes to be loff_t instead. That should help eliminate the temptation to set it too large by making it a signed value. Also, add a warning for couple of releases to help catch filesystems that set s_maxbytes too large. Eventually we can either convert this to a BUG() or just remove it and in the hope that no one will get it wrong now that it's a signed value. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Cc: Mandeep Singh Baines <msb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: explicitly cast s_maxbytes in fiemap_check_rangesJeff Layton
If fiemap_check_ranges is passed a large enough value, then it's possible that the value would be cast to a signed value for comparison against s_maxbytes when we change it to loff_t. Make sure that doesn't happen by explicitly casting s_maxbytes to an unsigned value for the purposes of comparison. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robert Love <rlove@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mandeep Singh Baines <msb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24libfs: return error code on failed attr setWu Fengguang
Currently all simple_attr.set handlers return 0 on success and negative codes on error. Fix simple_attr_write() to return these error codes. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24seq_file: return a negative error code when seq_path_root() fails.Tetsuo Handa
seq_path_root() is returning a return value of successful __d_path() instead of returning a negative value when mangle_path() failed. This is not a bug so far because nobody is using return value of seq_path_root(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: optimize touch_time() tooAndi Kleen
Do a similar optimization as earlier for touch_atime. Getting the lock in mnt_get_write is relatively costly, so try all avenues to avoid it first. This patch is careful to still only update inode fields inside the lock region. This didn't show up in benchmarks, but it's easy enough to do. [akpm@linux-foundation.org: fix typo in comment] [hugh.dickins@tiscali.co.uk: fix inverted test of mnt_want_write_file()] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Valerie Aurora <vaurora@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: optimization for touch_atime()Andi Kleen
Some benchmark testing shows touch_atime to be high up in profile logs for IO intensive workloads. Most likely that's due to the lock in mnt_want_write(). Unfortunately touch_atime first takes the lock, and then does all the other tests that could avoid atime updates (like noatime or relatime). Do it the other way round -- first try to avoid the update and only then if that didn't succeed take the lock. That works because none of the atime avoidance tests rely on locking. This also eliminates a goto. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Christoph Hellwig <hch@infradead.org> Reviewed-by: Valerie Aurora <vaurora@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24vfs: split generic_forget_inode() so that hugetlbfs does not have to copy itJan Kara
Hugetlbfs needs to do special things instead of truncate_inode_pages(). Currently, it copied generic_forget_inode() except for truncate_inode_pages() call which is asking for trouble (the code there isn't trivial). So create a separate function generic_detach_inode() which does all the list magic done in generic_forget_inode() and call it from hugetlbfs_forget_inode(). Signed-off-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24fs/inode.c: add dev-id and inode number for debugging in init_special_inode()Manish Katiyar
Add device-id and inode number for better debugging. This was suggested by Andreas in one of the threads http://article.gmane.org/gmane.comp.file-systems.ext4/12062 . "If anyone has a chance, fixing this error message to be not-useless would be good... Including the device name and the inode number would help track down the source of the problem." Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24libfs: make simple_read_from_buffer conventionalSteven Rostedt
Impact: have simple_read_from_buffer conform to standards It was brought to my attention by Andrew Morton, Theodore Tso, and H. Peter Anvin that a read from userspace should only return -EFAULT if nothing was actually read. Looking at the simple_read_from_buffer I noticed that this function does not conform to that rule. This patch fixes that function. [akpm@linux-foundation.org: simplification suggested by hpa] [hpa@zytor.com: fix count==0 handling] Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24x86: early_printk: Protect against using the same device twiceJason Wessel
If you use the kernel argument: earlyprintk=serial,ttyS0,115200 This will cause a recursive hang printing the same line again and again: BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data) BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved) BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved) bootconsole [earlyser0] enabled Linux version 2.6.31-07863-gb64ada6 (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009 Linux version 2.6.31-07863-gb64ada6 (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009 Linux version 2.6.31-07863-gb64ada6 (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009 Linux version 2.6.31-07863-gb64ada6 (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009 Linux version 2.6.31-07863-gb64ada6 (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009 Instead warn the end user that they specified the device a second time, and ignore that second console. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <gregkh@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4ABAAB89.1080407@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24Merge branch 'linus' into x86/urgentIngo Molnar
Merge reason: Queueing up dependent early-printk fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24perf tools: Handle relative paths while loading module symbolsMike Galbraith
Inform util/module.c::mod_dso__load_module_paths() that relative paths do exist in some modules.dep, and make it fail noisily should it encounter a path that it doesn't understand, or a module it cannot open. Reported-by: Avi Kivity <avi@redhat.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: rostedt@goodmis.org Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <1253779628.10513.8.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24x86: Reduce verbosity of "PAT enabled" kernel messageRoland Dreier
On modern systems, the kernel prints the message x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 once for every CPU. This gets kind of ridiculous on huge systems; for example, on a 64-thread system I was lucky enough to get: dmesg| grep 'PAT enabled' | wc 64 704 5174 There is already a BUG() if non-boot CPUs have PAT capabilities that don't match the boot CPU, so just print the message on the boot CPU. (I kept the print after the wrmsrl() that enables PAT, so that the log output continues to mean that the system survived enabling PAT on the boot CPU) Signed-off-by: Roland Dreier <rolandd@cisco.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <adavdj92sso.fsf@cisco.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24x86: Reduce verbosity of "TSC is reliable" messageRoland Dreier
On modern systems, the kernel prints the message Skipping synchronization checks as TSC is reliable. once for every non-boot CPU. This gets kind of ridiculous on huge systems; for example, on a 64-thread system I was lucky enough to get: $ dmesg | grep 'TSC is reliable' | wc 63 567 4221 There's no point to doing this for every CPU, since the code is just checking the boot CPU anyway, so change this to a printk_once() to make the message appears only once. Signed-off-by: Roland Dreier <rolandd@cisco.com> LKML-Reference: <adazl8l2swc.fsf@cisco.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24sh: Fix up uninitialized variable use caught by gcc 4.4.Paul Mundt
In the unaligned kernel exception fixup case the printk() was ordered before the copy_from_user(), resulting in a nonsensical instruction value. This fixes up the ordering properly. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-09-24sh: Handle unaligned 16-bit instructions on SH-2A.Paul Mundt
This adds some sanity checking in the unaligned instruction handler to verify the instruction size, which enables basic support for 16-bit fixups on SH-2A parts. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-09-24microblaze: Disable heartbeat/enable emaclite in defconfigsMichal Simek
I need to disable heartbeat function because this features breaks testing in Qemu. Signed-off-by: Michal Simek <monstr@monstr.eu>
2009-09-24microblaze: Support simpleImage.dts make targetMichal Simek
Instead of remembering to specify DTB= on the make commandline, this commit allows the much friendlier make simpleImage.<dts> where <dts>.dts is expected to be found in arch/microblaze/boot/dts/ The resulting vmlinux, with the compiled DTS linked in, will be copied to boot/simpleImage.<dts> This mirrors the same functionality as on PowerPC, albeit achieving it in a slightly different way. + strip simpleImage file The size of output file is very similar to linux.bin. vmlinux - full elf without fdt blob simpleImage.<dtb name>.unstrip - full elf with fdt blob simpleImage.<dtb name> - stripped elf with fdt blob Add symlink to generic system.dts in platform folder Signed-off-by: John Williams <john.williams@petalogix.com> Signed-off-by: Michal Simek <monstr@monstr.eu>
2009-09-24[PATCH] Fix idle time field in /proc/uptimeMichael Abbott
Git commit 79741dd changes idle cputime accounting, but unfortunately the /proc/uptime file hasn't caught up. Here the idle time calculation from /proc/stat is copied over. Signed-off-by: Michael Abbott <michael.abbott@diamond.ac.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-24lsm: Use a compressed IPv6 string format in audit eventsPaul Moore
Currently the audit subsystem prints uncompressed IPv6 addresses which not only differs from common usage but also results in ridiculously large audit strings which is not a good thing. This patch fixes this by simply converting audit to always print compressed IPv6 addresses. Old message example: audit(1253576792.161:30): avc: denied { ingress } for saddr=0000:0000:0000:0000:0000:0000:0000:0001 src=5000 daddr=0000:0000:0000:0000:0000:0000:0000:0001 dest=35502 netif=lo scontext=system_u:object_r:unlabeled_t:s15:c0.c1023 tcontext=system_u:object_r:lo_netif_t:s0-s15:c0.c1023 tclass=netif New message example: audit(1253576792.161:30): avc: denied { ingress } for saddr=::1 src=5000 daddr=::1 dest=35502 netif=lo scontext=system_u:object_r:unlabeled_t:s15:c0.c1023 tcontext=system_u:object_r:lo_netif_t:s0-s15:c0.c1023 tclass=netif Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24Audit: send signal info if selinux is disabledEric Paris
Audit will not respond to signal requests if selinux is disabled since it is unable to translate the 0 sid from the sending process to a context. This patch just doesn't send the context info if there isn't any. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24Audit: rearrange audit_context to save 16 bytes per structEric Paris
pahole pointed out that on x86_64 struct audit_context can be rearrainged to save 16 bytes per struct. Since we have an audit_context per task this can acually be a pretty significant gain. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24Audit: reorganize struct audit_watch to save 8 bytesEric Paris
pahole showed that struct audit_watch had two holes: struct audit_watch { atomic_t count; /* 0 4 */ /* XXX 4 bytes hole, try to pack */ char * path; /* 8 8 */ dev_t dev; /* 16 4 */ /* XXX 4 bytes hole, try to pack */ long unsigned int ino; /* 24 8 */ struct audit_parent * parent; /* 32 8 */ struct list_head wlist; /* 40 16 */ struct list_head rules; /* 56 16 */ /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ /* size: 72, cachelines: 2, members: 7 */ /* sum members: 64, holes: 2, sum holes: 8 */ /* last cacheline: 8 bytes */ }; /* definitions: 1 */ by moving dev after count we save 8 bytes, actually improving cacheline usage. There are typically very few of these in the kernel so it won't be a large savings, but it's a good thing no matter what. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24sh: mach-ecovec24: Add active low setting for sh_ethKuninori Morimoto
Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-09-24sh: includecheck fix: dwarf.cJaswinder Singh Rajput
fix the following 'make includecheck' warning: arch/sh/kernel/dwarf.c: asm/dwarf.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-09-24Fix build of cpm_uart due to core changesBenjamin Herrenschmidt
Commit ebd2c8f6d2ec4012c267ecb95e72a57b8355a705 "serial: kill off uart_info" broke the build of this driver, this fixes it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc/8xx: Fix regression introduced by cache coherency rewriteRex Feany
After upgrading to the latest kernel on my mpc875 userspace started running incredibly slow (hours to get to a shell, even!). I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36, that patch removed a work-around for the 8xx. Adding it back makes my problem go away. Signed-off-by: Rex Feany <rfeany@mrv.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc/4xx: Fix erroneous xmon warning on PowerPC 4xxJosh Boyer
The xmon code relies on MSR_RI being non-zero to indicate that an exception is recoverable. If it is not, it prints a warning message. However, the PowerPC 4xx cores do not have an MSR_RI bit and this warning is produced for every xmon event. This introduces an unrecoverable_excp function to determine if an exception is recoverable or not. This gets rid of the erroneous warnings on 4xx. Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc/mm: Fix 40x and 8xx vs. _PAGE_SPECIALBenjamin Herrenschmidt
The test to check whether we have _PAGE_SPECIAL defined is broken, since we always define it, just not always to a meaninful value :-) That broke 8xx and 40x under some circumstances. This fixes it by adding _PAGE_SPECIAL for both of these since they had a free PTE bit, and removing the condition around advertising it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc: Cleanup linker script using new linker script macros.Tim Abbott
Signed-off-by: Tim Abbott <tabbott@ksplice.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@ozlabs.org Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc: Fix ibm,client-architecture-support printoutAnton Blanchard
On machines without the ibm,client-architecture-support call we were missing a newline. We may as well print the full name in all its glory too - its ibm,client-architecture-support, not ibm,client-architecture as I mistakenly wrote (a name only an IBM architect could love). For my penance I will write out ibm,client-architecture-support 100 times. Before: Calling ibm,client-architecture...command line: root=/dev/sda6 console=hvc0 quiet After: Calling ibm,client-architecture-support... not implemented command line: root=/dev/sda6 console=hvc0 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc: Increase NODES_SHIFT on 64bit from 4 to 8Anton Blanchard
Some System p configurations can already have more than 16 nodes so we need to increase NODES_SHIFT. I chose 256 to give us some room to grow in the future, although we can look at something smaller if the memory bloat is considered too much. Unless we clamp MAX_ACTIVE_REGIONS we end up with 300kB of extra bloat in early_node_map in mm/page_alloc.c: < 6144 early_node_map > 307200 early_node_map due to: #if MAX_NUMNODES >= 32 /* If there can be many nodes, allow up to 50 holes per node */ #define MAX_ACTIVE_REGIONS (MAX_NUMNODES*50) #else /* By default, allow up to 256 distinct regions */ #define MAX_ACTIVE_REGIONS 256 Since our memory is mostly contiguous it seems reasonable to keep this at 256 for now. I also set 32bit to 32 to save space (is there any chance a 32bit system will have more than 32 discontiguous memory ranges?). Even with that fixed we have a few data structures that grow: < 896 bootmem_node_data > 14336 bootmem_node_data < 1280 node_devices > 20480 node_devices < 25088 kmalloc_caches > 59648 kmalloc_caches < 1632 hstates > 21792 hstates Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc/perf_counter: Fix vdso detectionAnton Blanchard
perf_counter uses arch_vma_name() to detect a vdso region which in turn uses current->mm->context.vdso_base. We need to initialise this before doing the mmap or else we fail to detect the vdso. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc: Move 64bit heap above 1TB on machines with 1TB segmentsAnton Blanchard
If we are using 1TB segments and we are allowed to randomise the heap, we can put it above 1TB so it is backed by a 1TB segment. Otherwise the heap will be in the bottom 1TB which always uses 256MB segments and this may result in a performance penalty. This functionality is disabled when heap randomisation is turned off: echo 1 > /proc/sys/kernel/randomize_va_space which may be useful when trying to allocate the maximum amount of 16M or 16G pages. On a microbenchmark that repeatedly touches 32GB of memory with a stride of 256MB + 4kB (designed to stress 256MB segments while still mapping nicely into the L1 cache), we see the improvement: Force malloc to use heap all the time: # export MALLOC_MMAP_MAX_=0 MALLOC_TRIM_THRESHOLD_=-1 Disable heap randomization: # echo 1 > /proc/sys/kernel/randomize_va_space # time ./test 12.51s Enable heap randomization: # echo 2 > /proc/sys/kernel/randomize_va_space # time ./test 1.70s Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc: Change archdata dma_data to a unionBecky Bruce
Sometimes this is used to hold a simple offset, and sometimes it is used to hold a pointer. This patch changes it to a union containing void * and dma_addr_t. get/set accessors are also provided, because it was getting a bit ugly to get to the actual data. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc: Rename get_dma_direct_offset get_dma_offsetBecky Bruce
The former is no longer really accurate with the swiotlb case now a possibility. I also move it into dma-mapping.h - it no longer needs to be in dma.c, and there are about to be some more accessors that should all end up in the same place. A comment is added to indicate that this function is not used in configs where there is no simple dma offset, such as the iommu case. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc/mm: Remove duplicated #includeHuang Weiyi
Remove duplicated #include('s) in arch/powerpc/mm/tlb_low_64e.S Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc/book3e-64: Remove duplicated #includeHuang Weiyi
Remove duplicated #include('s) in arch/powerpc/kernel/exceptions-64e.S Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24powerpc: Check for unsupported relocs when using CONFIG_RELOCATABLETony Breeds
When using CONFIG_RELOCATABLE, we build the kernel as a position independent executable. The kernel then uses a little bit of relocation code to relocate itself. That code only deals with R_PPC64_RELATIVE relocations though. If for some reason you use assembly constructs such as LOAD_REG_IMMEDIATE() to load the address of a symbol, you'll generate different kinds of relocations that won't be processed properly and bad things will happen. (We have 2 such bugs today). The perl script tries to filter out "known" bad ones. It's possible that we are missing some in the case of a weak function that nobody implements, we'll see if we get false positive and fix it. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>