summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)Author
2007-10-10[DCCP]: Convert dccps_timestamp_time to ktime_tArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[KTIME]: Introduce ktime_sub_ns and ktime_sub_usArnaldo Carvalho de Melo
First user will be the DCCP transport networking protocol. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[SCTP]: Rewrite of sctp buffer management codeNeil Horman
This patch introduces autotuning to the sctp buffer management code similar to the TCP. The buffer space can be grown if the advertised receive window still has room. This might happen if small message sizes are used, which is common in telecom environmens. New tunables are introduced that provide limits to buffer growth and memory pressure is entered if to much buffer spaces is used. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[ETHTOOL]: Introduce ->{get,set}_priv_flags, ETHTOOL_[GS]PFLAGSJeff Garzik
Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[ETHTOOL]: Introduce get_sset_count. Obsolete get_stats_count, self_test_countJeff Garzik
Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[ETHTOOL]: Add ETHTOOL_[GS]FLAGS sub-ioctlsJeff Garzik
Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET] netconsole: Support dynamic reconfiguration using configfsSatyam Sharma
Based upon initial work by Keiichi Kii <k-keiichi@bx.jp.nec.com>. This patch introduces support for dynamic reconfiguration (adding, removing and/or modifying parameters of netconsole targets at runtime) using a userspace interface exported via configfs. Documentation is also updated accordingly. Issues and brief design overview: (1) Kernel-initiated creation / destruction of kernel objects is not possible with configfs -- the lifetimes of the "config items" is managed exclusively from userspace. But netconsole must support boot/module params too, and these are parsed in kernel and hence netpolls must be setup from the kernel. Joel Becker suggested to separately manage the lifetimes of the two kinds of netconsole_target objects -- those created via configfs mkdir(2) from userspace and those specified from the boot/module option string. This adds complexity and some redundancy here and also means that boot/module param-created targets are not exposed through the configfs namespace (and hence cannot be updated / destroyed dynamically). However, this saves us from locking / refcounting complexities that would need to be introduced in configfs to support kernel-initiated item creation / destroy there. (2) In configfs, item creation takes place in the call chain of the mkdir(2) syscall in the driver subsystem. If we used an ioctl(2) to create / destroy objects from userspace, the special userspace program is able to fill out the structure to be passed into the ioctl and hence specify attributes such as local interface that are required at the time we set up the netpoll. For configfs, this information is not available at the time of mkdir(2). So, we keep all newly-created targets (via configfs) disabled by default. The user is expected to set various attributes appropriately (including the local network interface if required) and then write(2) "1" to the "enabled" attribute. Thus, netpoll_setup() is then called on the set parameters in the context of _this_ write(2) on the "enabled" attribute itself. This design enables the user to reconfigure existing netconsole targets at runtime to be attached to newly-come-up interfaces that may not have existed when netconsole was loaded or when the targets were actually created. All this effectively enables us to get rid of custom ioctls. (3) Ultra-paranoid configfs attribute show() and store() operations, with sanity and input range checking, using only safe string primitives, and compliant with the recommendations in Documentation/filesystems/sysfs.txt. (4) A new function netpoll_print_options() is created in the netpoll API, that just prints out the configured parameters for a netpoll structure. netpoll_parse_options() is modified to use that and it is also exported to be used from netconsole. Signed-off-by: Satyam Sharma <satyam@infradead.org> Acked-by: Keiichi Kii <k-keiichi@bx.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Update comment about highest_sack validityIlpo Järvinen
This stale info came from the original idea, which proved to be unnecessarily complex, sacked_out > 0 is easy to do and that when it's going to be needed anyway (it _can_ be valid also when sacked_out == 0 but there's not going to be a guarantee about it for now). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Move sack_ok access to obviously named funcs & cleanupIlpo Järvinen
Previously code had IsReno/IsFack defined as macros that were local to tcp_input.c though sack_ok field has user elsewhere too for the same purpose. This changes them to static inlines as preferred according the current coding style and unifies the access to sack_ok across multiple files. Magic bitops of sack_ok for FACK and DSACK are also abstracted to functions with appropriate names. Note: - One sack_ok = 1 remains but that's self explanary, i.e., it enables sack - Couple of !IsReno cases are changed to tcp_is_sack - There were no users for IsDSack => I dropped it Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Don't panic if S+L skb is detectedIlpo Järvinen
BUG_ON is an overkill. In fact, I was mislead by BUG_TRAP severity (equals to WARN_ON) which is much lower than BUG_ON's (that panics). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Reduce sacked_out with reno when purging write_queueIlpo Järvinen
Previously TCP had a transitional state during which reno counted segments that are already below the current window into sacked_out, which is now prevented. In addition, re-try now the unconditional S+L skb catching. This approach conservatively calls just remove_sack and leaves reset_sack() calls alone. The best solution to the whole problem would be to first calculate the new sacked_out fully (this patch does not move reno_sack_reset calls from original sites and thus does not implement this). However, that would require very invasive change to fastretrans_alert (perhaps even slicing it to two halves). Alternatively, all callers of tcp_packets_in_flight (i.e., users that depend on sacked_out) should be postponed until the new sacked_out has been calculated but it isn't any simpler alternative. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Left out sync->verify (the new meaning of it) & definifyIlpo Järvinen
Left_out was dropped a while ago, thus leaving verifying consistency of the "left out" as only task for the function in question. Thus make it's name more appropriate. In addition, it is intentionally converted to #define instead of static inline because the location of the invariant failure is the most important thing to have if this ever triggers. I think it would have been helpful e.g. in this case where the location of the failure point had to be based on some quesswork: http://lkml.org/lkml/2007/5/2/464 ...Luckily the guesswork seems to have proved to be correct. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Add tcp_left_out(tp) "back" to get cleaner looking linesIlpo Järvinen
tp->left_out got removed but nothing came to replace it back then (users just did addition by themselves), so add function for users now. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Tighten tcp_sock's belt, drop left_outIlpo Järvinen
It is easily calculable when needed and user are not that many after all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Add tcp_dec_pcount_approx int variantIlpo Järvinen
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove itIlpo Järvinen
No other users exist for tcp_ecn.h. Very few things remain in tcp.h, for most TCP ECN functions callers reside within a single .c file and can be placed there. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Access to highest_sack obsoletes forward_cnt_hintIlpo Järvinen
In addition, added a reference about the purpose of the loop. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[TCP]: Add highest_sack seqno, points to globally highest SACKIlpo Järvinen
It is guaranteed to be valid only when !tp->sacked_out. In most cases this seqno is available in the last ACK but there is no guarantee for that. The new fast recovery loss marking algorithm needs this as entry point. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Generic Large Receive Offload for TCP trafficJan-Bernd Themann
This patch provides generic Large Receive Offload (LRO) functionality for IPv4/TCP traffic. LRO combines received tcp packets to a single larger tcp packet and passes them then to the network stack in order to increase performance (throughput). The interface supports two modes: Drivers can either pass SKBs or fragment lists to the LRO engine. Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Virtual ethernet device driver.Pavel Emelyanov
Veth stands for Virtual ETHernet. It is a simple tunnel driver that works at the link layer and looks like a pair of ethernet devices interconnected with each other. Mainly it allows to communicate between network namespaces but it can be used as is as well. The newlink callback is organized that way to make it easy to create the peer device in the separate namespace when we have them in kernel. This implementation uses another interface - the RTM_NRELINK message introduced by Patric. Bug fixes from Daniel Lezcano. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[RTNETLINK]: Introduce generic rtnl_create_link().Pavel Emelianov
This routine gets the parsed rtnl attributes and creates a new link with generic info (IFLA_LINKINFO policy). Its intention is to help the drivers, that need to create several links at once (like VETH). This is nothing but a copy-paste-ed part of rtnl_newlink() function that is responsible for creation of new device. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make NAPI polling independent of struct net_device objects.Stephen Hemminger
Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device *dev, int *budget) to int foo_poll(struct napi_struct *napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, *budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[MAC80211]: Add get_unaligned to ieee80211_get_radiotap_lenAndy Green
ieee80211_get_radiotap_len() tries to dereference radiotap length without taking care that it is completely unaligned and get_unaligned() is required. Signed-off-by: Andy Green <andy@warmcat.com> Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10[MAC80211]: implement ERP info change notificationsDaniel Drake
zd1211rw and bcm43xx are interested in being notified when ERP IE conditions change, so that they can reprogram a register which affects how control frames are transmitted. This patch adds an interface similar to the one that can be found in softmac. Signed-off-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10[MAC80211]: improved short preamble handlingDaniel Drake
Similarly to CTS protection, whether short preambles are used for 802.11b transmissions should be a per-subif setting, not device global. For STAs, this patch makes short preamble handling automatic based on the ERP IE. For APs, hostapd still uses the prism ioctls, but the write ioctl has been restricted to AP-only subifs. ieee80211_txrx_data.short_preamble (an unused field) was removed. Unfortunately, some API changes were required for the following functions: - ieee80211_generic_frame_duration - ieee80211_rts_duration - ieee80211_ctstoself_duration - ieee80211_rts_get - ieee80211_ctstoself_get Affected drivers were updated accordingly. Signed-off-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10[MAC80211]: Add LONG_RETRY flag to ieee80211_tx_controlIvo van Doorn
mac80211 informs the driver what the short and long retry values are through set_retry_limit(), but when packets are being transmitted it did not inform the driver which of the 2 retry limits should actually be used. Instead it sends the actual value, but for drivers that can only set the retry limit and the register and in the descriptor need to indicate which of the limits should be used this is not really useful. This patch will add a IEEE80211_TXCTL_LONG_RETRY_LIMIT flag to the ieee80211_tx_control structure. By default the short retry limit should be used but if the flag is set the long retry should be used. This does not prevent the driver to ignore the request for "no retry" packets, but at least those will be send out with the short retry limit. But there is no perfect cure for this problem.. :( Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10[MAC80211]: improve locking of sta_info related structuresMichael Wu
The sta_info code has some awkward locking which prevents some driver callbacks from being allowed to sleep. This patch makes the locking more focused so code that calls driver callbacks are allowed to sleep. It also converts sta_lock to a rwlock. Signed-off-by: Michael Wu <flamingice@sourmilk.net> Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-10[MAC80211]: split RX handlers into own fileJohannes Berg
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-10-08Merge branch 'master' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [IPv6]: Fix ICMPv6 redirect handling with target multicast address [PKT_SCHED] cls_u32: error code isn't been propogated properly [ROSE]: Fix rose.ko oops on unload [TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed
2007-10-08mm: set_page_dirty_balance() vs ->page_mkwrite()Peter Zijlstra
All the current page_mkwrite() implementations also set the page dirty. Which results in the set_page_dirty_balance() call to _not_ call balance, because the page is already found dirty. This allows us to dirty a _lot_ of pages without ever hitting balance_dirty_pages(). Not good (tm). Force a balance call if ->page_mkwrite() was successful. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-07[ROSE]: Fix rose.ko oops on unloadAlexey Dobriyan
Commit a3d384029aa304f8f3f5355d35f0ae274454f7cd aka "[AX.25]: Fix unchecked rose_add_loopback_neigh uses" transformed rose_loopback_neigh var into statically allocated one. However, on unload it will be kfree's which can't work. Steps to reproduce: modprobe rose rmmod rose BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008 printing eip: c014c664 *pde = 00000000 Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC Modules linked in: rose ax25 fan ufs loop usbhid rtc snd_intel8x0 snd_ac97_codec ehci_hcd ac97_bus uhci_hcd thermal usbcore button processor evdev sr_mod cdrom CPU: 0 EIP: 0060:[<c014c664>] Not tainted VLI EFLAGS: 00210086 (2.6.23-rc9 #3) EIP is at kfree+0x48/0xa1 eax: 00000556 ebx: c1734aa0 ecx: f6a5e000 edx: f7082000 esi: 00000000 edi: f9a55d20 ebp: 00200287 esp: f6a5ef28 ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0068 Process rmmod (pid: 1823, ti=f6a5e000 task=f7082000 task.ti=f6a5e000) Stack: f9a55d20 f9a5200c 00000000 00000000 00000000 f6a5e000 f9a5200c f9a55a00 00000000 bf818cf0 f9a51f3f f9a55a00 00000000 c0132c60 65736f72 00000000 f69f9630 f69f9528 c014244a f6a4e900 00200246 f7082000 c01025e6 00000000 Call Trace: [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose] [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose] [<f9a51f3f>] rose_exit+0x4c/0xd5 [rose] [<c0132c60>] sys_delete_module+0x15e/0x186 [<c014244a>] remove_vma+0x40/0x45 [<c01025e6>] sysenter_past_esp+0x8f/0x99 [<c012bacf>] trace_hardirqs_on+0x118/0x13b [<c01025b6>] sysenter_past_esp+0x5f/0x99 ======================= Code: 05 03 1d 80 db 5b c0 8b 03 25 00 40 02 00 3d 00 40 02 00 75 03 8b 5b 0c 8b 73 10 8b 44 24 18 89 44 24 04 9c 5d fa e8 77 df fd ff <8b> 56 08 89 f8 e8 84 f4 fd ff e8 bd 32 06 00 3b 5c 86 60 75 0f EIP: [<c014c664>] kfree+0x48/0xa1 SS:ESP 0068:f6a5ef28 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-07Don't do load-average calculations at even 5-second intervalsLinus Torvalds
It turns out that there are a few other five-second timers in the kernel, and if the timers get in sync, the load-average can get artificially inflated by events that just happen to coincide. So just offset the load average calculation it by a timer tick. Noticed by Anders Boström, for whom the coincidence started triggering on one of his machines with the JBD jiffies rounding code (JBD is one of the subsystems that also end up using a 5-second timer by default). Tested-by: Anders Boström <anders@bostrom.dyndns.org> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-05Remove unnecessary cast in prefetch()Serge Belyshev
It is ok to call prefetch() function with NULL argument, as specifically commented in include/linux/prefetch.h. But in standard C, it is invalid to dereference NULL pointer (see C99 standard 6.5.3.2 paragraph 4 and note #84). prefetch() has a memory reference for its argument. Newer gcc versions (4.3 and above) will use that to conclude that "x" argument is non-null and thus wreaking havok everywhere prefetch() was inlined. Fixed by removing cast and changing asm constraint. [ It seems in theory gcc 4.2 could miscompile this too; although no cases known. In 2.6.24 we should probably switch to __builtin_prefetch() instead, but this is a simpler fix for now. -- AK ] Signed-off-by: Serge Belyshev <belyshev@depni.sinp.msu.ru> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-03Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: [MIPS] Terminally fix local_{dec,sub}_if_positive [MIPS] Type proof reimplementation of cmpxchg. [MIPS] pg-r4k.c: Fix a typo in an R4600 v2 erratum workaround
2007-10-04Blackfin arch: fix PORT_J BUG for BF537/6 EMAC driver reported by Kalle ↵Michael Hennerich
Pokki <kalle.pokki@iki.fi> Cc: Kalle Pokki <kalle.pokki@iki.fi> Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com>
2007-10-04Blackfin arch: gpio pinmux and resource allocation API required by BF537 on ↵Michael Hennerich
chip ethernet mac driver Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com>
2007-10-03[MIPS] Terminally fix local_{dec,sub}_if_positiveRalf Baechle
They contain 64-bit instructions so wouldn't work on 32-bit kernels or 32-bit hardware. Since there are no users, blow them away. They probably were only ever created because there are atomic_sub_if_positive and atomic_dec_if_positive which exist only for sake of semaphores. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-03[MIPS] Type proof reimplementation of cmpxchg.Ralf Baechle
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-10-01Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: [MIPS] vmlinux.lds.S: Handle note sections [MIPS] Fix value of O_TRUNC
2007-10-01[MIPS] Fix value of O_TRUNCRalf Baechle
A "cleanup" almost two years ago deleted the old definition from <asm/fcntl.h>, so asm-generic/fcntl.h defaulted it to the the same value as FASYNC ... which happened to be the wrong thing. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-09-29i386: remove bogus comment about memory barrierNick Piggin
The comment being removed by this patch is incorrect and misleading. In the following situation: 1. load ... 2. store 1 -> X 3. wmb 4. rmb 5. load a <- Y 6. store ... 4 will only ensure ordering of 1 with 5. 3 will only ensure ordering of 2 with 6. Further, a CPU with strictly in-order stores will still only provide that 2 and 6 are ordered (effectively, it is the same as a weakly ordered CPU with wmb after every store). In all cases, 5 may still be executed before 2 is visible to other CPUs! The additional piece of the puzzle that mb() provides is the store/load ordering, which fundamentally cannot be achieved with any combination of rmb()s and wmb()s. This can be an unexpected result if one expected any sort of global ordering guarantee to barriers (eg. that the barriers themselves are sequentially consistent with other types of barriers). However sfence or lfence barriers need only provide an ordering partial ordering of memory operations -- Consider that wmb may be implemented as nothing more than inserting a special barrier entry in the store queue, or, in the case of x86, it can be a noop as the store queue is in order. And an rmb may be implemented as a directive to prevent subsequent loads only so long as their are no previous outstanding loads (while there could be stores still in store queues). I can actually see the occasional load/store being reordered around lfence on my core2. That doesn't prove my above assertions, but it does show the comment is wrong (unless my program is -- can send it out by request). So: mb() and smp_mb() always have and always will require a full mfence or lock prefixed instruction on x86. And we should remove this comment. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Paul McKenney <paulmck@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-28Merge branch 'master' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [TCP]: Fix MD5 signature handling on big-endian. [NET]: Zero length write() on socket should not simply return 0.
2007-09-28[TCP]: Fix MD5 signature handling on big-endian.David S. Miller
Based upon a report and initial patch by Peter Lieven. tcp4_md5sig_key and tcp6_md5sig_key need to start with the exact same members as tcp_md5sig_key. Because they are both cast to that type by tcp_v{4,6}_md5_do_lookup(). Unfortunately tcp{4,6}_md5sig_key use a u16 for the key length instead of a u8, which is what tcp_md5sig_key uses. This just so happens to work by accident on little-endian, but on big-endian it doesn't. Instead of casting, just place tcp_md5sig_key as the first member of the address-family specific structures, adjust the access sites, and kill off the ugly casts. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-27[MIPS] Fix CONFIG_BUILD_ELF64 kernels with symbols in CKSEG0.Ralf Baechle
The __pa() for those did assume that all symbols have XKPHYS values and the math fails for any other address range. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2007-09-26Revert "[PATCH] x86-64: fix x86_64-mm-sched-clock-share"Linus Torvalds
This reverts commit 184c44d2049c4db7ef6ec65794546954da2c6a0e. As noted by Dave Jones: "Linus, please revert the above cset. It doesn't seem to be necessary (it was added to fix a miscompile in 'make allnoconfig' which doesn't seem to be repeatable with it reverted) and actively breaks the ARM SA1100 framebuffer driver." Requested-by: Dave Jones <davej@redhat.com> Cc: Russell King <rmk+lkml@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26Revert "x86-64: Disable local APIC timer use on AMD systems with C1E"Linus Torvalds
This reverts commit e66485d747505e9d960b864fc6c37f8b2afafaf0, since Rafael Wysocki noticed that the change only works for his in -mm, not in mainline (and that both "noapictimer" _and_ "apicmaintimer" are broken on his hardware, but that's apparently not a regression, just a symptom of the same issue that causes the automatic apic timer disable to not work). It turns out that it really doesn't work correctly on x86-64, since x86-64 doesn't use the generic clock events for timers yet. Thanks to Rafal for testing, and here's the ugly details on x86-64 as per Thomas: "I just looked into the code and the logic vs. noapictimer on SMP is completely broken. On i386 the noapictimer option not only disables the local APIC timer, it also registers the CPUs for broadcasting via IPI on SMP systems. The x86-64 code uses the broadcast only when the local apic timer is active, i.e. "noapictimer" is not on the command line. This defeats the whole purpose of "noapictimer". It should be there to make boxen work, where the local APIC timer actually has a hardware problem, e.g. the nx6325. The current implementation of x86_64 only fixes the ACPI c-states related problem where the APIC timer stops in C3(2), nothing else. On nx6325 and other AMD X2 equipped systems which have the C1E enabled we run into the following: PIT keeps jiffies (and the system) running, but the local APIC timer interrupts can get out of sync due to this C1E effect. I don't think this is a critical problem, but it is wrong nevertheless. I think it's safe to revert the C1E patch and postpone the fix to the clock events conversion." On further reflection, Thomas noted: "It's even worse than I thought on the first check: "noapictimer" on the command line of an SMP box prevents _ONLY_ the boot CPU apic timer from being used. But the secondary CPU is still unconditionally setting up the APIC timer and uses the non calibrated variable calibration_result, which is of course 0, to setup the APIC timer. Wreckage guaranteed." so we'll just have to wait for the x86 merge to hopefully fix this up for x86-64. Tested-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26x86-64: Disable local APIC timer use on AMD systems with C1EThomas Gleixner
commit 3556ddfa9284a86a59a9b78fe5894430f6ab4eef titled [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E solves a problem with AMD dual core laptops e.g. HP nx6325 (Turion 64 X2) with C1E enabled: When both cores go into idle at the same time, then the system switches into C1E state, which is basically the same as C3. This stops the local apic timer. This was debugged right after the dyntick merge on i386 and despite the patch title it fixes only the 32 bit path. x86_64 is still missing this fix. It seems that mainline is not really affected by this issue, as the PIT is running and keeps jiffies incrementing, but that's just waiting for trouble. -mm suffers from this problem due to the x86_64 high resolution timer patches. This is a quick and dirty port of the i386 code to x86_64. I spent quite a time with Rafael to debug the -mm / hrt wreckage until someone pointed us to this. I really had forgotten that we debugged this half a year ago already. Sigh, is it just me or is there something yelling arch/x86 into my ear? Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26fix sctp_del_bind_addr() last argument typeAl Viro
It gets pointer to fastcall function, expects a pointer to normal one and calls the sucker. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26Merge branch 'master' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [PPP_MPPE]: Don't put InterimKey on the stack SCTP : Add paramters validity check for ASCONF chunk SCTP: Discard OOTB packetes with bundled INIT early. SCTP: Clean up OOTB handling and fix infinite loop processing SCTP: Explicitely discard OOTB chunks SCTP: Send ABORT chunk with correct tag in response to INIT ACK SCTP: Validate buffer room when processing sequential chunks [PATCH] mac80211: fix initialisation when built-in [PATCH] net/mac80211/wme.c: fix sparse warning [PATCH] cfg80211: fix initialisation if built-in [PATCH] net/wireless/sysfs.c: Shut up build warning
2007-09-25SCTP : Add paramters validity check for ASCONF chunkWei Yongjun
If ADDIP is enabled, when an ASCONF chunk is received with ASCONF paramter length set to zero, this will cause infinite loop. By the way, if an malformed ASCONF chunk is received, will cause processing to access memory without verifying. This is because of not check the validity of parameters in ASCONF chunk. This patch fixed this. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>