summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2008-07-19tcp: options clean upAdam Langley
This should fix the following bugs: * Connections with MD5 signatures produce invalid packets whenever SACK options are included * MD5 signatures are counted twice in the MSS calculations Behaviour changes: * A SYN with MD5 + SACK + TS elicits a SYNACK with MD5 + SACK This is because we can't fit any SACK blocks in a packet with MD5 + TS options. There was discussion about disabling SACK rather than TS in order to fit in better with old, buggy kernels, but that was deemed to be unnecessary. * SYNs with MD5 don't include a TS option See above. Additionally, it removes a bunch of duplicated logic for calculating options, which should help avoid these sort of issues in the future. Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-19tcp: Fix MD5 signatures for non-linear skbsAdam Langley
Currently, the MD5 code assumes that the SKBs are linear and, in the case that they aren't, happily goes off and hashes off the end of the SKB and into random memory. Reported by Stephen Hemminger in [1]. Advice thanks to Stephen and Evgeniy Polyakov. Also includes a couple of missed route_caps from Stephen's patch in [2]. [1] http://marc.info/?l=linux-netdev&m=121445989106145&w=2 [2] http://marc.info/?l=linux-netdev&m=121459157816964&w=2 Signed-off-by: Adam Langley <agl@imperialviolet.org> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18sctp: Update sctp global memory limit allocations.Vlad Yasevich
Update sctp global memory limit allocations to be the same as TCP. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18sctp: remove unnecessary byteshifting, calculate directly in big-endianHarvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18sctp: Allow only 1 listening socket with SO_REUSEADDRVlad Yasevich
When multiple socket bind to the same port with SO_REUSEADDR, only 1 can be listining. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18sctp: Do not leak memory on multiple listen() callsVlad Yasevich
SCTP permits multiple listen call and on subsequent calls we leak he memory allocated for the crypto transforms. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18sctp: Support ipv6only AF_INET6 sockets.Vlad Yasevich
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18sctp: Prevent uninitialized memory accessFlorian Westphal
valgrind reports uninizialized memory accesses when running sctp inside the network simulation cradle simulator: Conditional jump or move depends on uninitialised value(s) at 0x570E34A: sctp_assoc_sync_pmtu (associola.c:1324) by 0x57427DA: sctp_packet_transmit (output.c:403) by 0x5710EFF: sctp_outq_flush (outqueue.c:824) by 0x5710B88: sctp_outq_uncork (outqueue.c:701) by 0x5745262: sctp_cmd_interpreter (sm_sideeffect.c:1548) by 0x57444B7: sctp_side_effects (sm_sideeffect.c:976) by 0x5744460: sctp_do_sm (sm_sideeffect.c:945) by 0x572157D: sctp_primitive_ASSOCIATE (primitive.c:94) by 0x5725C04: __sctp_connect (socket.c:1094) by 0x57297DC: sctp_connect (socket.c:3297) Conditional jump or move depends on uninitialised value(s) at 0x575D3A5: mod_timer (timer.c:630) by 0x5752B78: sctp_cmd_hb_timers_start (sm_sideeffect.c:555) by 0x5754133: sctp_cmd_interpreter (sm_sideeffect.c:1448) by 0x5753607: sctp_side_effects (sm_sideeffect.c:976) by 0x57535B0: sctp_do_sm (sm_sideeffect.c:945) by 0x571E9AE: sctp_endpoint_bh_rcv (endpointola.c:474) by 0x573347F: sctp_inq_push (inqueue.c:104) by 0x572EF93: sctp_rcv (input.c:256) by 0x5689623: ip_local_deliver_finish (ip_input.c:230) by 0x5689759: ip_local_deliver (ip_input.c:268) by 0x5689CAC: ip_rcv_finish (dst.h:246) #1 is due to "if (t->pmtu_pending)". 8a4794914f9cf2681235ec2311e189fe307c28c7 "[SCTP] Flag a pmtu change request" suggests it should be initialized to 0. #2 is the heartbeat timer 'expires' value, which is uninizialised, but test by mod_timer(). T3_rtx_timer seems to be affected by the same problem, so initialize it, too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18sctp: Don't abort initialization when CONFIG_PROC_FS=nFlorian Westphal
This puts CONFIG_PROC_FS defines around the proc init/exit functions and also avoids compiling proc.c if procfs is not supported. Also make SCTP_DBG_OBJCNT depend on procfs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18tcp: RTT metrics scalingStephen Hemminger
Some of the metrics (RTT, RTTVAR and RTAX_RTO_MIN) are stored in kernel units (jiffies) and this leaks out through the netlink API to user space where the units for jiffies are unknown. This patches changes the kernel to convert to/from milliseconds. This changes the ABI, but milliseconds seemed like the most natural unit for these parameters. Values available via syscall in /proc/net/rt_cache and netlink will be in milliseconds. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18pkt_sched: Fix noqueue_qdisc initialization.David S. Miller
Like noop_qdisc, it needs a dummy backpointer and explicit qdisc->q.lock initialization. Based upon a report by Stephen Hemminger. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18pkt_sched: Manage qdisc list inside of root qdisc.David S. Miller
Idea is from Patrick McHardy. Instead of managing the list of qdiscs on the device level, manage it in the root qdisc of a netdev_queue. This solves all kinds of visibility issues during qdisc destruction. The way to iterate over all qdiscs of a netdev_queue is to visit the netdev_queue->qdisc, and then traverse it's list. The only special case is to ignore builting qdiscs at the root when dumping or doing a qdisc_lookup(). That was not needed previously because builtin qdiscs were not added to the device's qdisc_list. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18pkt_sched: Get rid of u32_list.David S. Miller
The u32_list is just an indirect way of maintaining a reference to a U32 node on a per-qdisc basis. Just add an explicit node pointer for u32 to struct Qdisc an do away with this global list. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18packet: add PACKET_RESERVE sockoptPatrick McHardy
Add new sockopt to reserve some headroom in the mmaped ring frames in front of the packet payload. This can be used f.i. when the VLAN header needs to be (re)constructed to avoid moving the entire payload. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18proc: consolidate per-net single-release callersPavel Emelyanov
They are symmetrical to single_open ones :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18proc: consolidate per-net single_open callersPavel Emelyanov
There are already 7 of them - time to kill some duplicate code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18proc: clean the ip_misc_proc_init and ip_proc_init_net error pathsPavel Emelyanov
After all this stuff is moved outside, this function can look better. Besides, I tuned the error path in ip_proc_init_net to make it have only 2 exit points, not 3. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18proc: show per-net ip_devconf.forwarding in /proc/net/snmpPavel Emelyanov
This one has become per-net long ago, but the appropriate file is per-net only now. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18proc: create /proc/net/snmp file in each netPavel Emelyanov
All the statistics shown in this file have been made per-net already. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18proc: create /proc/net/netstat file in each netPavel Emelyanov
Now all the shown in it statistics is netnsizated, time to show it in appropriate net. The appropriate net init/exit ops already exist - they make the sockstat file per net - so just extend them. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18ipv4: clean the init_ipv4_mibs error pathsPavel Emelyanov
After moving all the stuff outside this function it looks a bit ugly - make it look better. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18mib: put icmpmsg statistics on struct netPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18mib: put icmp statistics on struct netPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18mib: put udplite statistics on struct netPavel Emelyanov
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18mib: put udp statistics on struct netPavel Emelyanov
Similar to... ouch, I repeat myself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18mib: put net statistics on struct netPavel Emelyanov
Similar to ip and tcp ones :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18mib: put ip statistics on struct netPavel Emelyanov
Similar to tcp one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18mib: put tcp statistics on struct netPavel Emelyanov
Proc temporary uses stats from init_net. BTW, TCP_XXX_STATS are beautiful (w/o do { } while (0) facing) again :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18ipv4: add pernet mib operationsPavel Emelyanov
These ones are currently empty, but stuff from init_ipv4_mibs will sequentially migrate there. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-18Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/powerpc/booting-without-of.txt drivers/atm/Makefile drivers/net/fs_enet/fs_enet-main.c drivers/pci/pci-acpi.c net/8021q/vlan.c net/iucv/iucv.c
2008-07-17pkt_sched: Make default qdisc nonshared-multiqueue safe.David S. Miller
Instead of 'pfifo_fast' we have just plain 'fifo_fast'. No priority queues, just a straight FIFO. This is necessary in order to legally have a seperate qdisc per queue in multi-TX-queue setups, and thus get full parallelization. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Add multiqueue handling to qdisc_graft().David S. Miller
Move the destruction of the old queue into qdisc_graft(). When operating on a root qdisc (ie. "parent == NULL"), apply the operation to all queues. The caller has grabbed a single implicit reference for this graft, therefore when we apply the change to more than one queue we must grab additional qdisc references. Otherwise, we are operating on a class of a specific parent qdisc, and therefore no multiqueue handling is necessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Kill netdev_queue lock.David S. Miller
We can simply use the qdisc->q.lock for all of the qdisc tree synchronization. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Kill qdisc_lock_tree and qdisc_unlock_tree.David S. Miller
No longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Make qdisc grafting locking more specific.David S. Miller
Lock the root of the qdisc being operated upon. All explicit references to qdisc_tree_lock() are now gone. The only remaining uses are via the sch_tree_{lock,unlock}() and tcf_tree_{lock,unlock}() macros. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17netdevice: Move qdisc_list back into net_device proper.David S. Miller
And give it it's own lock. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Kill qdisc_lock_tree usage in cls_route.cDavid S. Miller
It just wants the qdisc tree to be synchronized, so grabbing qdisc_root_lock() is sufficient. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Remove qdisc_lock_tree usage in cls_api.cDavid S. Miller
It just wants the qdisc tree for the filter to be synchronized. So just BH lock qdisc_root_lock(q) instead. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Use per-queue locking in shutdown_scheduler_queue.David S. Miller
This eliminates another qdisc_lock_tree user. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Perform bulk of qdisc destruction in RCU.David S. Miller
This allows less strict control of access to the qdisc attached to a netdev_queue. It is even allowed to enqueue into a qdisc which is in the process of being destroyed. The RCU handler will toss out those packets. We will need this to handle sharing of a qdisc amongst multiple TX queues. In such a setup the lock has to be shared, so will be inside of the qdisc itself. At which point the netdev_queue lock cannot be used to hard synchronize access to the ->qdisc pointer. One operation we have to keep inside of qdisc_destroy() is the list deletion. It is the only piece of state visible after the RCU quiesce period, so we have to undo it early and under the appropriate locking. The operations in the RCU handler do not need any looking because the qdisc tree is no longer visible to anything at that point. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: dev_init_scheduler() does not need to lock qdisc tree.David S. Miller
We are registering the device, there is no way anyone can get at this object's qdiscs yet in any meaningful way. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Schedule qdiscs instead of netdev_queue.David S. Miller
When we have shared qdiscs, packets come out of the qdiscs for multiple transmit queues. Therefore it doesn't make any sense to schedule the transmit queue when logically we cannot know ahead of time the TX queue of the SKB that the qdisc->dequeue() will give us. Just for sanity I added a BUG check to make sure we never get into a state where the noop_qdisc is scheduled. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Add and use qdisc_root() and qdisc_root_lock().David S. Miller
When code wants to lock the qdisc tree state, the logic operation it's doing is locking the top-level qdisc that sits of the root of the netdev_queue. Add qdisc_root_lock() to represent this and convert the easiest cases. In order for this to work out in all cases, we have to hook up the noop_qdisc to a dummy netdev_queue. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Make QDISC_RUNNING a qdisc state.David S. Miller
Currently it is associated with a netdev_queue, but when we have qdisc sharing that no longer makes any sense. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17pkt_sched: Move gso_skb into Qdisc.David S. Miller
We liberate any dangling gso_skb during qdisc destruction. It really only matters for the root qdisc. But when qdiscs can be shared by multiple netdev_queue objects, we can't have the gso_skb in the netdev_queue any more. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17net: Implement simple sw TX hashing.David S. Miller
It just xor hashes over IPv4/IPv6 addresses and ports of transport. The only assumption it makes is that skb_network_header() is set correctly. With bug fixes from Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17mac80211: Reimplement WME using ->select_queue().David S. Miller
The only behavior change is that we do not drop packets under any circumstances. If that is absolutely needed, we could easily add it back. With cleanups and help from Johannes Berg. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17netdev: Add netdev->select_queue() method.David S. Miller
Devices or device layers can set this to control the queue selection performed by dev_pick_tx(). This function runs under RCU protection, which allows overriding functions to have some way of synchronizing with things like dynamic ->real_num_tx_queues adjustments. This makes the spinlock prefetch in dev_queue_xmit() a little bit less effective, but that's the price right now for correctness. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17net: Use queue aware tests throughout.David S. Miller
This effectively "flips the switch" by making the core networking and multiqueue-aware drivers use the new TX multiqueue structures. Non-multiqueue drivers need no changes. The interfaces they use such as netif_stop_queue() degenerate into an operation on TX queue zero. So everything "just works" for them. Code that really wants to do "X" to all TX queues now invokes a routine that does so, such as netif_tx_wake_all_queues(), netif_tx_stop_all_queues(), etc. pktgen and netpoll required a little bit more surgery than the others. In particular the pktgen changes, whilst functional, could be largely improved. The initial check in pktgen_xmit() will sometimes check the wrong queue, which is mostly harmless. The thing to do is probably to invoke fill_packet() earlier. The bulk of the netpoll changes is to make the code operate solely on the TX queue indicated by by the SKB queue mapping. Setting of the SKB queue mapping is entirely confined inside of net/core/dev.c:dev_pick_tx(). If we end up needing any kind of special semantics (drops, for example) it will be implemented here. Finally, we now have a "real_num_tx_queues" which is where the driver indicates how many TX queues are actually active. With IGB changes from Jeff Kirsher. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-17mac80211: Temporarily mark QoS support BROKEN.David S. Miller
We will undo this after a few changsets. Signed-off-by: David S. Miller <davem@davemloft.net>