summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2014-12-09tipc: fix missing spinlock init and nullptr oopsErik Hugne
commit 908344cdda80 ("tipc: fix bug in multicast congestion handling") introduced two bugs with the bclink wakeup function. This commit fixes the missing spinlock init for the waiting_sks list. We also eliminate the race condition between the waiting_sks length check/dequeue operations in tipc_bclink_wakeup_users by simply removing the redundant length check. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Acked-by: Tero Aho <Tero.Aho@coriant.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09net: avoid two atomic operations in fast clonesEric Dumazet
Commit ce1a4ea3f125 ("net: avoid one atomic operation in skb_clone()") took the wrong way to save one atomic operation. It is actually possible to avoid two atomic operations, if we do not change skb->fclone values, and only rely on clone_ref content to signal if the clone is available or not. skb_clone() can simply use the fast clone if clone_ref is 1. kfree_skbmem() can avoid the atomic_dec_and_test() if clone_ref is 1. Note that because we usually free the clone before the original skb, this particular attempt is only done for the original skb to have better branch prediction. SKB_FCLONE_FREE is removed. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Chris Mason <clm@fb.com> Cc: Sabrina Dubroca <sd@queasysnail.net> Cc: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09rtnetlink: delay RTM_DELLINK notification until after ndo_uninit()Mahesh Bandewar
The commit 56bfa7ee7c ("unregister_netdevice : move RTM_DELLINK to until after ndo_uninit") tried to do this ealier but while doing so it created a problem. Unfortunately the delayed rtmsg_ifinfo() also delayed call to fill_info(). So this translated into asking driver to remove private state and then query it's private state. This could have catastropic consequences. This change breaks the rtmsg_ifinfo() into two parts - one takes the precise snapshot of the device by called fill_info() before calling the ndo_uninit() and the second part sends the notification using collected snapshot. It was brought to notice when last link is deleted from an ipvlan device when it has free-ed the port and the subsequent .fill_info() call is trying to get the info from the port. kernel: [ 255.139429] ------------[ cut here ]------------ kernel: [ 255.139439] WARNING: CPU: 12 PID: 11173 at net/core/rtnetlink.c:2238 rtmsg_ifinfo+0x100/0x110() kernel: [ 255.139493] Modules linked in: ipvlan bonding w1_therm ds2482 wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid bnx2x ptp pps_core mdio libcrc32c kernel: [ 255.139513] CPU: 12 PID: 11173 Comm: ip Not tainted 3.18.0-smp-DEV #167 kernel: [ 255.139514] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 1.0.10 05/15/2012 kernel: [ 255.139515] 0000000000000009 ffff880851b6b828 ffffffff815d87f4 00000000000000e0 kernel: [ 255.139516] 0000000000000000 ffff880851b6b868 ffffffff8109c29c 0000000000000000 kernel: [ 255.139518] 00000000ffffffa6 00000000000000d0 ffffffff81aaf580 0000000000000011 kernel: [ 255.139520] Call Trace: kernel: [ 255.139527] [<ffffffff815d87f4>] dump_stack+0x46/0x58 kernel: [ 255.139531] [<ffffffff8109c29c>] warn_slowpath_common+0x8c/0xc0 kernel: [ 255.139540] [<ffffffff8109c2ea>] warn_slowpath_null+0x1a/0x20 kernel: [ 255.139544] [<ffffffff8150d570>] rtmsg_ifinfo+0x100/0x110 kernel: [ 255.139547] [<ffffffff814f78b5>] rollback_registered_many+0x1d5/0x2d0 kernel: [ 255.139549] [<ffffffff814f79cf>] unregister_netdevice_many+0x1f/0xb0 kernel: [ 255.139551] [<ffffffff8150acab>] rtnl_dellink+0xbb/0x110 kernel: [ 255.139553] [<ffffffff8150da90>] rtnetlink_rcv_msg+0xa0/0x240 kernel: [ 255.139557] [<ffffffff81329283>] ? rhashtable_lookup_compare+0x43/0x80 kernel: [ 255.139558] [<ffffffff8150d9f0>] ? __rtnl_unlock+0x20/0x20 kernel: [ 255.139562] [<ffffffff8152cb11>] netlink_rcv_skb+0xb1/0xc0 kernel: [ 255.139563] [<ffffffff8150a495>] rtnetlink_rcv+0x25/0x40 kernel: [ 255.139565] [<ffffffff8152c398>] netlink_unicast+0x178/0x230 kernel: [ 255.139567] [<ffffffff8152c75f>] netlink_sendmsg+0x30f/0x420 kernel: [ 255.139571] [<ffffffff814e0b0c>] sock_sendmsg+0x9c/0xd0 kernel: [ 255.139575] [<ffffffff811d1d7f>] ? rw_copy_check_uvector+0x6f/0x130 kernel: [ 255.139577] [<ffffffff814e11c9>] ? copy_msghdr_from_user+0x139/0x1b0 kernel: [ 255.139578] [<ffffffff814e1774>] ___sys_sendmsg+0x304/0x310 kernel: [ 255.139581] [<ffffffff81198723>] ? handle_mm_fault+0xca3/0xde0 kernel: [ 255.139585] [<ffffffff811ebc4c>] ? destroy_inode+0x3c/0x70 kernel: [ 255.139589] [<ffffffff8108e6ec>] ? __do_page_fault+0x20c/0x500 kernel: [ 255.139597] [<ffffffff811e8336>] ? dput+0xb6/0x190 kernel: [ 255.139606] [<ffffffff811f05f6>] ? mntput+0x26/0x40 kernel: [ 255.139611] [<ffffffff811d2b94>] ? __fput+0x174/0x1e0 kernel: [ 255.139613] [<ffffffff814e2129>] __sys_sendmsg+0x49/0x90 kernel: [ 255.139615] [<ffffffff814e2182>] SyS_sendmsg+0x12/0x20 kernel: [ 255.139617] [<ffffffff815df092>] system_call_fastpath+0x12/0x17 kernel: [ 255.139619] ---[ end trace 5e6703e87d984f6b ]--- Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Cc: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09tipc: drop tx side permission checksErik Hugne
Part of the old remote management feature is a piece of code that checked permissions on the local system to see if a certain operation was permitted, and if so pass the command to a remote node. This serves no purpose after the removal of remote management with commit 5902385a2440 ("tipc: obsolete the remote management feature") so we remove it. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09ipv6: remove useless spin_lock/spin_unlockDuan Jiong
xchg is atomic, so there is no necessary to use spin_lock/spin_unlock to protect it. At last, remove the redundant opt = xchg(&inet6_sk(sk)->opt, opt); statement. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-12-03 1) Fix a set but not used warning. From Fabian Frederick. 2) Currently we make sequence number values available to userspace only if we use ESN. Make the sequence number values also available for non ESN states. From Zhi Ding. 3) Remove socket policy hashing. We don't need it because socket policies are always looked up via a linked list. From Herbert Xu. 4) After removing socket policy hashing, we can use __xfrm_policy_link in xfrm_policy_insert. From Herbert Xu. 5) Add a lookup method for vti6 tunnels with wildcard endpoints. I forgot this when I initially implemented vti6. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08ethtool: Support for configurable RSS hash functionEyal Perry
This patch extends the set/get_rxfh ethtool-options for getting or setting the RSS hash function. It modifies drivers implementation of set/get_rxfh accordingly. This change also delegates the responsibility of checking whether a modification to a certain RX flow hash parameter is supported to the driver implementation of set_rxfh. User-kernel API is done through the new hfunc bitmask field in the ethtool_rxfh struct. A bit set in the hfunc field is corresponding to an index in the new string-set ETH_SS_RSS_HASH_FUNCS. Got approval from most of the relevant driver maintainers that their driver is using Toeplitz, and for the few that didn't answered, also assumed it is Toeplitz. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Ariel Elior <ariel.elior@qlogic.com> Cc: Prashant Sreedharan <prashant@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Sathya Perla <sathya.perla@emulex.com> Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com> Cc: Ajit Khaparde <ajit.khaparde@emulex.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Bruce Allan <bruce.w.allan@intel.com> Cc: Carolyn Wyborny <carolyn.wyborny@intel.com> Cc: Don Skidmore <donald.c.skidmore@intel.com> Cc: Greg Rose <gregory.v.rose@intel.com> Cc: Matthew Vick <matthew.vick@intel.com> Cc: John Ronciak <john.ronciak@intel.com> Cc: Mitch Williams <mitch.a.williams@intel.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com> Cc: Shradha Shah <sshah@solarflare.com> Cc: Shreyas Bhatewara <sbhatewara@vmware.com> Cc: "VMware, Inc." <pv-drivers@vmware.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_cgroup: remove unnecessary ifJiri Pirko
since head->handle == handle (checked before), just assign handle. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_flow: remove duplicate assignmentsJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_flow: remove faulty use of list_for_each_entry_rcuJiri Pirko
rcu variant is not correct here. The code is called by updater (rtnl lock is held), not by reader (no rcu_read_lock is held). Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_bpf: remove faulty use of list_for_each_entry_rcuJiri Pirko
rcu variant is not correct here. The code is called by updater (rtnl lock is held), not by reader (no rcu_read_lock is held). Signed-off-by: Jiri Pirko <jiri@resnulli.us> ACKed-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_bpf: remove unnecessary iteration and use passed argJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net_sched: cls_basic: remove unnecessary iteration and use passed argJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: convert name table read-write lock to RCUYing Xue
Convert tipc name table read-write lock to RCU. After this change, a new spin lock is used to protect name table on write side while RCU is applied on read side. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: remove unnecessary INIT_LIST_HEADYing Xue
When a list_head variable is seen as a new entry to be added to a list head, it's unnecessary to be initialized with INIT_LIST_HEAD(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: simplify relationship between name table lock and node lockYing Xue
When tipc name sequence is published, name table lock is released before name sequence buffer is delivered to remote nodes through its underlying unicast links. However, when name sequence is withdrawn, the name table lock is held until the transmission of the removal message of name sequence is finished. During the process, node lock is nested in name table lock. To prevent node lock from being nested in name table lock, while withdrawing name, we should adopt the same locking policy of publishing name sequence: name table lock should be released before message is sent. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: any name table member must be protected under name table lockYing Xue
As tipc_nametbl_lock is used to protect name_table structure, the lock must be held while all members of name_table structure are accessed. However, the lock is not obtained while a member of name_table structure - local_publ_count is read in tipc_nametbl_publish(), as a consequence, an inconsistent value of local_publ_count might be got. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: ensure all name sequences are properly protected with its lockYing Xue
TIPC internally created a name table which is used to store name sequences. Now there is a read-write lock - tipc_nametbl_lock to protect the table, and each name sequence saved in the table is protected with its private lock. When a name sequence is inserted or removed to or from the table, its members might need to change. Therefore, in normal case, the two locks must be held while TIPC operates the table. However, there are still several places where we only hold tipc_nametbl_lock without proprerly obtaining name sequence lock, which might cause the corruption of name sequence. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: ensure all name sequences are released when name table is stoppedYing Xue
As TIPC subscriber server is terminated before name table, no user depends on subscription list of name sequence when name table is stopped. Therefore, all name sequences stored in name table should be released whatever their subscriptions lists are empty or not, otherwise, memory leak might happen. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: make name table allocated dynamicallyYing Xue
Name table locking policy is going to be adjusted from read-write lock protection to RCU lock protection in the future commits. But its essential precondition is to convert the allocation way of name table from static to dynamic mode. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08tipc: remove size variable from publ_list structYing Xue
The size variable is introduced in publ_list struct to help us exactly calculate SKB buffer sizes needed by publications when all publications in name table are delivered in bulk in named_distribute(). But if publication SKB buffer size is assumed to MTU, the size variable in publ_list struct can be completely eliminated at the cost of wasting a bit memory space for last SKB. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Tero Aho <tero.aho@coriant.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08udp: Neaten and reduce size of compute_score functionsJoe Perches
The compute_score functions are a bit difficult to read. Neaten them a bit to reduce object sizes and make them a bit more intelligible. Return early to avoid indentation and avoid unnecessary initializations. (allyesconfig, but w/ -O2 and no profiling) $ size net/ipv[46]/udp.o.* text data bss dec hex filename 28680 1184 25 29889 74c1 net/ipv4/udp.o.new 28756 1184 25 29965 750d net/ipv4/udp.o.old 17600 1010 2 18612 48b4 net/ipv6/udp.o.new 17632 1010 2 18644 48d4 net/ipv6/udp.o.old Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08net-timestamp: allow reading recv cmsg on errqueue with origin tstampWillem de Bruijn
Allow reading of timestamps and cmsg at the same time on all relevant socket families. One use is to correlate timestamps with egress device, by asking for cmsg IP_PKTINFO. on AF_INET sockets, call the relevant function (ip_cmsg_recv). To avoid changing legacy expectations, only do so if the caller sets a new timestamping flag SOF_TIMESTAMPING_OPT_CMSG. on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already returned for all origins. only change is to set ifindex, which is not initialized for all error origins. In both cases, only generate the pktinfo message if an ifindex is known. This is not the case for ACK timestamps. The difference between the protocol families is probably a historical accident as a result of the different conditions for generating cmsg in the relevant ip(v6)_recv_error function: ipv4: if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) { ipv6: if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) { At one time, this was the same test bar for the ICMP/ICMP6 distinction. This is no longer true. Signed-off-by: Willem de Bruijn <willemb@google.com> ---- Changes v1 -> v2 large rewrite - integrate with existing pktinfo cmsg generation code - on ipv4: only send with new flag, to maintain legacy behavior - on ipv6: send at most a single pktinfo cmsg - on ipv6: initialize fields if not yet initialized The recv cmsg interfaces are also relevant to the discussion of whether looping packet headers is problematic. For v6, cmsgs that identify many headers are already returned. This patch expands that to v4. If it sounds reasonable, I will follow with patches 1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY (http://patchwork.ozlabs.org/patch/366967/) 2. sysctl to conditionally drop all timestamps that have payload or cmsg from users without CAP_NET_RAW. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08ipv4: warn once on passing AF_INET6 socket to ip_recv_errorWillem de Bruijn
One line change, in response to catching an occurrence of this bug. See also fix f4713a3dfad0 ("net-timestamp: make tcp_recvmsg call ...") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05net: sock: allow eBPF programs to be attached to socketsAlexei Starovoitov
introduce new setsockopt() command: setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd)) where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...) and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER setsockopt() calls bpf_prog_get() which increments refcnt of the program, so it doesn't get unloaded while socket is using the program. The same eBPF program can be attached to multiple sockets. User task exit automatically closes socket which calls sk_filter_uncharge() which decrements refcnt of eBPF program Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains netfilter updates for net-next. Basically, enhancements for xt_recent, skip zeroing of timer in conntrack, fix linking problem with recent redirect support for nf_tables, ipset updates and a couple of cleanups. More specifically, they are: 1) Rise maximum number per IP address to be remembered in xt_recent while retaining backward compatibility, from Florian Westphal. 2) Skip zeroing timer area in nf_conn objects, also from Florian. 3) Inspect IPv4 and IPv6 traffic from the bridge to allow filtering using using meta l4proto and transport layer header, from Alvaro Neira. 4) Fix linking problems in the new redirect support when CONFIG_IPV6=n and IP6_NF_IPTABLES=n. And ipset updates from Jozsef Kadlecsik: 5) Support updating element extensions when the set is full (fixes netfilter bugzilla id 880). 6) Fix set match with 32-bits userspace / 64-bits kernel. 7) Indicate explicitly when /0 networks are supported in ipset. 8) Simplify cidr handling for hash:*net* types. 9) Allocate the proper size of memory when /0 networks are supported. 10) Explicitly add padding elements to hash:net,net and hash:net,port, because the elements must be u32 sized for the used hash function. Jozsef is also cooking ipset RCU conversion which should land soon if they reach the merge window in time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-03netfilter: ipset: Explicitly add padding elements to hash:net, net and ↵Jozsef Kadlecsik
hash:net, port, net The elements must be u32 sized for the used hash function. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03netfilter: ipset: Allocate the proper size of memory when /0 networks are ↵Jozsef Kadlecsik
supported Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03netfilter: ipset: Simplify cidr handling for hash:*net* typesJozsef Kadlecsik
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03netfilter: ipset: Indicate when /0 networks are supportedJozsef Kadlecsik
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03netfilter: ipset: Alignment problem between 64bit kernel 32bit userspaceJozsef Kadlecsik
Sven-Haegar Koch reported the issue: sims:~# iptables -A OUTPUT -m set --match-set testset src -j ACCEPT iptables: Invalid argument. Run `dmesg' for more information. In syslog: x_tables: ip_tables: set.3 match: invalid size 48 (kernel) != (user) 32 which was introduced by the counter extension in ipset. The patch fixes the alignment issue with introducing a new set match revision with the fixed underlying 'struct ip_set_counter_match' structure. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-03netfilter: ipset: Support updating extensions when the set is fullJozsef Kadlecsik
When the set was full (hash type and maxelem reached), it was not possible to update the extension part of already existing elements. The patch removes this limitation. Fixes: https://bugzilla.netfilter.org/show_bug.cgi?id=880 Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-12-02bridge: add brport flags to dflt bridge_getlinkScott Feldman
To allow brport device to return current brport flags set on port. Add returned flags to nested IFLA_PROTINFO netlink msg built in dflt getlink. With this change, netlink msg returned for bridge_getlink contains the port's offloaded flag settings (the port's SELF settings). Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02bridge: move private brport flags to if_bridge.h so port drivers can use flagsScott Feldman
Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02bridge: add API to notify bridge driver of learned FBD on offloaded deviceScott Feldman
When the swdev device learns a new mac/vlan on a port, it sends some async notification to the driver and the driver installs an FDB in the device. To give a holistic system view, the learned mac/vlan should be reflected in the bridge's FBD table, so the user, using normal iproute2 cmds, can view what is currently learned by the device. This API on the bridge driver gives a way for the swdev driver to install an FBD entry in the bridge FBD table. (And remove one). This is equivalent to the device running these cmds: bridge fdb [add|del] <mac> dev <dev> vid <vlan id> master This patch needs some extra eyeballs for review, in paricular around the locking and contexts. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02bridge: call netdev_sw_port_stp_update when bridge port STP status changesScott Feldman
To notify switch driver of change in STP state of bridge port, add new .ndo op and provide switchdev wrapper func to call ndo op. Use it in bridge code then. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02net-sysfs: expose physical switch id for particular deviceJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Thomas Graf <tgraf@suug.ch> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02rtnl: expose physical switch id for particular deviceJiri Pirko
The netdevice represents a port in a switch, it will expose IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with the same value belong to one physical switch. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Thomas Graf <tgraf@suug.ch> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02net: introduce generic switch devices supportJiri Pirko
The goal of this is to provide a possibility to support various switch chips. Drivers should implement relevant ndos to do so. Now there is only one ndo defined: - for getting physical switch id is in place. Note that user can use random port netdevice to access the switch. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Thomas Graf <tgraf@suug.ch> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02net: rename netdev_phys_port_id to more generic nameJiri Pirko
So this can be reused for identification of other "items" as well. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Thomas Graf <tgraf@suug.ch> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02net: make vid as a parameter for ndo_fdb_add/ndo_fdb_delJiri Pirko
Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple u16 vid to drivers from there. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02bridge: convert flags in fbd entry into bitfieldsJiri Pirko
Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02bridge: rename fdb_*_hw to fdb_*_hw_addr to avoid confusionJiri Pirko
The current name might seem that this actually offloads the fdb entry to hw. So rename it to clearly present that this for hardware address addition/removal. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2014-11-28netfilter: nf_log_ipv6: correct typo in module descriptionSteven Noonan
It incorrectly identifies itself as "IPv4" packet logging. Signed-off-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Several small fixes here: 1) Don't crash in tg3 driver when the number of tx queues has been configured to be different from the number of rx queues. From Thadeu Lima de Souza Cascardo. 2) VLAN filter not disabled properly in promisc mode in ixgbe driver, from Vlad Yasevich. 3) Fix OOPS on dellink op in VTI tunnel driver, from Xin Long. 4) IPV6 GRE driver WCCP code checks skb->protocol for ETH_P_IP instead of ETH_P_IPV6, whoops. From Yuri Chislov. 5) Socket matching in ping driver is buggy when packet AF does not match socket's AF. Fix from Jane Zhou. 6) Fix checksum calculation errors in VXLAN due to where the udp_tunnel6_xmit_skb() helper gets it's saddr/daddr from. From Alexander Duyck. 7) Fix 5G detection problem in rtlwifi driver, from Larry Finger. 8) Fix NULL deref in tcp_v{4,6}_send_reset, from Eric Dumazet. 9) Various missing netlink attribute verifications in bridging code, from Thomas Graf. 10) tcp_recvmsg() unconditionally calls ipv4 ip_recv_error even for ipv6 sockets, whoops. Fix from Willem de Bruijn" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits) net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks bridge: Sanitize IFLA_EXT_MASK for AF_BRIDGE:RTM_GETLINK bridge: Add missing policy entry for IFLA_BRPORT_FAST_LEAVE net: Check for presence of IFLA_AF_SPEC net: Validate IFLA_BRIDGE_MODE attribute length bridge: Validate IFLA_BRIDGE_FLAGS attribute length stmmac: platform: fix default values of the filter bins setting net/mlx4_core: Limit count field to 24 bits in qp_alloc_res net: dsa: bcm_sf2: reset switch prior to initialization net: dsa: bcm_sf2: fix unmapping registers in case of errors tg3: fix ring init when there are more TX than RX channels tcp: fix possible NULL dereference in tcp_vX_send_reset() rtlwifi: Change order in device startup rtlwifi: rtl8821ae: Fix 5G detection problem Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse" vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX] ip6_udp_tunnel: Fix checksum calculation net-timestamp: Fix a documentation typo net/ping: handle protocol mismatching scenario af_packet: fix sparse warning ...
2014-11-27netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one modulePablo Neira Ayuso
This resolves linking problems with CONFIG_IPV6=n: net/built-in.o: In function `redirect_tg6': xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6' Reported-by: Andreas Ruprecht <rupran@einserver.de> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27netfilter: nf_tables_bridge: set the pktinfo for IPv4/IPv6 trafficAlvaro Neira
This patch adds the missing bits to allow to match per meta l4proto from the bridge. Example: nft add rule bridge filter input ether type {ip, ip6} meta l4proto udp counter Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27netfilter: nf_tables_bridge: export nft_reject_ip*hdr_validate functionsAlvaro Neira
This patch exports the functions nft_reject_iphdr_validate and nft_reject_ip6hdr_validate to use it in follow up patches. These functions check if the IPv4/IPv6 header is correct. Signed-off-by: Alvaro Neira Ayuso <alvaroneay@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-27netfilter: conntrack: avoid zeroing timerFlorian Westphal
add a __nfct_init_offset annotation member to struct nf_conn to make it clear which members are covered by the memset when the conntrack is allocated. This avoids zeroing timer_list and ct_net; both are already inited explicitly. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>