summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)Author
2015-01-05net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is definedHubert Sokolowski
Add checking whether the call to ndo_dflt_fdb_dump is needed. It is not expected to call ndo_dflt_fdb_dump unconditionally by some drivers (i.e. qlcnic or macvlan) that defines own ndo_fdb_dump. Other drivers define own ndo_fdb_dump and don't want ndo_dflt_fdb_dump to be called at all. At the same time it is desirable to call the default dump function on a bridge device. Fix attributes that are passed to dev->netdev_ops->ndo_fdb_dump. Add extra checking in br_fdb_dump to avoid duplicate entries as now filter_dev can be NULL. Following tests for filtering have been performed before the change and after the patch was applied to make sure they are the same and it doesn't break the filtering algorithm. [root@localhost ~]# cd /root/iproute2-3.18.0/bridge [root@localhost bridge]# modprobe dummy [root@localhost bridge]# ./bridge fdb add f1:f2:f3:f4:f5:f6 dev dummy0 [root@localhost bridge]# brctl addbr br0 [root@localhost bridge]# brctl addif br0 dummy0 [root@localhost bridge]# ip link set dev br0 address 02:00:00:12:01:04 [root@localhost bridge]# # show all [root@localhost bridge]# ./bridge fdb show 33:33:00:00:00:01 dev p2p1 self permanent 01:00:5e:00:00:01 dev p2p1 self permanent 33:33:ff:ac:ce:32 dev p2p1 self permanent 33:33:00:00:02:02 dev p2p1 self permanent 01:00:5e:00:00:fb dev p2p1 self permanent 33:33:00:00:00:01 dev p7p1 self permanent 01:00:5e:00:00:01 dev p7p1 self permanent 33:33:ff:79:50:53 dev p7p1 self permanent 33:33:00:00:02:02 dev p7p1 self permanent 01:00:5e:00:00:fb dev p7p1 self permanent f2:46:50:85:6d:d9 dev dummy0 master br0 permanent f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent 33:33:00:00:00:01 dev dummy0 self permanent f1:f2:f3:f4:f5:f6 dev dummy0 self permanent 33:33:00:00:00:01 dev br0 self permanent 02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent 02:00:00:12:01:04 dev br0 master br0 permanent [root@localhost bridge]# # filter by bridge [root@localhost bridge]# ./bridge fdb show br br0 f2:46:50:85:6d:d9 dev dummy0 master br0 permanent f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent 33:33:00:00:00:01 dev dummy0 self permanent f1:f2:f3:f4:f5:f6 dev dummy0 self permanent 33:33:00:00:00:01 dev br0 self permanent 02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent 02:00:00:12:01:04 dev br0 master br0 permanent [root@localhost bridge]# # filter by port [root@localhost bridge]# ./bridge fdb show brport dummy0 f2:46:50:85:6d:d9 master br0 permanent f2:46:50:85:6d:d9 vlan 1 master br0 permanent 33:33:00:00:00:01 self permanent f1:f2:f3:f4:f5:f6 self permanent [root@localhost bridge]# # filter by port + bridge [root@localhost bridge]# ./bridge fdb show br br0 brport dummy0 f2:46:50:85:6d:d9 master br0 permanent f2:46:50:85:6d:d9 vlan 1 master br0 permanent 33:33:00:00:00:01 self permanent f1:f2:f3:f4:f5:f6 self permanent [root@localhost bridge]# Signed-off-by: Hubert Sokolowski <hubert.sokolowski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05ip: Add offset parameter to ip_cmsg_recvTom Herbert
Add ip_cmsg_recv_offset function which takes an offset argument that indicates the starting offset in skb where data is being received from. This will be useful in the case of UDP and provided checksum to user space. ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of zero. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05ip: Add offset parameter to ip_cmsg_recvTom Herbert
Add ip_cmsg_recv_offset function which takes an offset argument that indicates the starting offset in skb where data is being received from. This will be useful in the case of UDP and provided checksum to user space. ip_cmsg_recv is an inline call to ip_cmsg_recv_offset with offset of zero. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05ip: IP cmsg cleanupTom Herbert
Move the IP_CMSG_* constants from ip_sockglue.c to inet_sock.h so that they can be referenced in other source files. Restructure ip_cmsg_recv to not go through flags using shift, check for flags by 'and'. This eliminates both the shift and a conditional per flag check. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05ip: Move checksum convert defines to inetTom Herbert
Move convert_csum from udp_sock to inet_sock. This allows the possibility that we can use convert checksum for different types of sockets and also allows convert checksum to be enabled from inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05netfilter: nf_ct_seqadj: print ack seq in the right host byte orderGao feng
new_start_seq and new_end_seq are network byte order, print the host byte order in debug message and print seq number as the type of unsigned int. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
2015-01-05netfilter: nfnetlink_cthelper: Remove 'const' and '&' to avoid warningsChen Gang
The related code can be simplified, and also can avoid related warnings (with allmodconfig under parisc): CC [M] net/netfilter/nfnetlink_cthelper.o net/netfilter/nfnetlink_cthelper.c: In function ‘nfnl_cthelper_from_nlattr’: net/netfilter/nfnetlink_cthelper.c:97:9: warning: passing argument 1 o ‘memcpy’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers] memcpy(&help->data, nla_data(attr), help->helper->data_len); ^ In file included from include/linux/string.h:17:0, from include/uapi/linux/uuid.h:25, from include/linux/uuid.h:23, from include/linux/mod_devicetable.h:12, from ./arch/parisc/include/asm/hardware.h:4, from ./arch/parisc/include/asm/processor.h:15, from ./arch/parisc/include/asm/spinlock.h:6, from ./arch/parisc/include/asm/atomic.h:21, from include/linux/atomic.h:4, from ./arch/parisc/include/asm/bitops.h:12, from include/linux/bitops.h:36, from include/linux/kernel.h:10, from include/linux/list.h:8, from include/linux/module.h:9, from net/netfilter/nfnetlink_cthelper.c:11: ./arch/parisc/include/asm/string.h:8:8: note: expected ‘void *’ but argument is of type ‘const char (*)[]’ void * memcpy(void * dest,const void *src,size_t count); ^ Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
2015-01-05netfilter: nfnetlink: remove redundant variable nskbDuan Jiong
Actually after netlink_skb_clone() is called, the nskb and skb will point to the same thing, but they are used just like they are different, sometimes this is confusing, so i think there is no necessary to keep nskb anymore. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
2015-01-05Revert "mac80211: Fix accounting of the tailroom-needed counter"Johannes Berg
This reverts commit ca34e3b5c808385b175650605faa29e71e91991b. It turns out that the p54 and cw2100 drivers assume that there's tailroom even when they don't say they really need it. However, there's currently no way for them to explicitly say they do need it, so for now revert this. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331. Cc: stable@vger.kernel.org Fixes: ca34e3b5c808 ("mac80211: Fix accounting of the tailroom-needed counter") Reported-by: Christopher Chavez <chrischavez@gmx.us> Bisected-by: Larry Finger <Larry.Finger@lwfinger.net> Debugged-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-01-04geneve: Check family when reusing sockets.Jesse Gross
When searching for an existing socket to reuse, the address family is not taken into account - only port number. This means that an IPv4 socket could be used for IPv6 traffic and vice versa, which is sure to cause problems when passing packets. It is not possible to trigger this problem currently because the only user of Geneve creates just IPv4 sockets. However, that is likely to change in the near future. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04geneve: Remove socket hash table.Jesse Gross
The hash table for open Geneve ports is used only on creation and deletion time. It is not performance critical and is not likely to grow to a large number of items. Therefore, this can be changed to use a simple linked list. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04geneve: Simplify locking.Jesse Gross
The existing Geneve locking scheme was pulled over directly from VXLAN. However, VXLAN has a number of built in mechanisms which make the locking more complex and are unlikely to be necessary with Geneve. This simplifies the locking to use a basic scheme of a mutex when doing updates plus RCU on receive. In addition to making the code easier to read, this also avoids the possibility of a race when creating or destroying sockets since UDP sockets and the list of Geneve sockets are protected by different locks. After this change, the entire operation is atomic. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-04geneve: Remove workqueue.Jesse Gross
The work queue is used only to free the UDP socket upon destruction. This is not necessary with Geneve and generally makes the code more difficult to reason about. It also introduces nondeterministic behavior such as when a socket is rapidly deleted and recreated, which could fail as the the deletion happens asynchronously. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03Bluetooth: Introduce HCI_QUIRK_FIXUP_INQUIRY_MODE optionMarcel Holtmann
The HCI_QUIRK_FIXUP_INQUIRY_MODE option allows to force Inquiry Result with RSSI setting on controllers that do not indicate support for it, but where it is known to be fully functional. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-03Bluetooth: Remove dead code for manufacturer inquiry mode quirksMarcel Holtmann
There are some old Bluetooth modules from Silicon Wave and Broadcom which support Inquiry Result with RSSI, but do not advertise it. The core has quirks in the code to enable that inquiry mode. However as it stands right now, that code is not even executed since entering the function to determine which inquiry mode requires that the device has the feature bit for Inquiry Result with RSSI set in the first place. So this makes this dead code that hasn't work for a long time. In conclusion, just remove these extra quirks and simplify the setup of the inquiry mode to be inline and with that a lot easier to read and understand. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-03netlink: Lockless lookup with RCU grace period in socket releaseThomas Graf
Defers the release of the socket reference using call_rcu() to allow using an RCU read-side protected call to rhashtable_lookup() This restores behaviour and performance gains as previously introduced by e341694 ("netlink: Convert netlink_lookup() to use RCU protected hash table") without the side effect of severely delayed socket destruction. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03rhashtable: Per bucket locks & deferred expansion/shrinkingThomas Graf
Introduces an array of spinlocks to protect bucket mutations. The number of spinlocks per CPU is configurable and selected based on the hash of the bucket. This allows for parallel insertions and removals of entries which do not share a lock. The patch also defers expansion and shrinking to a worker queue which allows insertion and removal from atomic context. Insertions and deletions may occur in parallel to it and are only held up briefly while the particular bucket is linked or unzipped. Mutations of the bucket table pointer is protected by a new mutex, read access is RCU protected. In the event of an expansion or shrinking, the new bucket table allocated is exposed as a so called future table as soon as the resize process starts. Lookups, deletions, and insertions will briefly use both tables. The future table becomes the main table after an RCU grace period and initial linking of the old to the new table was performed. Optimization of the chains to make use of the new number of buckets follows only the new table is in use. The side effect of this is that during that RCU grace period, a bucket traversal using any rht_for_each() variant on the main table will not see any insertions performed during the RCU grace period which would at that point land in the future table. The lookup will see them as it searches both tables if needed. Having multiple insertions and removals occur in parallel requires nelems to become an atomic counter. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03nft_hash: Remove rhashtable_remove_pprev()Thomas Graf
The removal function of nft_hash currently stores a reference to the previous element during lookup which is used to optimize removal later on. This was possible because a lock is held throughout calling rhashtable_lookup() and rhashtable_remove(). With the introdution of deferred table resizing in parallel to lookups and insertions, the nftables lock will no longer synchronize all table mutations and the stored pprev may become invalid. Removing this optimization makes removal slightly more expensive on average but allows taking the resize cost out of the insert and remove path. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: netfilter-devel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03rhashtable: Convert bucket iterators to take table and indexThomas Graf
This patch is in preparation to introduce per bucket spinlocks. It extends all iterator macros to take the bucket table and bucket index. It also introduces a new rht_dereference_bucket() to handle protected accesses to buckets. It introduces a barrier() to the RCU iterators to the prevent the compiler from caching the first element. The lockdep verifier is introduced as stub which always succeeds and properly implement in the next patch when the locks are introduced. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03rhashtable: Do hashing inside of rhashtable_lookup_compare()Thomas Graf
Hash the key inside of rhashtable_lookup_compare() like rhashtable_lookup() does. This allows to simplify the hashing functions and keep them private. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: netfilter-devel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-03mac802154: fix kbuild test robot warningAlexander Aring
This patch fixs the following kbuild test robot warning: coccinelle warnings: (new ones prefixed by >>) >> net/mac802154/cfg.c:53:1-3: WARNING: PTR_ERR_OR_ZERO can be used Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03ieee802154: handle config as menuconfigAlexander Aring
This patch handles the IEEE802154 Kconfig entry as menuconfig. Furthermore we move this entry out of "Network Options" and put it into "Networking" where the other networking subsystems are. This requires a menuconfig entry like all other networking subsystems. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03ieee802154: rename af_ieee802154.c to socket.cAlexander Aring
This patch renames the "af_ieee802154.c" to "socket.c". This is just a cleanup to have a short name for it which describes the implementationm stuff more human understandable. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03ieee802154: socket: fix checkpatch issueAlexander Aring
This patch solves the following checkpatch issue: CHECK: Comparison to NULL could be written "skb" + if (skb != NULL) { Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03ieee802154: socket: put handling into one fileAlexander Aring
This patch puts all related socket handling into one file. This is just a cleanup to do all socket handling stuff inside of one implementation file. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03ieee802154: socket: change module nameAlexander Aring
This patch changes the module name of af_802154 to ieee802154_socket. Just for keeping the name convention according to the 6LoWPAN module ieee802154_6lowpan. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-03ieee802154: handle socket functionality as moduleAlexander Aring
This patch makes the ieee802154 socket handling as module. Currently this is part of ieee802154 module. It pointed out that ieee802154 module has also two module_init/module_exit functions. One inside of core.c and the other in af_ieee802154.c. This patch will also solve this issue by handle the af_802154 as separate module. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-01-02Bluetooth: Fix SMP channel registration for unconfigured controllersMarcel Holtmann
When the Bluetooth controllers requires an unconfigured state (for example when the BD_ADDR is missing), then it is important to try to register the SMP channels when the controller transitions to the configured state. This also fixes an issue with the debugfs entires that are not present for controllers that start out as unconfigured. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02Bluetooth: Fix for a leftover debug of pairing credentialsMarcel Holtmann
One of the LE Secure Connections security credentials was still using the BT_DBG instead of SMP_DBG. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02Bluetooth: Fix scope of sc_only_mode debugfs entryMarcel Holtmann
The sc_only_mode debugfs entry is used to read the current state of the Secure Connections Only mode. Before Bluetooth 4.2 this mode was only for BR/EDR controllers and with that tight to the support Secure Simple Pairing. Since Secure Connections is now available for BR/EDR and LE this debugfs entry is no longer correctly place. Move it to the common section and enable it when either BR/EDR Secure Connections feature is supported or when the controller has LE support. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02Bluetooth: Remove no longer needed force_sc_support debugfs optionMarcel Holtmann
The force_sc_support debugfs option was introduced to easily work with pre-production Bluetooth 4.1 silicon. This option is no longer needed since controllers supporting BR/EDR Secure Connections feature are now available. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02Bluetooth: Remove broken force_lesc_support debugfs optionMarcel Holtmann
The force_lesc_support debugfs option never really worked. It has a race condition between creating the debugfs entry and registering the L2CAP fixed channel for BR/EDR SMP support. Also this has been replaced with a working force_bredr_smp debugfs switch that developers can use now. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02Bluetooth: Introduce force_bredr_smp debugfs option for testingMarcel Holtmann
Testing cross-transport pairing that starts on BR/EDR is only valid when using a controller with BR/EDR Secure Connections. Devices will indicate this by providing BR/EDR SMP fixed channel over L2CAP. To allow testing of this feature on Bluetooth 4.0 controller or controllers without the BR/EDR Secure Connections features, introduce a force_bredr_smp debugfs option that allows faking the required AES connection. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-01-02openvswitch: Consistently include VLAN header in flow and port stats.Ben Pfaff
Until now, when VLAN acceleration was in use, the bytes of the VLAN header were not included in port or flow byte counters. They were however included when VLAN acceleration was not used. This commit corrects the inconsistency, by always including the VLAN header in byte counters. Previous discussion at http://openvswitch.org/pipermail/dev/2014-December/049521.html Reported-by: Motonori Shindo <mshindo@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Reviewed-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02tcp: Do not apply TSO segment limit to non-TSO packetsHerbert Xu
Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs. In fact the problem was completely unrelated to IPsec. The bug is also reproducible if you just disable TSO/GSO. The problem is that when the MSS goes down, existing queued packet on the TX queue that have not been transmitted yet all look like TSO packets and get treated as such. This then triggers a bug where tcp_mss_split_point tells us to generate a zero-sized packet on the TX queue. Once that happens we're screwed because the zero-sized packet can never be removed by ACKs. Fixes: 1485348d242 ("tcp: Apply device TSO segment limit earlier") Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02net: skbuff: don't zero tc members when freeing skbFlorian Westphal
Not needed, only four cases: - kfree_skb (or one of its aliases). Don't need to zero, memory will be freed. - kfree_skb_partial and head was stolen: memory will be freed. - skb_morph: The skb header fields (including tc ones) will be copied over from the 'to-be-morphed' skb right after skb_release_head_state returns. - skb_segment: Same as before, all the skb header fields are copied over from the original skb right away. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg say: ==================== pull request: bluetooth-next 2014-12-31 Here's the first batch of bluetooth patches for 3.20. - Cleanups & fixes to ieee802154 drivers - Fix synchronization of mgmt commands with respective HCI commands - Add self-tests for LE pairing crypto functionality - Remove 'BlueFritz!' specific handling from core using a new quirk flag - Public address configuration support for ath3012 - Refactor debugfs support into a dedicated file - Initial support for LE Data Length Extension feature from Bluetooth 4.2 Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02geneve: Add Geneve GRO supportJoe Stringer
This results in an approximately 30% increase in throughput when handling encapsulated bulk traffic. Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02net: Add Transparent Ethernet Bridging GRO support.Jesse Gross
Currently the only tunnel protocol that supports GRO with encapsulated Ethernet is VXLAN. This pulls out the Ethernet code into a proper layer so that it can be used by other tunnel protocols such as GRE and Geneve. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: Add tracking value for suffix lengthAlexander Duyck
This change adds a tracking value for the maximum suffix length of all prefixes stored in any given tnode. With this value we can determine if we need to backtrace or not based on if the suffix is greater than the pos value. By doing this we can reduce the CPU overhead for lookups in the local table as many of the prefixes there are 32b long and have a suffix length of 0 meaning we can immediately backtrace to the root node without needing to test any of the nodes between it and where we ended up. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: Remove checks for index >= tnode_child_length from tnode_get_childAlexander Duyck
For some reason the compiler doesn't seem to understand that when we are in a loop that runs from tnode_child_length - 1 to 0 we don't expect the value of tn->bits to change. As such every call to tnode_get_child was rerunning tnode_chile_length which ended up consuming quite a bit of space in the resultant assembly code. I have gone though and verified that in all cases where tnode_get_child is used we are either winding though a fixed loop from tnode_child_length - 1 to 0, or are in a fastpath case where we are verifying the value by either checking for any remaining bits after shifting index by bits and testing for leaf, or by using tnode_child_length. size net/ipv4/fib_trie.o Before: text data bss dec hex filename 15506 376 8 15890 3e12 net/ipv4/fib_trie.o After: text data bss dec hex filename 14827 376 8 15211 3b6b net/ipv4/fib_trie.o Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: inflate/halve nodes in a more RCU friendly wayAlexander Duyck
This change pulls the node_set_parent functionality out of put_child_reorg and instead leaves that to the function to take care of as well. By doing this we can fully construct the new cluster of tnodes and all of the pointers out of it before we start routing pointers into it. I am suspecting this will likely fix some concurency issues though I don't have a good test to show as such. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: Push tnode flushing down to inflate/halveAlexander Duyck
This change pushes the tnode freeing down into the inflate and halve functions. It makes more sense here as we have a better grasp of what is going on and when a given cluster of nodes is ready to be freed. I believe this may address a bug in the freeing logic as well. For some reason if the freelist got to a certain size we would call synchronize_rcu(). I'm assuming that what they meant to do is call synchronize_rcu() after they had handed off that much memory via call_rcu(). As such that is what I have updated the behavior to be. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: Push assignment of child to parent down into inflate/halveAlexander Duyck
This change makes it so that the assignment of the tnode to the parent is handled directly within whatever function is currently handling the node be it inflate, halve, or resize. By doing this we can avoid some of the need to set NULL pointers in the tree while we are resizing the subnodes. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: Add functions should_inflate and should_halveAlexander Duyck
This change pulls the logic for if we should inflate/halve the nodes out into separate functions. It also addresses what I believe is a bug where 1 full node is all that is needed to keep a node from ever being halved. Simple script to reproduce the issue: modprobe dummy; ifconfig dummy0 up for i in `seq 0 255`; do ifconfig dummy0:$i 10.0.${i}.1/24 up; done ifconfig dummy0:256 10.0.255.33/16 up for i in `seq 0 254`; do ifconfig dummy0:$i down; done Results from /proc/net/fib_triestat Before: Local: Aver depth: 3.00 Max depth: 4 Leaves: 17 Prefixes: 18 Internal nodes: 11 1: 8 2: 2 10: 1 Pointers: 1048 Null ptrs: 1021 Total size: 11 kB After: Local: Aver depth: 3.41 Max depth: 5 Leaves: 17 Prefixes: 18 Internal nodes: 12 1: 8 2: 3 3: 1 Pointers: 36 Null ptrs: 8 Total size: 3 kB Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: Move resize to after inflate/halveAlexander Duyck
This change consists of a cut/paste of resize to behind inflate and halve so that I could remove the two function prototypes. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: Push rcu_read_lock/unlock to callersAlexander Duyck
This change is to start cleaning up some of the rcu_read_lock/unlock handling. I realized while reviewing the code there are several spots that I don't believe are being handled correctly or are masking warnings by locally calling rcu_read_lock/unlock instead of calling them at the correct level. A common example is a call to fib_get_table followed by fib_table_lookup. The rcu_read_lock/unlock ought to wrap both but there are several spots where they were not wrapped. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: Use unsigned long for anything dealing with a shift by bitsAlexander Duyck
This change makes it so that anything that can be shifted by, or compared to a value shifted by bits is updated to be an unsigned long. This is mostly a precaution against an insanely huge address space that somehow starts coming close to the 2^32 root node size which would require something like 1.5 billion addresses. I chose unsigned long instead of unsigned long long since I do not believe it is possible to allocate a 32 bit tnode on a 32 bit system as the memory consumed would be 16GB + 28B which exceeds the addressible space for any one process. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: Update meaning of pos to represent unchecked bitsAlexander Duyck
This change moves the pos value to the other side of the "bits" field. By doing this it actually simplifies a significant amount of code in the trie. For example when halving a tree we know that the bit lost exists at oldnode->pos, and if we inflate the tree the new bit being add is at tn->pos. Previously to find those bits you would have to subtract pos and bits from the keylength or start with a value of (1 << 31) and then shift that. There are a number of spots throughout the code that benefit from this. In the case of the hot-path searches the main advantage is that we can drop 2 or more operations from the search path as we no longer need to compute the value for the index to be shifted by and can instead just use the raw pos value. In addition the tkey_extract_bits is now defunct and can be replaced by get_index since the two operations were doing the same thing, but now get_index does it much more quickly as it is only an xor and shift versus a pair of shifts and a subtraction. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-31fib_trie: Optimize fib_table_insertAlexander Duyck
This patch updates the fib_table_insert function to take advantage of the changes made to improve the performance of fib_table_lookup. As a result the code should be smaller and run faster then the original. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>