summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2014-08-14Merge tag 'pci-v3.17-changes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull DEFINE_PCI_DEVICE_TABLE removal from Bjorn Helgaas: "Part two of the PCI changes for v3.17: - Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine) It's a mechanical change that removes uses of the DEFINE_PCI_DEVICE_TABLE macro. I waited until later in the merge window to reduce conflicts, but it's possible you'll still see a few" * tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use
2014-08-14Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma updates from Roland Dreier: "Main set of InfiniBand/RDMA updates for 3.17 merge window: - MR reregistration support - MAD support for RMPP in userspace - iSER and SRP initiator updates - ocrdma hardware driver updates - other fixes..." * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (52 commits) IB/srp: Fix return value check in srp_init_module() RDMA/ocrdma: report asic-id in query device RDMA/ocrdma: Update sli data structure for endianness RDMA/ocrdma: Obtain SL from device structure RDMA/uapi: Include socket.h in rdma_user_cm.h IB/srpt: Handle GID change events IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0] IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0] RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf() IPoIB: Remove unnecessary test for NULL before debugfs_remove() IB/mad: Add user space RMPP support IB/mad: add new ioctl to ABI to support new registration options IB/mad: Add dev_notice messages for various umad/mad registration failures IB/mad: Update module to [pr|dev]_* style print messages IB/ipoib: Avoid multicast join attempts with invalid P_key IB/umad: Update module to [pr|dev]_* style print messages IB/ipoib: Avoid flushing the workqueue from worker context IB/ipoib: Use P_Key change event instead of P_Key polling mechanism IB/ipath: Add P_Key change event support mlx4_core: Add support for secure-host and SMP firewall ...
2014-08-14Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'iwcm', 'mad', 'misc', ↵Roland Dreier
'mlx4', 'mlx5', 'ocrdma' and 'srp' into for-next
2014-08-12PCI: Remove DEFINE_PCI_DEVICE_TABLE macro useBenoit Taine
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines. This issue was reported by checkpatch. A simplified version of the semantic patch that makes this change is as follows (http://coccinelle.lip6.fr/): // <smpl> @@ identifier i; declarer name DEFINE_PCI_DEVICE_TABLE; initializer z; @@ - DEFINE_PCI_DEVICE_TABLE(i) + const struct pci_device_id i[] = z; // </smpl> [bhelgaas: add semantic patch] Signed-off-by: Benoit Taine <benoit.taine@lip6.fr> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Steady transitioning of the BPF instructure to a generic spot so all kernel subsystems can make use of it, from Alexei Starovoitov. 2) SFC driver supports busy polling, from Alexandre Rames. 3) Take advantage of hash table in UDP multicast delivery, from David Held. 4) Lighten locking, in particular by getting rid of the LRU lists, in inet frag handling. From Florian Westphal. 5) Add support for various RFC6458 control messages in SCTP, from Geir Ola Vaagland. 6) Allow to filter bridge forwarding database dumps by device, from Jamal Hadi Salim. 7) virtio-net also now supports busy polling, from Jason Wang. 8) Some low level optimization tweaks in pktgen from Jesper Dangaard Brouer. 9) Add support for ipv6 address generation modes, so that userland can have some input into the process. From Jiri Pirko. 10) Consolidate common TCP connection request code in ipv4 and ipv6, from Octavian Purdila. 11) New ARP packet logger in netfilter, from Pablo Neira Ayuso. 12) Generic resizable RCU hash table, with intial users in netlink and nftables. From Thomas Graf. 13) Maintain a name assignment type so that userspace can see where a network device name came from (enumerated by kernel, assigned explicitly by userspace, etc.) From Tom Gundersen. 14) Automatic flow label generation on transmit in ipv6, from Tom Herbert. 15) New packet timestamping facilities from Willem de Bruijn, meant to assist in measuring latencies going into/out-of the packet scheduler, latency from TCP data transmission to ACK, etc" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1536 commits) cxgb4 : Disable recursive mailbox commands when enabling vi net: reduce USB network driver config options. tg3: Modify tg3_tso_bug() to handle multiple TX rings amd-xgbe: Perform phy connect/disconnect at dev open/stop amd-xgbe: Use dma_set_mask_and_coherent to set DMA mask net: sun4i-emac: fix memory leak on bad packet sctp: fix possible seqlock seadlock in sctp_packet_transmit() Revert "net: phy: Set the driver when registering an MDIO bus device" cxgb4vf: Turn off SGE RX/TX Callback Timers and interrupts in PCI shutdown routine team: Simplify return path of team_newlink bridge: Update outdated comment on promiscuous mode net-timestamp: ACK timestamp for bytestreams net-timestamp: TCP timestamping net-timestamp: SCHED timestamp on entering packet scheduler net-timestamp: add key to disambiguate concurrent datagrams net-timestamp: move timestamp flags out of sk_flags net-timestamp: extend SCM_TIMESTAMPING ancillary data struct cxgb4i : Move stray CPL definitions to cxgb4 driver tcp: reduce spurious retransmits due to transient SACK reneging qlcnic: Initialize dcbnl_ops before register_netdev ...
2014-08-05Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...
2014-08-05mlx4_core: Add support for secure-host and SMP firewallJack Morgenstein
Secure-host is the general term for the capability of a device to protect itself and the subnet from malicious host software. This is achieved by: 1. Not allowing un-trusted entities to access device configuration registers, directly (through pci_cr or pci_conf) and indirectly (through MADs). 2. Hiding M_Key from untrusted entities. 3. Preventing the modification of GUID0 by un-trusted entities 4. Not allowing drivers on untrusted hosts to receive nor to transmit packets over QP0 (SMP Firewall). The secure-host capability depends on firmware handling all QP0 packets, and not passing these packets up to the driver. Any information required by the driver for proper operation (e.g., SM lid) is passed via events generated by the firmware while processing QP0 MADs. Driver support mainly requires using the MAD_DEMUX FW command at startup, where the feature is enabled/disabled through a procedure described in the Mellanox HCA tools package. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> [ Fix error path in mlx4_setup_hca to go to err_mcg_table_free. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01mlx4_core: Add helper functions to support MR re-registrationMatan Barak
Add few helper functions to support a mechanism of getting an MPT, modifying it and updating the HCA with the modified object. The code takes 2 paths, one for directly changing the MPT (and sometimes its related MTTs) and another one which queries the MPT and updates the HCA via fw command SW2HW_MPT. The first path is used in native mode; the second path is slower and is used only in SRIOV. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-30mlx5: Adjust events to use unsigned long param instead of void *Jack Morgenstein
In the event flow, we currently pass only a port number in the void *data argument. Rather than pass a pointer to the event handlers, we should use an "unsigned long" parameter, and pass the port number value directly. In the future, if necessary for some events, we can use the unsigned long parameter to pass a pointer. Based on a patch by Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30mlx5: minor fixes (mainly avoidance of hidden casts)Jack Morgenstein
There were many places where parameters which should be u8/u16 were integer type. Additionally, in 2 places, a check for a non-null pointer was added before dereferencing the pointer (this is actually a bug fix). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30mlx5: Move pci device handling from mlx5_ib to mlx5_coreJack Morgenstein
In preparation for a new mlx5 device which is VPI (i.e., ports can be either IB or ETH), move the pci device functionality from mlx5_ib to mlx5_core. This involves the following changes: 1. Move mlx5_core_dev struct out of mlx5_ib_dev. mlx5_core_dev is now an independent structure maintained by mlx5_core. mlx5_ib_dev now has a pointer to that struct. This requires changing a lot of places where the core_dev struct was accessed via mlx5_ib_dev (now, this needs to be a pointer dereference). 2. All PCI initializations are now done in mlx5_core. Thus, it is now mlx5_core which does pci_register_device (and not mlx5_ib, as was previously). 3. mlx5_ib now registers itself with mlx5_core as an "interface" driver. This is very similar to the mechanism employed for the mlx4 (ConnectX) driver. Once the HCA is initialized (by mlx5_core), it invokes the interface drivers to do their initializations. 4. There is a new event handler which the core registers: mlx5_core_event(). This event handler invokes the event handlers registered by the interfaces. Based on a patch by Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24net/mlx4_en: mlx4_en_[gs]et_priv_flags() can be staticFengguang Wu
Fixes sparse warning intrduced by commit 0fef9d0 ("net/mlx4_en: Disable blueflame using ethtool private flags") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23net: mlx5: Use ktime_get_ns()Thomas Gleixner
This code is beyond silly: struct timespec ts = ktime_get_ts(); ktime_t ktime = timespec_to_ktime(ts); Further down the code builds the delta of two ktime_t values and converts the result to nanoseconds. Use ktime_get_ns() and replace all the nonsense. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Eli Cohen <eli@mellanox.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-07-22net/mlx4_en: Reduce memory consumption on kdump kernelAmir Vadai
When memory is limited, reduce number of rx and tx rings. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22net/mlx4_core: Use low memory profile on kdump kernelAmir Vadai
When running in kdump kernel, reduce number of resources allocated for the hardware. This will enable the NIC to operate in this low memory environment at the expense of performance and some features not related to the basic NIC functionality. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22net/mlx4_en: Disable blueflame using ethtool private flagsAmir Vadai
Enable the user to turn off the hardware feature called BlueFlame. Since it is something specific to mlx4_en hardware, we control the feature via ethtool private flags. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22net/mlx4_en: current_mac isn't updated in port upEyal Perry
When port is down dev_addr is changed (e.g. by bonding) but current_mac is not touched. When port is up again, hash_mac is updated to dev_addr, but current_mac isn't. This leads to inconsistency between current_mac and mac_hash. Because of that, mlx4_en_replace_mac() fails to find current_mac in mac_hash. Fix is to reset current_mac to dev_addr when port is up - as we do for mac_hash. Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/infiniband/hw/cxgb4/device.c The cxgb4 conflict was simply overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Null termination fix in dns_resolver got the pointer dereferncing wrong, fix from Ben Hutchings. 2) ip_options_compile() has a benign but real buffer overflow when parsing options. From Eric Dumazet. 3) Table updates can crash in netfilter's nftables if none of the state flags indicate an actual change, from Pablo Neira Ayuso. 4) Fix race in nf_tables dumping, also from Pablo. 5) GRE-GRO support broke the forwarding path because the segmentation state was not fully initialized in these paths, from Jerry Chu. 6) sunvnet driver leaks objects and potentially crashes on module unload, from Sowmini Varadhan. 7) We can accidently generate the same handle for several u32 classifier filters, fix from Cong Wang. 8) Several edge case bug fixes in fragment handling in xen-netback, from Zoltan Kiss. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits) ipv4: fix buffer overflow in ip_options_compile() batman-adv: fix TT VLAN inconsistency on VLAN re-add batman-adv: drop QinQ claim frames in bridge loop avoidance dns_resolver: Null-terminate the right string xen-netback: Fix pointer incrementation to avoid incorrect logging xen-netback: Fix releasing header slot on error path xen-netback: Fix releasing frag_list skbs in error path xen-netback: Fix handling frag_list on grant op error path net_sched: avoid generating same handle for u32 filters net: huawei_cdc_ncm: add "subclass 3" devices net: qmi_wwan: add two Sierra Wireless/Netgear devices wan/x25_asy: integer overflow in x25_asy_change_mtu() net: ppp: fix creating PPP pass and active filters net/mlx4_en: cq->irq_desc wasn't set in legacy EQ's sunvnet: clean up objects created in vnet_new() on vnet_exit() r8169: Enable RX_MULTI_EN for RTL_GIGA_MAC_VER_40 net-gre-gro: Fix a bug that breaks the forwarding path netfilter: nf_tables: 64bit stats need some extra synchronization netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale netfilter: nf_tables: safe RCU iteration on list when dumping ...
2014-07-18Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma fixes from Roland Dreier: - cxgb4 hardware driver regression fixes - mlx5 hardware driver regression fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/mlx5: Enable "block multicast loopback" for kernel consumers RDMA/cxgb4: Call iwpm_init() only once mlx5_core: Fix possible race between mr tree insert/delete RDMA/cxgb4: Initialize the device status page RDMA/cxgb4: Clean up connection on ARP error RDMA/cxgb4: Fix skb_leak in reject_cr()
2014-07-16net/mlx4_en: cq->irq_desc wasn't set in legacy EQ'sAmir Vadai
Fix a regression introduced by commit 35f6f45 ("net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map"). When core is started in legacy EQ's (number of IRQ's < rx rings), cq->irq_desc was NULL. This caused a kernel crash under heavy traffic - when having more than rx NAPI budget completions. Fixed to have it set for both EQ modes. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16net/mlx4_core: Remove MCG in case it is attached to promiscuous QPs onlySaeed Mahameed
In B0 steering mode if promiscuous QP asks to be detached from MCG entry, and it is the only one in this entry then the entry will never be deleted. This is a wrong behavior since we don't want to keep those entries after the promiscuous QP becomes non-promiscuous. Therefore remove steering entry containing only promiscuous QP. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16net/mlx4_core: In SR-IOV mode host should add promisc QP to default entry onlyEugenia Emantayev
In current situation host is adding the promiscuous QP to all steering entries and the default entry as well. In this case when having PV and SR-IOV on the same setup bridge will receive all traffic that is targeted to the other VMs. This is bad. Solution: In SR-IOV mode host can add promiscuous QP to default entry only. The above problem and fix are relevant for B0 steering mode only. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16net/mlx4_core: Make sure the max number of QPs per MCG isn't exceededAlexander Guller
In B0 steering mode when adding QPs to the default MCG entry need to check that maximal number of QPs per MCG entry was not exceeded. Signed-off-by: Alexander Guller <alexg@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.co.il> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16net/mlx4_core: Make sure that negative array index isn't usedDotan Barak
To make sure that the array index isn't used in the code with negative value, we stop using the for loop integer iterator outside of it. >From now on use members count to swap the last QP with removed one. Fix also the second occurrence of this flow in mlx4_qp_detach_common(). In mlx4_qp_detach_common() use members_count instead of loop iterator outside of the for loop. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16net/mlx4_core: Fix leakage of SW multicast entriesYevgeny Petrilin
When removing multicast address in B0 steering mode there is a bug in cases where there is a single QP registered for the address, and this QP is also promiscuous. In such cases the entry wouldn't be deleted from the SW structure representing all Ethernet MCG entries, but would be removed in HW. This way when driver goes to remove it from SW and HW structures the HW deletion fails. Moreover the same index could later be used for registering different address, which can be Infiniband. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14mlx4: mark napi id for gro_skbJason Wang
Napi id was not marked for gro_skb, this will lead rx busy loop won't work correctly since they stack never try to call low latency receive method because of a zero socket napi id. Fix this by marking napi id for gro_skb. The transaction rate of 1 byte netperf tcp_rr gets about 50% increased (from 20531.68 to 30610.88). Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-09mlx5_core: Fix possible race between mr tree insert/deleteSagi Grimberg
In mlx5_core_destroy_mkey(), we must first remove the mr from the radix tree and then destroy it. Otherwise we might hit a race if the key was reallocated and we attempted to insert it to the radix tree. Also handle radix tree insert/delete failures. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Eli Cohen <elic@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-08net/mlx4_en: Ignore budget on TX napi pollingAmir Vadai
It is recommended that TX work not count against the quota. The cost of TX packet liberation is a minute percentage of what it costs to process an RX frame. Furthermore, that SKB freeing makes memory available for other paths in the stack. Give the TX a larger budget and be more aggressive about cleaning up the Tx descriptors this budget could be changed using ethtool: $ ethtool -C eth1 tx-frames-irq <budget> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08net/mlx4_en: Fix mac_hash database inconsistencyNoa Osherovich
Using a local copy of dev_addr in mlx4_en_set_mac() to prevent dev_addr from being modified during error flow or when dev_addr is modified in another context (which is another problem that is being discussed over the mailing list [1]). Also fixing bad naming of priv->prev_mac into priv->current_mac. [1] - http://patchwork.ozlabs.org/patch/351489/ Reviewed-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08net/mlx4_en: Do not count LLC/SNAP in MTU calculationYishai Hadas
LLC/SNAP 8 bytes should not be added as part of header calculation. If used, payload will be decreased accordingly. For MTU of 1500 we'll set 1522 instead of 1523. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Liran Liss <liranl@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08net/mlx4_en: Do not disable vlan filter during promiscuous modeEugenia Emantayev
Promiscous mode is only for MACs. Should not disable/enable VLAN filter when entering/leaving promisuous mode. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.co.il> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08net/mlx4: Verify port number in __mlx4_unregister_macEugenia Emantayev
Verify port number to avoid crashes if port number is outside the range. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08net/mlx4_en: Run loopback test only when port is upEugenia Emantayev
Loopback can't work when port is down. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-08net/mlx4_en: Fix set port ratelimit for 40GEEugenia Emantayev
In 40GE we can't use the default bw units for set ratelimit (100 Mbps) since the max is 255*100 Mbps = 25 Gbps (not suited for 40GE), thus we need 1 Gbps units. But for 10GE 1 Gbps units might be too bruit so we use the following solution. For user set ratelimit <= 25 Gbps: use 100 Mbps units * user_ratelimit (* 10). For user set ratelimit > 25 Gbps: use 1 Gbps units * user_ratelimit. For user set unlimited ratelimit (0 Gbps): use 1 Gbps units * MAX_RATELIMIT_DEFAULT (57) Note: any value > 58 will damage the FW ratelimit computation, so we allow a max and any higher value will be pulled down to 57. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07net/mlx4_en: Don't configure the HW vxlan parser when vxlan offloading isn't setOr Gerlitz
The add_vxlan_port ndo driver code was wrongly testing whether HW vxlan offloads are supported by the device instead of checking if they are currently enabled. This causes the driver to configure the HW parser to conduct matching for vxlan packets but since no steering rules were set, vxlan packets are dropped on RX. Fix that by doing the right test, as done in the del_vxlan_port ndo handler. Fixes: 1b136de ('net/mlx4: Implement vxlan ndo calls') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02net/mlx4_en: IRQ affinity hint is not cleared on port downAmir Vadai
Need to remove affinity hint at mlx4_en_deactivate_cq() and not at mlx4_en_destroy_cq() - since affinity_mask might be free'd while still being used by procfs. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ ↵Amir Vadai
affinity map IRQ affinity notifier can only have a single notifier - cpu_rmap notifier. Can't use it to track changes in IRQ affinity map. Detect IRQ affinity changes by comparing CPU to current IRQ affinity map during NAPI poll thread. CC: Thomas Gleixner <tglx@linutronix.de> CC: Ben Hutchings <ben@decadent.org.uk> Fixes: 2eacc23 ("net/mlx4_core: Enforce irq affinity changes immediatly") Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-22net/mlx4_core: Fix the error flow when probing with invalid VF configurationOr Gerlitz
Single ported VF are currently not supported on configurations where one or both ports are IB. When we hit this case, the relevant flow in the driver didn't return error and jumped to the wrong label. Fix that. Fixes: dd41cc3 ('net/mlx4: Adapt num_vfs/probed_vf params for single port VF') Reported-by: Shirley Ma <shirley.ma@oracle.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
2014-06-11net/mlx4_en: Use affinity hintYuval Atias
The “affinity hint” mechanism is used by the user space daemon, irqbalancer, to indicate a preferred CPU mask for irqs. Irqbalancer can use this hint to balance the irqs between the cpus indicated by the mask. We wish the HCA to preferentially map the IRQs it uses to numa cores close to it. To accomplish this, we use cpumask_set_cpu_local_first(), that sets the affinity hint according the following policy: First it maps IRQs to “close” numa cores. If these are exhausted, the remaining IRQs are mapped to “far” numa cores. Signed-off-by: Yuval Atias <yuvala@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net/mlx4_core: Keep only one driver entry release mlx4_privWei Yang
Following commit befdf89 "net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()", there are two mlx4 pci callbacks which will attempt to release the mlx4_priv object -- .shutdown and .remove. This leads to a use-after-free access to the already freed mlx4_priv instance and trigger a "Kernel access of bad area" crash when both .shutdown and .remove are called. During reboot or kexec, .shutdown is called, with the VFs probed to the host going through shutdown first and then the PF. Later, the PF will trigger VFs' .remove since VFs still have driver attached. Fix that by keeping only one driver entry which releases mlx4_priv. Fixes: befdf89 ('net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()') CC: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net/mlx4_core: Fix SRIOV free-pool management when enforcing resource quotasJack Morgenstein
The Hypervisor driver tracks free slots and reserved slots at the global level and tracks allocated slots and guaranteed slots per VF. Guaranteed slots are treated as reserved by the driver, so the total reserved slots is the sum of all guaranteed slots over all the VFs. As VFs allocate resources, free (global) is decremented and allocated (per VF) is incremented for those resources. However, reserved (global) is never changed. This means that effectively, when a VF allocates a resource from its guaranteed pool, it is actually reducing that resource's free pool (since the global reserved count was not also reduced). The fix for this problem is the following: For each resource, as long as a VF's allocated count is <= its guaranteed number, when allocating for that VF, the reserved count (global) should be reduced by the allocation as well. When the global reserved count reaches zero, the remaining global free count is still accessible as the free pool for that resource. When the VF frees resources, the reverse happens: the global reserved count for a resource is incremented only once the VFs allocated number falls below its guaranteed number. This fix was developed by Rick Kready <kready@us.ibm.com> Reported-by: Rick Kready <kready@us.ibm.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull main InfiniBand/RDMA updates from Roland Dreier: - add iWARP port mapper to avoid conflicts between RDMA and normal stack TCP connections. - fixes for i386 / x86-64 structure padding differences (ABI compatibility for 32-on-64) from Yann Droneaud. - a pile of SRP initiator fixes from Bart Van Assche. - fixes for a writeback / memory allocation deadlock with NFS over IPoIB connected mode from Jiri Kosina. - the usual fixes and cleanups to mlx4, mlx5, cxgb4 and other low-level drivers. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (61 commits) RDMA/cxgb4: Add support for iWARP Port Mapper user space service RDMA/nes: Add support for iWARP Port Mapper user space service RDMA/core: Add support for iWARP Port Mapper user space service IB/mlx4: Fix gfp passing in create_qp_common() IB/umad: Fix use-after-free on close IB/core: Fix kobject leak on device register error flow RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp mlx4_core: Fix GFP flags parameters to be gfp_t IB/core: Fix port kobject deletion during error flow IB/core: Remove unneeded kobject_get/put calls IB/core: Fix sparse warnings about redeclared functions IB/mad: Fix sparse warning about gfp_t use IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO IB: Add a QP creation flag to use GFP_NOIO allocations IB: Return error for unsupported QP creation flags IB: Allow build of hw/ and ulp/ subdirectories independently mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp IB/srp: Avoid problems if a header uses pr_fmt IB/umad: Fix error handling ...
2014-06-10Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', ↵Roland Dreier
'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next
2014-06-06net: use SPEED_UNKNOWN and DUPLEX_UNKNOWN when appropriateJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04mlx4_core: Fix GFP flags parameters to be gfp_tRoland Dreier
Otherwise sparse gives a bunch of warnings like drivers/net/ethernet/mellanox/mlx4/srq.c:110:66: sparse: incorrect type in argument 4 (different base types) drivers/net/ethernet/mellanox/mlx4/srq.c:110:66: expected int [signed] gfp drivers/net/ethernet/mellanox/mlx4/srq.c:110:66: got restricted gfp_t Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: include/net/inetpeer.h net/ipv6/output_core.c Changes in net were fixing bugs in code removed in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02Merge branch 'ethtool-rssh-fixes' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bwh/net-next Ben Hutchings says: ==================== Pull request: Fixes for new ethtool RSS commands This addresses several problems I previously identified with the new ETHTOOL_{G,S}RSSH commands: 1. Missing validation of reserved parameters 2. Vague documentation 3. Use of unnamed magic number 4. No consolidation with existing driver operations I don't currently have access to suitable network hardware, but have tested these changes with a dummy driver that can support various combinations of operations and sizes, together with (a) Debian's ethtool 3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH and minor adjustment for fixes 1 and 3. v2: Update RSS operations in vmxnet3 too ==================== Signed-off-by: David S. Miller <davem@davemloft.net>