summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mthca
AgeCommit message (Collapse)Author
2008-09-29IB/mthca: Use pci_request_regions()Roland Dreier
Back in prehistoric (pre-git!) days, the kernel's MSI-X support did request_mem_region() on a device's MSI-X tables, which meant that a driver that enabled MSI-X couldn't use pci_request_regions() (since that would clash with the PCI layer's MSI-X request). However, that was removed (by me!) years ago, so mthca can just use pci_request_regions() and pci_release_regions() instead of its own much more complicated code that avoids requesting the MSI-X tables. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-26dma-mapping: add the device argument to dma_mapping_error()FUJITA Tomonori
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER architecture does: This enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423). I think that per-device dma_mapping_ops support would be also helpful for KVM people to support PCI passthrough but Andi thinks that this makes it difficult to support the PCI passthrough (see the above thread). So I CC'ed this to KVM camp. Comments are appreciated. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. If it's useful for KVM people, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created (it works with hot plugging). It enables IOMMUs to set up an appropriate dma_mapping_ops per device. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. This patch: dma_mapping_error() doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device argument. [akpm@linux-foundation.org: fix sge] [akpm@linux-foundation.org: fix svc_rdma] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix bnx2x] [akpm@linux-foundation.org: fix s2io] [akpm@linux-foundation.org: fix pasemi_mac] [akpm@linux-foundation.org: fix sdhci] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix sparc] [akpm@linux-foundation.org: fix ibmvscsi] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22IB/mthca: Keep free count for MTT buddy allocatorRoland Dreier
MTT entries are allocated with a buddy allocator, which just keeps bitmaps for each level of the buddy table. However, all free space starts out at the highest order, and small allocations start scanning from the lowest order. When the lowest order tables have no free space, this can lead to scanning potentially millions of bits before finding a free entry at a higher order. We can avoid this by just keeping a count of how many free entries each order has, and skipping the bitmap scan when an order is completely empty. This provides a nice performance boost for a negligible increase in memory usage. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14IB/mthca: Fix check of max_send_sge for special QPsRoland Dreier
The MLX transport requires two extra gather entries for sends (one for the header and one for the checksum at the end, as the comment says). However the code checked that max_recv_sge was not too big, instead of checking max_send_sge as it should have. Fix the code to check the correct condition. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14IB/mthca: Use round_jiffies() for catastrophic error polling timerRoland Dreier
Exactly when the catastrophic error polling timer function runs is not important, so use round_jiffies() to save unnecessary wakeups. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14IB/mthca: Remove "stop" flag for catastrophic error polling timerRoland Dreier
Since we use del_timer_sync() anyway, there's no need for an additional flag to tell the timer not to rearm. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14IB/mthca: Remove extra code for RESET->ERR QP state transitionRoland Dreier
Commit b18aad71 ("IB/mthca: Fix RESET to ERROR transition") added some extra code to handle a QP state transition from RESET to ERROR. However, the latest 1.2.1 version of the IB spec has clarified that this transition is actually not allowed, so we can remove this extra code again. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14RDMA/core: Add memory management extensions supportSteve Wise
This patch adds support for the IB "base memory management extension" (BMME) and the equivalent iWARP operations (which the iWARP verbs mandates all devices must implement). The new operations are: - Allocate an ib_mr for use in fast register work requests. - Allocate/free a physical buffer lists for use in fast register work requests. This allows device drivers to allocate this memory as needed for use in posting send requests (eg via dma_alloc_coherent). - New send queue work requests: * send with remote invalidate * fast register memory region * local invalidate memory region * RDMA read with invalidate local memory region (iWARP only) Consumer interface details: - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added to indicate device support for these features. - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV, IB_WR_RDMA_READ_WITH_INV are added. - A new consumer API function, ib_alloc_mr() is added to allocate fast register memory regions. - New consumer API functions, ib_alloc_fast_reg_page_list() and ib_free_fast_reg_page_list() are added to allocate and free device-specific memory for fast registration page lists. - A new consumer API function, ib_update_fast_reg_key(), is added to allow the key portion of the R_Key and L_Key of a fast registration MR to be updated. Consumers call this if desired before posting a IB_WR_FAST_REG_MR work request. Consumers can use this as follows: - MR is allocated with ib_alloc_mr(). - Page list memory is allocated with ib_alloc_fast_reg_page_list(). - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key(). - MR made VALID and bound to a specific page list via ib_post_send(IB_WR_FAST_REG_MR) - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV), ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with invalidate operation. - MR is deallocated with ib_dereg_mr() - page lists dealloced via ib_free_fast_reg_page_list(). Applications can allocate a fast register MR once, and then can repeatedly bind the MR to different physical block lists (PBLs) via posting work requests to a send queue (SQ). For each outstanding MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be allocated (the fast_reg_page_list is owned by the low-level driver from the consumer posting a work request until the request completes). Thus pipelining can be achieved while still allowing device-specific page_list processing. The 32-bit fast register memory key/STag is composed of a 24-bit index and an 8-bit key. The application can change the key each time it fast registers thus allowing more control over the peer's use of the key/STag (ie it can effectively be changed each time the rkey is rebound to a page list). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14RDMA: Remove subversion $Id tagsRoland Dreier
They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-23IB/mthca: Clear ICM pages before handing to FWEli Cohen
Current memfree FW has a bug which in some cases, assumes that ICM pages passed to it are cleared. This patch uses __GFP_ZERO to allocate all ICM pages passed to the FW. Once firmware with a fix is released, we can make the workaround conditional on firmware version. This fixes the bug reported by Arthur Kepner <akepner@sgi.com> here: http://lists.openfabrics.org/pipermail/general/2008-May/050026.html Cc: <stable@kernel.org> Signed-off-by: Eli Cohen <eli@mellanox.co.il> [ Rewritten to be a one-liner using __GFP_ZERO instead of vmap()ing ICM memory and memset()ing it to 0. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-16IB/mthca: Fix max_sge value returned by query_deviceRoland Dreier
The mthca driver returns the maximum number of scatter/gather entries returned by the firmware as the max_sge value when device properties are queried. However, the firmware also reports a limit on the maximum descriptor size allowed, and because mthca takes into account the worst case send request overhead when checking whether to allow a QP to be created, the largest number of scatter/gather entries that can be used with mthca may be limited by the maximum descriptor size rather than just by the actual s/g entry limit. This means that applications cannot actually create QPs with max_send_sge equal to the limit returned by ib_query_device(). Fix this by checking if the maximum descriptor size imposes a lower limit and if so returning that lower limit. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attributeRoland Dreier
Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the mthca userspace ABI to provide a way for userspace to indicate which memory regions need the DMA write barrier attribute. However, it is possible to handle this without breaking existing userspace, by having the mthca kernel driver recognize whether it is talking to old or new userspace, depending on the size of the register MR structure passed in. The only potential drawback of this is that is allows old userspace (which has a bug with DMA ordering on large SGI Altix systems) to continue to run on new kernels, but the advantage of allowing old userspace to continue to work on unaffected systems seems to outweigh this, and we can print a warning to push people to upgrade their userspace. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29IB/mthca: Avoid recycling old FMR R_Keys too soonOlaf Kirch
When a FMR is unmapped, mthca resets the map count to 0, and clears the upper part of the R_Key which is used as the sequence counter. This poses a problem for RDS, which uses ib_fmr_unmap as a fence operation. RDS assumes that after issuing an unmap, the old R_Keys will be invalid for a "reasonable" period of time. For instance, Oracle processes uses shared memory buffers allocated from a pool of buffers. When a process dies, we want to reclaim these buffers -- but we must make sure there are no pending RDMA operations to/from those buffers. The only way to achieve that is by using unmap and sync the TPT. However, when the sequence count is reset on unmap, there is a high likelihood that a new mapping will be given the same R_Key that was issued a few milliseconds ago. To prevent this, don't reset the sequence count when unmapping a FMR. Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-29IB: expand ib_umem_get() prototypeArthur Kepner
Add a new parameter, dmasync, to the ib_umem_get() prototype. Use dmasync = 1 when mapping user-allocated CQs with ib_umem_get(). Signed-off-by: Arthur Kepner <akepner@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Jes Sorensen <jes@sgi.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits) SCSI: convert struct class_device to struct device DRM: remove unused dev_class IB: rename "dev" to "srp_dev" in srp_host structure IB: convert struct class_device to struct device memstick: convert struct class_device to struct device driver core: replace remaining __FUNCTION__ occurrences sysfs: refill attribute buffer when reading from offset 0 PM: Remove destroy_suspended_device() Firmware: add iSCSI iBFT Support PM: Remove legacy PM (fix) Kobject: Replace list_for_each() with list_for_each_entry(). SYSFS: Explicitly include required header file slab.h. Driver core: make device_is_registered() work for class devices PM: Convert wakeup flag accessors to inline functions PM: Make wakeup flags available whenever CONFIG_PM is set PM: Fix misuse of wakeup flag accessors in serial core Driver core: Call device_pm_add() after bus_add_device() in device_add() PM: Handle device registrations during suspend/resume block: send disk "change" event for rescan_partitions() sysdev: detect multiple driver registrations ... Fixed trivial conflict in include/linux/memory.h due to semaphore header file change (made irrelevant by the change to mutex).
2008-04-19IB: convert struct class_device to struct deviceTony Jones
This converts the main ib_device to use struct device instead of struct class_device as class_device is going away. Signed-off-by: Tony Jones <tonyj@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-18Convert asm/semaphore.h users to linux/semaphore.hMatthew Wilcox
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2008-04-16IB/mthca: Update module version and release dateJack Morgenstein
The ib_mthca driver has been stable for a while, so bump the version number to 1.0 to indicate this. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16IB/mthca: Update QP state if query QP succeedsDotan Barak
If the QP was moved to another state (such as SQE) by the hardware, then after this change the user won't have to set the IBV_QP_CUR_STATE mask in order to execute modify QP in order to recover from this state. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16IB/core: Add support for "send with invalidate" work requestsRoland Dreier
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a "send with invalidate" work request as defined in the iWARP verbs and the InfiniBand base memory management extensions. Also put "imm_data" and a new "invalidate_rkey" member in a new "ex" union in struct ib_send_wr. The invalidate_rkey member can be used to pass in an R_Key/STag to be invalidated. Add this new union to struct ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in ib_uverbs_post_send(). Fix up low-level drivers to deal with the change to struct ib_send_wr, and just remove the imm_data initialization from net/sunrpc/xprtrdma/, since that code never does any send with immediate operations. Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since the iWARP drivers currently in the tree set the bit. The amso1100 driver at least will silently fail to honor the IB_SEND_INVALIDATE bit if passed in as part of userspace send requests (since it does not implement kernel bypass work request queueing). Remove the flag from all existing drivers that set it until we know which ones are OK. The values chosen for the new flag is not consecutive to avoid clashing with flags defined in the XRC patches, which are not merged yet but which are already in use and are likely to be merged soon. This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16IB/core: Add creation flags to struct ib_qp_init_attrEli Cohen
Add a create_flags member to struct ib_qp_init_attr that will allow a kernel verbs consumer to create a pass special flags when creating a QP. Add a flag value for telling low-level drivers that a QP will be used for IPoIB UD LSO. The create_flags member will also be useful for XRC and ehca low-latency QP support. Since no create_flags handling is implemented yet, add code to all low-level drivers to return -EINVAL if create_flags is non-zero. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16IB/mthca: Avoid integer overflow when allocating huge ICM tableRoland Dreier
In mthca_alloc_icm_table(), the number of entries to allocate for the table->icm array is computed by calculating obj_size * nobj and then dividing by MTHCA_TABLE_CHUNK_SIZE. If nobj is really large, then obj_size * nobj may overflow and the division may get the wrong value (even a negative value). Fix this by calculating the number of objects per chunk and then dividing nobj by this value instead. This patch allows crazy configurations such as loading ib_mthca with the module parameter num_mtt=33554432 to work properly. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16IB/mthca: Avoid integer overflow when dealing with profile sizeRoland Dreier
mthca_make_profile() returns the size in bytes of the HCA context layout it creates, or a negative value if an error occurs. However, the return value is declared as u64 and the memfree initialization path casts this value to int to test if it is negative. This makes it think incorrectly than an error has occurred if the context size happens to be bigger than 2GB, since this turns into a negative int. Fix this by having mthca_make_profile() return an s64 and testing for an error by checking whether this 64-bit value itself is negative. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16IB/mthca: Add IPoIB checksum offload supportEli Cohen
Arbel and Sinai devices support checksum generation and verification of TCP and UDP packets for UD IPoIB messages. This patch checks if the HCA supports this and sets the IB_DEVICE_UD_IP_CSUM capability flag if it does. It implements support for handling the IB_SEND_IP_CSUM send flag and setting the csum_ok field in receive work completions. Signed-off-by: Eli Cohen <eli@mellnaox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-04-16IB/mthca: Formatting cleanupsRoland Dreier
Fix a few whitespace and other coding style problems. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-19IB/mthca: Free correct MPT on error exit from mthca_fmr_alloc()Roland Dreier
When mthca_fmr_alloc() returns an error, it should free the MPT at the index key, not mr->ibmr.lkey, since the lkey has been mangled by hw_index_to_key() and no longer is the real index. This bug causes corruption of the MPT table free bitmap when mthca_fmr_alloc() fails. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-13IB/mthca: Convert to use be16_add_cpu()Marcin Slusarz
replace: big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) + expression_in_cpu_byteorder); with: beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder); Generated with a semantic patch. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-12IB/mthca: Add missing sg_init_table() in mthca_map_user_db()Roland Dreier
Usually harmless, since the scatterlist is always hard-coded to a length of 1, but it triggers a BUG() if CONFIG_DEBUG_SG=y, so we better fix it. This fixes <http://bugzilla.kernel.org/show_bug.cgi?id=9934>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/mthca: Return proper error codes from mthca_fmr_alloc()Olaf Kirch
If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc() would return 0 (success) no matter what. This leads to crashes a little down the road, when we try to dereference eg mr->mtt, which was really ERR_PTR(-Ewhatever). Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB: Avoid marking __devinitdata as constRoland Dreier
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/ib_mthca: Pre-link receive WQEs in Tavor modeEli Cohen
We have recently discovered that Tavor mode requires each WQE in a posted list of receive WQEs to have a valid NDA field at all times. This requirement holds true for regular QPs as well as for SRQs. This patch prelinks the receive queue in a regular QP and keeps the free list in SRQ always properly linked. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/mthca: Remove checks for srq->first_free < 0Eli Cohen
The SRQ receive posting functions make sure that srq->first_free never becomes negative, so we can remove tests of whether it is negative. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/mthca: Don't read reserved fields in mthca_QUERY_ADAPTER()Jack Morgenstein
For memfree devices, the firmware QUERY_ADAPTER command does not return vendor_id, device_id, and revision_id; do not return these fields in the QUERY_ADAPTER function for memfree devices. Instead, for memfree devices, initialize the rev_id field of the mthca device via init_node_data (MAD IFC query), as is done in the query_device verb implementation. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-02-04IB/mthca: Fix and simplify page size calculation in mthca_reg_phys_mr()Roland Dreier
In mthca_reg_phys_mr(), we calculate the page size for the HCA hardware to use to map the buffer list passed in by the consumer. For example, if the consumer passes in [0] addr 0x1000, size 0x1000 [1] addr 0x2000, size 0x1000 then the algorithm would come up with a page size of 0x2000 and a list of two pages, at 0x0000 and 0x2000. Usually, this would work fine since the memory region would start at an offset of 0x1000 and have a length of 0x2000. However, the old code did not take into account the alignment of the IO virtual address passed in. For example, if the consumer passed in a virtual address of 0x6000 for the above, then the offset of 0x1000 would not be used correctly because the page mask of 0x1fff would result in an offset of 0. We can fix this quite neatly by making sure that the page shift we use is no bigger than the first bit where the start of the first buffer and the IO virtual address differ. Also, we can further simplify the code by removing the special case for a single buffer by noticing that it doesn't matter if we use a page size that is too big. This allows the loop to compute the page shift to be replaced with __ffs(). Thanks to Bryan S Rosenburg <rosnbrg@us.ibm.com> for pointing out the original bug and suggesting several ways to improve this patch. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/mthca: Update latest "native Arbel" firmware revisionRoland Dreier
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25IB/mthca: Remove MSI support as scheduledAdrian Bunk
Remove MSI support from the mthca driver, as scheduled. There is no reason to use MSI instead of MSI-X, since MSI-X performs better. No one has spoken up since MSI support was deprecated in commit f6be6fbe ("IB/mthca: Schedule MSI support for removal"), so apparently the MSI support is unused. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-24SG: Change sg_set_page() to take length and offset argumentJens Axboe
Most drivers need to set length and offset as well, so may as well fold those three lines into one. Add sg_assign_page() for those two locations that only needed to set the page, where the offset/length is set outside of the function context. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4_core: Increase command timeout for INIT_HCA to 10 seconds IPoIB/cm: Use common CQ for CM send completions IB/uverbs: Fix checking of userspace object ownership IB/mlx4: Sanity check userspace send queue sizes IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))" IB/ehca: Enable large page MRs by default IB/ehca: Change meaning of hca_cap_mr_pgsize IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr() IB/ehca: Fix masking error in {,re}reg_phys_mr() IB/ehca: Supply QP token for SRQ base QPs IPoIB: Use round_jiffies() for ah_reap_task RDMA/cma: Fix deadlock destroying listen requests RDMA/cma: Add locking around QP accesses IB/mthca: Avoid alignment traps when writing doorbells mlx4_core: Kill mlx4_write64_raw()
2007-10-22[SG] Update drivers to use sg helpersJens Axboe
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-15IB/mthca: Avoid alignment traps when writing doorbellsRoland Dreier
Architectures such as ia64 see alignment traps when doing a 64-bit read from __be32 doorbell[2] arrays to do doorbell writes in mthca_write64(). Fix this by just passing the two halves of the doorbell value into mthca_write64(). This actually improves the generated code by allowing the compiler to see what's going on better. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-10IB/mthca: Mark error paths as unlikely() in post_srq_recv functionsEli Cohen
Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09IB/mthca: Use mmiowb() to avoid firmware commands getting jumbled upRoland Dreier
Firmware commands are sent to the HCA by writing multiple words to a command register block. Access to this block of registers is serialized with a mutex. However, on large SGI systems, problems were seen with multiple CPUs issuing FW commands at the same time, because the writes to the register block may be reordered within the system interconnect and reach the HCA in a different order than they were issued (even with the mutex). Fix this by adding an mmiowb() before dropping the mutex. Tested-by: Arthur Kepner <akepner@sgi.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09IB/mthca: Increase max number of QPs per multicast group to 56Roland Dreier
Increase the number of QPs allowed per multicast group from 8 to 56. This allows for one QP per core on 16-core systems, which are now quite common, and allows some space for future growth. This is basically the same patch that Jack Morgenstein <jackm@dev.mellanox.co.il> just supplied for mlx4. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09IB/mthca: Use PCI-X/PCI-Express read control interfacesPeter Oruba
These driver changes incorporate the proposed PCI-X / PCI-Express read byte count interface. Reading and setting those values doesn't take place "manually", instead wrapping functions are called to allow quirks for some PCI bridges. Signed-off by: Peter Oruba <peter.oruba@amd.com> Based on work by Stephen Hemminger <shemminger@linux-foundation.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09IB/mthca: Enable MSI-X by defaultMichael S. Tsirkin
Recover from MSI-X errors by automatically falling back on regular interrupt, instead of asking the user to do this manually. This makes it possible to enable MSI-X by default, and will make it possible to get rid of the msi_x module option in the future. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-20IB/mthca: Change command token on timeoutMichael S. Tsirkin
The FW command token is currently only updated on a command completion event. This means that on command timeout, the same token will be reused for new command, which results in a mess if the timed out command *does* eventually complete. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18IB/mthca: Simplify use of size0 in work request postingRoland Dreier
Current code sets size0 to 0 at the start of work request posting functions and then handles size0 == 0 specially within the loop over work requests. Change this so size0 is set along with f0 the first time through the loop (when nreq == 0). This makes the code easier to understand by making it clearer that f0 and size0 are always initialized if nreq != 0 without having to know that size0 == 0 implies nreq == 0. Also annotate size0 with uninitialized_var() so that this doesn't introduce a new compiler warning. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18IB/mthca: Factor out setting WQE UD segment entriesRoland Dreier
Factor code to set UD entries out of the work request posting functions into inline functions set_tavor_ud_seg() and set_arbel_ud_seg(). This doesn't change the generated code in any significant way, and makes the source easier on the eyes. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18IB/mthca: Factor out setting WQE remote address and atomic segment entriesRoland Dreier
Factor code to set remote address and atomic segment entries out of the work request posting functions into inline functions set_raddr_seg() and set_atomic_seg(). This doesn't change the generated code in any significant way, and makes the source easier on the eyes. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-18IB/mthca: Factor out setting WQE data segment entriesRoland Dreier
Factor code to set data segment entries out of the work request posting functions into inline functions mthca_set_data_seg() and mthca_set_data_seg_inval(). This makes the code more readable and also allows the compiler to do a better job -- on x86_64: add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-69 (-69) function old new delta mthca_arbel_post_srq_recv 373 369 -4 mthca_arbel_post_receive 570 562 -8 mthca_tavor_post_srq_recv 520 508 -12 mthca_tavor_post_send 1344 1330 -14 mthca_arbel_post_send 1481 1467 -14 mthca_tavor_post_receive 792 775 -17 Signed-off-by: Roland Dreier <rolandd@cisco.com>