From 414ed90cee32486c50f91b28990443e0dc21c868 Mon Sep 17 00:00:00 2001 From: Mike Marciniszyn Date: Thu, 10 Feb 2011 14:11:28 +0000 Subject: IB/qib: Fix double add_timer() The following panic BUG_ON occurs during qib testing: Kernel BUG at include/linux/timer.h:82 RIP [] :ib_qib:start_timer+0x73/0x89 RSP <0>Kernel panic - not syncing: Fatal exception <0>Dumping qib trace buffer from panic qib_set_lid INFO: IB0:1 got a lid: 0xf8 Done dumping qib trace buffer BUG: warning at kernel/panic.c:137/panic() (Tainted: G The flaw is due to a missing state test when processing responses that results in an add_timer() call when the same timer is already queued. This code was executing in parallel with a QP destroy on another CPU that had changed the state to reset, but the missing test caused to response handling code to run on into the panic. Signed-off-by: Mike Marciniszyn Signed-off-by: Roland Dreier --- drivers/infiniband/hw/qib/qib_rc.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'drivers/infiniband') diff --git a/drivers/infiniband/hw/qib/qib_rc.c b/drivers/infiniband/hw/qib/qib_rc.c index 8245237b67c..31e09b05a1a 100644 --- a/drivers/infiniband/hw/qib/qib_rc.c +++ b/drivers/infiniband/hw/qib/qib_rc.c @@ -1439,6 +1439,8 @@ static void qib_rc_rcv_resp(struct qib_ibport *ibp, } spin_lock_irqsave(&qp->s_lock, flags); + if (!(ib_qib_state_ops[qp->state] & QIB_PROCESS_RECV_OK)) + goto ack_done; /* Ignore invalid responses. */ if (qib_cmp24(psn, qp->s_next_psn) >= 0) -- cgit v1.2.3-70-g09d2 From c0af2c057d7ce3f0b260f9380d187a82bb5cab28 Mon Sep 17 00:00:00 2001 From: Mike Marciniszyn Date: Wed, 16 Feb 2011 15:48:25 +0000 Subject: IB/qib: Prevent double completions after a timeout or RNR error There is a double completion associated with error handling for RC QPs. The sequence is: - The do_rc_ack() routine fields an RNR nack and there are 0 rnr_retries configured on the QP. - qib_error_qp() stops the pending timer - qib_rc_send_complete() is called from sdma_complete() - qib_rc_send_complete() starts the timer because the msb of the psn just completed says an ack is needed. - a bunch of flushes occur as ipoib posts WQEs to an error'ed QP - rc_timeout() calls qib_restart_rc() - qib_restart_rc() calls qib_send_complete() with a IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the past The fix avoids starting the timer since another packet will never arrive. Signed-off-by: Mike Marciniszyn Signed-off-by: Roland Dreier --- drivers/infiniband/hw/qib/qib_rc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'drivers/infiniband') diff --git a/drivers/infiniband/hw/qib/qib_rc.c b/drivers/infiniband/hw/qib/qib_rc.c index 31e09b05a1a..eca0c41f122 100644 --- a/drivers/infiniband/hw/qib/qib_rc.c +++ b/drivers/infiniband/hw/qib/qib_rc.c @@ -1005,7 +1005,8 @@ void qib_rc_send_complete(struct qib_qp *qp, struct qib_ib_header *hdr) * there are still requests that haven't been acked. */ if ((psn & IB_BTH_REQ_ACK) && qp->s_acked != qp->s_tail && - !(qp->s_flags & (QIB_S_TIMER | QIB_S_WAIT_RNR | QIB_S_WAIT_PSN))) + !(qp->s_flags & (QIB_S_TIMER | QIB_S_WAIT_RNR | QIB_S_WAIT_PSN)) && + (ib_qib_state_ops[qp->state] & QIB_PROCESS_RECV_OK)) start_timer(qp); while (qp->s_last != qp->s_acked) { -- cgit v1.2.3-70-g09d2