myri10ge: fix stop/go ordering even more

The doorbell writes may be seen out of order by the firmware if they
are in WC memory since the tx spin(un)lock does not flush WC writes.
Hence if the "stop" is written on a different CPU than the "go", it
is possible that the stop will arrive after the go unless we add an
explicit memory barrier (and mmiowb() is not enough).

It fixes transmit hangs in multi tx queue mode.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index a5f428b..b378670 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -75,7 +75,7 @@
 #include "myri10ge_mcp.h"
 #include "myri10ge_mcp_gen_header.h"
 
-#define MYRI10GE_VERSION_STR "1.4.3-1.375"
+#define MYRI10GE_VERSION_STR "1.4.3-1.378"
 
 MODULE_DESCRIPTION("Myricom 10G driver (10GbE)");
 MODULE_AUTHOR("Maintainer: help@myri.com");
@@ -1393,6 +1393,7 @@
 		if (tx->req == tx->done) {
 			tx->queue_active = 0;
 			put_be32(htonl(1), tx->send_stop);
+			mb();
 			mmiowb();
 		}
 		__netif_tx_unlock(dev_queue);
@@ -2865,6 +2866,7 @@
 	if ((mgp->dev->real_num_tx_queues > 1) && tx->queue_active == 0) {
 		tx->queue_active = 1;
 		put_be32(htonl(1), tx->send_go);
+		mb();
 		mmiowb();
 	}
 	tx->pkt_start++;