[PATCH] Use 32 bit division in slab_put_obj()

Improve the performance of slab_put_obj().  Without the cast, gcc considers
ptrdiff_t a 64 bit signed integer and ends up emitting code to use a full
signed 128 bit divide on EM64T, which is substantially slower than a 32 bit
unsigned divide.

I noticed this when looking at the profile of a case where the slab balance
is just on edge and thrashes back and forth freeing a block.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/mm/slab.c b/mm/slab.c
index 6f8495e..88082ae 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1398,7 +1398,7 @@
 		struct slab *slabp = page_get_slab(virt_to_page(objp));
 		int objnr;
 
-		objnr = (objp - slabp->s_mem) / cachep->objsize;
+		objnr = (unsigned)(objp - slabp->s_mem) / cachep->objsize;
 		if (objnr) {
 			objp = slabp->s_mem + (objnr - 1) * cachep->objsize;
 			realobj = (char *)objp + obj_dbghead(cachep);
@@ -2341,7 +2341,7 @@
 	if (cachep->flags & SLAB_STORE_USER)
 		*dbg_userword(cachep, objp) = caller;
 
-	objnr = (objp - slabp->s_mem) / cachep->objsize;
+	objnr = (unsigned)(objp - slabp->s_mem) / cachep->objsize;
 
 	BUG_ON(objnr >= cachep->num);
 	BUG_ON(objp != slabp->s_mem + objnr * cachep->objsize);
@@ -2699,7 +2699,7 @@
 		slabp = page_get_slab(virt_to_page(objp));
 		l3 = cachep->nodelists[node];
 		list_del(&slabp->list);
-		objnr = (objp - slabp->s_mem) / cachep->objsize;
+		objnr = (unsigned)(objp - slabp->s_mem) / cachep->objsize;
 		check_spinlock_acquired_node(cachep, node);
 		check_slabp(cachep, slabp);