tile: optimize and clean up string functions
This change cleans up the string code in a number of ways:
- For memcpy(), fix bug in prefetch and increase distance to 3 lines;
optimize for unaligned data; do all loads before wh64 to make memcpy
safe for forward-overlapping calls; etc. Performance is improved.
- Use new copy_byte() function on tilegx to spread a single byte value
out into a full word using the shufflebytes instruction.
- Clean up header include ordering to be more canonical, and remove
spurious #undefs of function names.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
diff --git a/arch/tile/lib/string-endian.h b/arch/tile/lib/string-endian.h
index c0eed7c..2e49cbf 100644
--- a/arch/tile/lib/string-endian.h
+++ b/arch/tile/lib/string-endian.h
@@ -1,5 +1,5 @@
/*
- * Copyright 2011 Tilera Corporation. All Rights Reserved.
+ * Copyright 2013 Tilera Corporation. All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
@@ -31,3 +31,14 @@
#define CFZ(x) __insn_clz(x)
#define REVCZ(x) __insn_ctz(x)
#endif
+
+/*
+ * Create eight copies of the byte in a uint64_t. Byte Shuffle uses
+ * the bytes of srcB as the index into the dest vector to select a
+ * byte. With all indices of zero, the first byte is copied into all
+ * the other bytes.
+ */
+static inline uint64_t copy_byte(uint8_t byte)
+{
+ return __insn_shufflebytes(byte, 0, 0);
+}