C++ - memcpy where size is known at compile time
I find myself tuning a piece of code where memory is copied using memcpy, and the third parameter (size) is known at compile time.
The consumer of the function calling memcpy does something similar to this:
template <size_t s>
void foo() {
    void* dstmemory = whatevera;
    void* srcmemory = whateverb;
    memcpy(dstmemory, srcmemory, s);
}
Now, I would have expected the memcpy intrinsic to be smart enough to realise that for this:
foo<4>()
... it can replace the memcpy in the function with a 32-bit integer assignment. However, I surprisingly find myself seeing a >2x speedup from doing this:
template <size_t size>
inline void memcpy_fixed(void* dst, const void* src) {
    memcpy(dst, src, size);
}

template <>
inline void memcpy_fixed<4>(void* dst, const void* src) {
    *((uint32_t*)dst) = *((uint32_t*)src);
}
and rewriting foo to:

template <size_t s>
void foo() {
    void* dstmemory = whatevera;
    void* srcmemory = whateverb;
    memcpy_fixed<s>(dstmemory, srcmemory);
}
Both tests were run on clang (OS X) with -O3. I really would have expected the memcpy intrinsic to be smarter about the case where the size is known at compile time.
My compiler flags are:
-gline-tables-only -O3 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
Am I asking too much of my C++ compiler, or is there some compiler flag I am missing?
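memcpy is not the same as *((uint32_t*)dst) = *((uint32_t*)src).
memcpy can deal with unaligned memory, whereas the cast-and-assign version assumes both pointers are suitably aligned for uint32_t.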
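To make the alignment point concrete, here is a minimal sketch. The names copy_fixed and roundtrip_unaligned are made up for illustration and are not from the question. It copies a 32-bit value to and from an odd offset inside a byte buffer, which is fine through memcpy but would not be safe to do by casting the odd address to uint32_t* and dereferencing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the original post): a fixed-size copy that
// stays valid for unaligned pointers because it goes through memcpy.
template <std::size_t Size>
inline void copy_fixed(void* dst, const void* src) {
    std::memcpy(dst, src, Size);  // Size is a compile-time constant
}

// Round-trips a 32-bit value through an odd offset in a byte buffer.
inline std::uint32_t roundtrip_unaligned(std::uint32_t value) {
    unsigned char buffer[8] = {};
    unsigned char* odd = buffer + 1;           // not guaranteed 4-byte aligned
    copy_fixed<sizeof value>(odd, &value);     // store into the odd address
    std::uint32_t out = 0;
    copy_fixed<sizeof out>(&out, odd);         // load it back
    assert(buffer[0] == 0 && buffer[5] == 0);  // neighbouring bytes untouched
    return out;
}
```

Calling roundtrip_unaligned(0xDEADBEEFu) should hand back the same value on any platform, since both the store and the load go through memcpy and never form a misaligned uint32_t*.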
By the way, most modern compilers replace a memcpy of known size with suitable code emission. For small sizes they may emit things like rep movsb, which may not be the fastest choice in every case.
If you found that in your particular case you gain 2x speed, and you think you need the speed-up, feel free to get your hands dirty (with clear comments).
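Before getting your hands dirty it is worth confirming the gain on your own build. Below is a rough timing sketch, assuming a hypothetical harness called time_copies (not from the question). It takes any copy callable, so you can pass the plain memcpy version and the specialised version in turn. Treat the numbers with suspicion: the optimiser may vectorise or otherwise rewrite the loop, so looking at the generated assembly is usually more reliable than a micro-benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical timing harness: copies src into dst one 32-bit element at a
// time using the supplied callable and returns the elapsed nanoseconds.
// Assumes dst.size() >= src.size().
template <typename CopyFn>
long long time_copies(CopyFn copy,
                      std::vector<std::uint32_t>& dst,
                      const std::vector<std::uint32_t>& src) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < src.size(); ++i) {
        copy(&dst[i], &src[i]);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}

// Baseline candidate: memcpy with a size known at compile time.
inline void copy4_memcpy(void* dst, const void* src) {
    std::memcpy(dst, src, 4);
}
```

A typical use is time_copies(copy4_memcpy, dst, src), followed by a second call that passes the specialised copy, with both runs built using the same compiler flags.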