C++ - memcpy where size is known at compile time


I find myself tuning a piece of code where memory is copied using memcpy, and the third parameter (the size) is known at compile time.

The consumer of the function calling memcpy does something similar to this:

template <size_t s>
void foo() {
    void* dstmemory = whatevera;
    void* srcmemory = whateverb;
    memcpy(dstmemory, srcmemory, s);
}

Now, I would have expected the memcpy intrinsic to be smart enough to realise that this:

foo<4>() 

... can replace the memcpy in the function with a 32-bit integer assignment. However, I surprisingly find myself seeing a >2x speedup by doing this:

template <size_t size>
inline void memcpy_fixed(void* dst, const void* src) {
    memcpy(dst, src, size);
}

template <>
inline void memcpy_fixed<4>(void* dst, const void* src) {
    *((uint32_t*)dst) = *((uint32_t*)src);
}

and rewriting foo to:

template <size_t s>
void foo() {
    void* dstmemory = whatevera;
    void* srcmemory = whateverb;
    memcpy_fixed<s>(dstmemory, srcmemory);
}

Both tests were run on clang (OS X) with -O3. I would have expected the memcpy intrinsic to be smarter about the case where the size is known at compile time.

My compiler flags are:

-gline-tables-only -O3 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer

Am I asking too much of my C++ compiler, or is there some compiler flag I am missing?

memcpy is not the same as *((uint32_t*)dst) = *((uint32_t*)src).

memcpy can deal with unaligned memory, whereas dereferencing a uint32_t* assumes the address is suitably aligned for uint32_t.
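To illustrate the alignment point, here is a minimal sketch of reading and writing a 32-bit value at an arbitrary byte offset. The helper names load_u32 and store_u32 are hypothetical and are not part of the question's code. They rely only on the guarantee that memcpy works on bytes and therefore has no alignment requirement on either pointer.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit value from any address, aligned or not.
// memcpy copies byte by byte as far as the language is concerned, so the
// source pointer does not have to satisfy alignof(std::uint32_t).
inline std::uint32_t load_u32(const void* src) {
    std::uint32_t value;
    std::memcpy(&value, src, sizeof value);
    return value;
}

// Hypothetical helper: write a 32-bit value to any address, aligned or not.
inline void store_u32(void* dst, std::uint32_t value) {
    std::memcpy(dst, &value, sizeof value);
}
```

For example, with unsigned char buf[8], calling store_u32(buf + 1, v) followed by load_u32(buf + 1) round-trips v even though buf + 1 is generally not 4-byte aligned. Doing the same through a cast to uint32_t* would have undefined behaviour.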

By the way, most modern compilers replace a memcpy of known size with suitable code emission. For small sizes they often emit things like rep movsb, which may not be the fastest choice in every case.

If you have found that in your particular case you gain a 2x speedup, and you think you need that speedup, feel free to get your hands dirty (with clear comments).
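As a sketch of what "hands dirty, with clear comments" might look like, the version below keeps the question's memcpy_fixed shape but spells out the precondition of the 4-byte specialisation and checks it in debug builds. The is_aligned_for_u32 helper and the assert calls are additions for illustration, not part of the original code. Whether this is actually faster than plain memcpy depends on the compiler and target, so it is worth measuring.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p is aligned suitably for std::uint32_t.
inline bool is_aligned_for_u32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(std::uint32_t) == 0;
}

// Generic case: defer to memcpy, which has no alignment requirement.
template <std::size_t Size>
inline void memcpy_fixed(void* dst, const void* src) {
    std::memcpy(dst, src, Size);
}

// Hand-tuned 4-byte case.
// PRECONDITION: dst and src must both point to storage that is aligned for
// std::uint32_t and may legitimately be accessed as a std::uint32_t.
// Unlike memcpy, this is undefined behaviour for unaligned pointers.
template <>
inline void memcpy_fixed<4>(void* dst, const void* src) {
    assert(is_aligned_for_u32(dst));
    assert(is_aligned_for_u32(src));
    *static_cast<std::uint32_t*>(dst) = *static_cast<const std::uint32_t*>(src);
}
```

Callers that cannot guarantee the precondition should keep using the generic memcpy path for 4-byte copies as well.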

