C++ - Largest data type which can be fetch-ANDed atomically?
I wanted to try to atomically reset 256 bits using this:
    #include <x86intrin.h>
    #include <iostream>
    #include <array>
    #include <atomic>

    int main(){
        std::array<std::atomic<__m256i>, 10> updatearray;
        __m256i allzeros = _mm256_setzero_si256();
        updatearray[0].fetch_and(allzeros);
    }
But the compiler reports an error about the element not having fetch_and().
Is this not possible because a 256-bit type is too large to guarantee atomicity?
Is there any other way I can implement this? I am using GCC.
If not, what is the largest type I can reset atomically: 64 bits?
Edit: Could any AVX instructions perform the fetch-AND atomically?
So there are a few different things that need to be solved:
1. What can the processor do?
2. What do we mean by atomically?
3. Can you make the compiler generate code for what the processor can do?
4. Does the C++11/14 standard support that?
For #1 and #2:
On x86, there are instructions for 8-, 16-, 32-, 64-, 128-, 256- and 512-bit operations. One processor will [at least if the data is aligned to its own size] perform that operation atomically. However, for an operation to be "truly atomic", it also needs to prevent race conditions within the update of that data [in other words, prevent some other processor from reading, modifying and writing that same location]. Aside from a small number of "implied lock" instructions, this is done by adding a "lock prefix" to a particular instruction - this will perform the right kind of cache-talk [technical term] with the other processors in the system to ensure that only this processor can update the data.
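As an illustration of the locked read-modify-write described above, here is a minimal sketch at a width the hardware does support, using std::atomic<std::uint64_t>. The helper names reset_word and clear_bits are made up for this example, and which locked instruction GCC actually emits for fetch_and depends on the compiler version and on whether the returned value is used.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically clears every bit of a 64-bit word and
// returns the value the word held immediately before the update.
std::uint64_t reset_word(std::atomic<std::uint64_t>& word) {
    return word.fetch_and(std::uint64_t{0});
}

// Hypothetical helper: atomically clears only the bits set in
// `bits_to_clear`, leaving the other bits unchanged, and returns the
// previous value.
std::uint64_t clear_bits(std::atomic<std::uint64_t>& word,
                         std::uint64_t bits_to_clear) {
    return word.fetch_and(~bits_to_clear);
}
```

Calling reset_word on each element of a std::array<std::atomic<std::uint64_t>, N> resets each 64-bit word atomically, but the array as a whole is not updated in one atomic step.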
We can't use VEX instructions with a lock prefix (from Intel's manual):
"Any VEX-encoded instruction with a LOCK prefix preceding VEX will #UD."
You need a VEX prefix to use AVX instructions, and #UD means "undefined instruction" - in other words, the code will cause a processor exception if we try to execute it.
So, it is 100% certain that the processor can not do an atomic read-modify-write operation on 256 bits at a time. This answer discusses SSE instruction atomicity: "SSE instructions: which CPUs can do atomic 16B memory operations?"
#3 is pretty meaningless if the instruction isn't valid.
#4 - well, the standard supports std::atomic<uintmax_t>, and if uintmax_t happens to be 128 or 256 bits, then you could certainly do that. I'm not aware of any processor supporting 128 or higher bits for uintmax_t, but the language doesn't prevent it.
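To see what uintmax_t gives you on a particular implementation, something like the following could be used. This is only a sketch: the function names uintmax_bits and uintmax_atomic_is_lock_free are invented for the example, and the results vary by platform and compiler.

```cpp
#include <atomic>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: width in bits of the widest standard unsigned
// integer type on this implementation.
constexpr std::size_t uintmax_bits() {
    return sizeof(std::uintmax_t) * CHAR_BIT;
}

// Hypothetical helper: reports whether operations on a
// std::atomic<std::uintmax_t> object are lock-free on this platform.
bool uintmax_atomic_is_lock_free() {
    std::atomic<std::uintmax_t> probe{0};
    return probe.is_lock_free();
}
```

On a typical x86-64 GCC setup I would expect uintmax_bits() to be 64, but that is an expectation rather than something the standard promises.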
If the requirement for "atomic" isn't as strong as "need to ensure with 100% certainty that no other processor updates this at the same time", then using regular SSE, AVX or AVX512 instructions would suffice - but there will be race conditions if you have two processors (cores) doing read/modify/write operations on the same bit of memory simultaneously.
The largest atomic read-modify-write operation on x86 is CMPXCHG16B, which will swap the content of two 64-bit integer registers into memory if the value in two other registers matches the value currently in memory. So you could come up with something that reads one 128-bit value, ANDs out some bits, and then stores the new value back atomically if nothing else got in there first - if that happened, you have to repeat the operation, and of course, it's not a single atomic AND operation either.
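The read, AND, store-if-unchanged, retry pattern described above can be sketched as a generic compare-and-swap loop. The name fetch_and_via_cas is made up for this example, and it is shown with a 64-bit type so that it runs anywhere. Using it with GCC's unsigned __int128 (probably built with -mcx16, and possibly linked against libatomic) is an untested assumption, and whether that ends up using CMPXCHG16B depends on the compiler version and flags.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: emulates fetch_and with a compare-and-swap retry
// loop. Returns the value observed immediately before the successful update.
template <typename T>
T fetch_and_via_cas(std::atomic<T>& target, T mask) {
    T expected = target.load();
    // On failure, compare_exchange_weak overwrites `expected` with the
    // value currently stored, so each retry recomputes the AND from
    // fresh data.
    while (!target.compare_exchange_weak(expected,
                                         static_cast<T>(expected & mask))) {
        // Another thread changed the value (or the weak CAS failed
        // spuriously); try again.
    }
    return expected;
}
```

For example, fetch_and_via_cas<std::uint64_t>(word, 0) resets a std::atomic<std::uint64_t> named word and returns its old contents. As noted above, this is a retry loop rather than a single atomic AND instruction.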
Of course, on platforms other than Intel and AMD, the behaviour may be different.