c++ - Largest data type which can be fetch-ANDed atomically?


I wanted to try to atomically reset 256 bits using something like this:

    #include <x86intrin.h>
    #include <iostream>
    #include <array>
    #include <atomic>

    int main(){
        std::array<std::atomic<__m256i>, 10> updatearray;
        __m256i allzeros = _mm256_setzero_si256();
        updatearray[0].fetch_and(allzeros);
    }

But the compiler complains that the element does not have fetch_and(). Is this not possible because a 256-bit type is too large to guarantee atomicity?

Is there any other way I can implement this? I am using GCC.

If not, what is the largest type I can reset atomically - 64 bits?

Edit: Could any AVX instructions perform the fetch-AND atomically?

So there are a few different things that need to be solved:

  1. What can the processor do?
  2. What do we mean by atomically?
  3. Can you make the compiler generate code for what the processor can do?
  4. Does the C++11/14 standard support that?

For #1 and #2:

In x86, there are instructions to do 8, 16, 32, 64, 128, 256 and 512 bit operations. One processor will typically [at least if the data is aligned to its own size] perform that operation atomically. However, for an operation to be "true atomic", it also needs to prevent race conditions within the update of that data [in other words, prevent some other processor from reading, modifying and writing that same location]. Aside from a small number of "implied lock" instructions, this is done by adding a "lock prefix" to a particular instruction - this will perform the right kind of cache-talk [technical term] with the other processors in the system to ensure that only this processor can update the data.

We can't use VEX instructions with a lock prefix (from Intel's manual):

Any VEX-encoded instruction with a LOCK prefix preceding VEX will #UD

You need a VEX prefix to use AVX instructions, and #UD means "undefined instruction" (the invalid-opcode exception) - in other words, the code will cause a processor exception if we try to execute it.

So, it is 100% certain that the processor can not do an atomic AND operation on 256 bits at a time. This answer discusses SSE instruction atomicity: "SSE instructions: which CPUs can do atomic 16B memory operations?"

#3 is pretty meaningless if the instruction isn't valid.

#4 - well, the standard supports std::atomic<uintmax_t>, and if uintmax_t happens to be 128 or 256 bits, then you could certainly do that. I'm not aware of any processor supporting 128 or more bits for uintmax_t, but the language doesn't prevent it.
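As a rough sketch of #4, assuming a typical GCC/x86-64 build where uintmax_t is 64 bits wide: fetch_and() is available on std::atomic of an integral type and returns the value held before the AND. The helper names clear_bits, uintmax_is_lock_free and example_uintmax are made up for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: atomically clear the bits selected by `mask`
// and return the value that was stored before the update.
std::uintmax_t clear_bits(std::atomic<std::uintmax_t>& target, std::uintmax_t mask) {
    return target.fetch_and(~mask);
}

// Hypothetical helper: reports whether atomics on the widest standard
// integer type are lock-free in this particular build.
bool uintmax_is_lock_free() {
    std::atomic<std::uintmax_t> probe{0};
    return probe.is_lock_free();
}

// Small usage example: start with the low 8 bits set, clear the low 4.
void example_uintmax() {
    std::atomic<std::uintmax_t> flags{0xFFu};
    std::uintmax_t before = clear_bits(flags, 0x0Fu);
    assert(before == 0xFFu);        // fetch_and returns the old value
    assert(flags.load() == 0xF0u);  // only the upper 4 of the 8 bits remain
}
```

Whether uintmax_is_lock_free() returns true depends on the target, so it is worth checking rather than assuming.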

If the requirement for "atomic" isn't as strong as "need to ensure that 100% certainly no other processor updates this at the same time", then using regular SSE, AVX or AVX512 instructions would suffice - but there will be race conditions if you have two processors (cores) doing read/modify/write operations on the same bit of memory simultaneously.
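If that weaker guarantee is acceptable, one portable variation (a different technique from the vector instructions mentioned above, shown only as a sketch) is to store the 256 bits as four 64-bit atomics. Each word is then updated atomically, but the 256-bit value as a whole is not: another thread can see some words already ANDed and others not yet. The names Bits256, and_with and example_chunks are invented for this example.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type: 256 bits kept as four independently atomic 64-bit words.
struct Bits256 {
    std::array<std::atomic<std::uint64_t>, 4> words;
};

// ANDs each 64-bit word with the matching word of `mask`.
// Every fetch_and is atomic on its own; the four together are NOT one
// atomic step, so readers may observe a partially updated value.
void and_with(Bits256& value, const std::array<std::uint64_t, 4>& mask) {
    for (std::size_t i = 0; i < value.words.size(); ++i) {
        value.words[i].fetch_and(mask[i]);
    }
}

// Small usage example: set every bit, then reset all 256 bits to zero.
void example_chunks() {
    Bits256 flags;
    for (auto& w : flags.words) w.store(~std::uint64_t{0});
    const std::array<std::uint64_t, 4> allzeros{0, 0, 0, 0};
    and_with(flags, allzeros);
    for (auto& w : flags.words) assert(w.load() == 0);
}
```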

The largest atomic operation on x86 is CMPXCHG16B, which will swap two 64-bit integer registers with the content in memory if the value in two other registers matches the value in memory. So you could come up with something that reads one 128-bit value, ANDs out some bits, and then stores the new value back atomically if nothing else got in there first - if that happened, you have to repeat the operation, and of course, it's not a single atomic AND operation either.
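The read / AND / store-if-unchanged / retry idea can be sketched with std::atomic's compare_exchange_weak. The demo below uses a 64-bit type so it builds anywhere; on GCC the same template could be instantiated with a 16-byte type such as unsigned __int128, but whether that ends up as an inline CMPXCHG16B depends on the compiler version and options (for example -mcx16) and may need libatomic to be linked, so treat that part as an assumption to verify. The names fetch_and_via_cas and example_cas are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: emulate fetch-AND with a compare-exchange retry loop.
// Returns the value that was in `target` just before the successful update.
template <typename T>
T fetch_and_via_cas(std::atomic<T>& target, T mask) {
    T expected = target.load();
    // On failure compare_exchange_weak writes the current value back into
    // `expected`, so every retry recomputes the AND from fresh data.
    while (!target.compare_exchange_weak(expected,
                                         static_cast<T>(expected & mask))) {
        // Someone else changed the value (or the weak CAS failed spuriously):
        // loop and try again.
    }
    return expected;
}

// Small usage example: keep only the low 4 bits of 0xFF.
void example_cas() {
    std::atomic<std::uint64_t> value{0xFFu};
    std::uint64_t before = fetch_and_via_cas(value, std::uint64_t{0x0Fu});
    assert(before == 0xFFu);
    assert(value.load() == 0x0Fu);
}
```

As the answer says, this is a loop around a compare-and-swap rather than a single atomic AND instruction, so under contention it may retry several times.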

Of course, on platforms other than Intel and AMD, the behaviour may be different.

