c - Multiply-subtract in SSE -


I am vectorizing a piece of code, and at some point I have the following setup:

register m128 a = { 99,99,99,99,99,99,99,99 }
register m128 b = { 100,50,119,30,99,40,50,20 }

I am packing shorts in these registers, which is why I have 8 values per register. What I would like to do is subtract from the i'th element in b the corresponding value in a if the i'th value of b is greater than or equal to the value in a (in this case, a is filled with the constant 99). To this end, I first use a greater-or-equal operation between b and a, which yields, for example:

register m128 c = { 1,0,1,0,1,0,0,0 } 

To complete the operation, I'd like to use a multiply-and-subtract, i.e. store in b the result of b -= a*c. The result would then be:

b = { 1,50,20,30,0,40,50,20 } 

Is there an operation that does such a thing? What I found were fused operations for Haswell, but I am currently working on Sandy Bridge. Also, if someone has a better idea of how to do this, please let me know (e.g. a logical subtract: if 1 in c then subtract, do nothing otherwise).

You basically want an SSE version of this code, right?

if (b >= a)
    t = b - a;
else
    t = b;
b = t;
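For checking a vectorized version later, the if/else above can be spelled out as a plain scalar loop over the packed shorts. This is only a reference sketch; the function name cond_sub_ref and the length parameter n are made up for illustration and are not part of the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference (hypothetical helper): subtract a[i] from b[i]
 * only where b[i] >= a[i]; all other elements are left untouched. */
void cond_sub_ref(int16_t *b, const int16_t *a, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (b[i] >= a[i])
            b[i] = (int16_t)(b[i] - a[i]);
    }
}
```

Run on the registers from the question, this should give the b shown there, including the 0 in the lane where b equals a.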

Since we want to avoid conditionals in the SSE version, we can get rid of the control flow like this (note that the mask is inverted):

uint16_t mask = (b >= a) - 1;
uint16_t tmp = b - a;
uint16_t d = (b & mask) | (tmp & ~mask);
b = d;
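The same branchless select can be wrapped in a small function to try it on single values. This is a minimal sketch; the name cond_sub_branchless is made up, and the explicit casts are only there because C promotes uint16_t operands to int before the arithmetic.

```c
#include <stdint.h>

/* Branchless select (hypothetical helper): mask is 0x0000 where b >= a
 * and 0xffff otherwise, so it picks b - a in the first case and b in
 * the second. */
uint16_t cond_sub_branchless(uint16_t b, uint16_t a)
{
    uint16_t mask = (uint16_t)((b >= a) - 1);
    uint16_t tmp  = (uint16_t)(b - a);
    return (uint16_t)((b & mask) | (tmp & (uint16_t)~mask));
}
```

When b < a the value of tmp wraps around, but it is masked out and never reaches the result.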

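I've checked the _mm_cmpgt_epi16 intrinsic, and it has the nice property that it returns either 0x0000 for false or 0xffff for true, instead of a single bit 0 or 1 (thereby eliminating the need for the first subtraction). Note that it is a strict, signed greater-than, so b >= a has to be written as not (a > b); the mask below is set where a > b, which matches the inverted mask above. Therefore our SSE version might look like this:

__m128i mask = _mm_cmpgt_epi16(a, b);   /* 0xffff where a > b, i.e. b < a */
__m128i tmp  = _mm_sub_epi16(b, a);
__m128i d    = _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, tmp));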
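To try the mask-and-blend idea on real memory, here is a rough SSE2 sketch that loads 8 shorts, blends, and stores back. The function name cond_sub_blend_sse2 and the use of unaligned loads are assumptions for illustration. Because _mm_cmpgt_epi16 is a strict signed greater-than, the sketch builds the mask for a > b and selects b - a where that mask is clear, so that lanes with b equal to a are also subtracted.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: processes exactly 8 int16_t lanes in place.
 * The pointers do not need to be 16-byte aligned. */
void cond_sub_blend_sse2(int16_t *b, const int16_t *a)
{
    __m128i va  = _mm_loadu_si128((const __m128i *)a);
    __m128i vb  = _mm_loadu_si128((const __m128i *)b);
    __m128i lt  = _mm_cmpgt_epi16(va, vb);   /* 0xffff where a > b */
    __m128i tmp = _mm_sub_epi16(vb, va);     /* b - a in every lane */
    /* keep b where a > b, take b - a everywhere else */
    __m128i d   = _mm_or_si128(_mm_and_si128(lt, vb),
                               _mm_andnot_si128(lt, tmp));
    _mm_storeu_si128((__m128i *)b, d);
}
```

Only SSE2 is needed here, so this should be fine on Sandy Bridge; no fused multiply-add is involved.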

Edit: harold has mentioned a far less complicated answer. The above solution might still be helpful if you need to modify the else part of the if/else.

uint16_t mask = ~((b >= a) - 1);
uint16_t tmp = a & mask;
b = b - tmp;

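The SSE code would be (the mask is set where a > b, because _mm_cmpgt_epi16 is a strict comparison and the condition is b >= a):

__m128i mask = _mm_cmpgt_epi16(a, b);   /* 0xffff where b < a */
__m128i t    = _mm_sub_epi16(b, _mm_andnot_si128(mask, a));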
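For arrays longer than one register, the and-then-subtract variant could be put in a loop along these lines. This is a sketch under assumptions: the name cond_sub_array, the unaligned loads, and the scalar tail for leftover elements are my own choices, not something from the question. The mask is built for a > b so that the subtraction also happens when b equals a.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: b[i] -= a[i] wherever b[i] >= a[i], for n shorts. */
void cond_sub_array(int16_t *b, const int16_t *a, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i va  = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb  = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i lt  = _mm_cmpgt_epi16(va, vb);    /* 0xffff where a > b */
        __m128i sub = _mm_andnot_si128(lt, va);   /* a where b >= a, else 0 */
        _mm_storeu_si128((__m128i *)(b + i), _mm_sub_epi16(vb, sub));
    }
    /* scalar tail for the remaining n % 8 elements */
    for (; i < n; ++i) {
        if (b[i] >= a[i])
            b[i] = (int16_t)(b[i] - a[i]);
    }
}
```

Compared with the blend version this saves one logical operation per vector, since subtracting a masked-out 0 leaves b unchanged.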
