c - Multiply-subtract in SSE
I am vectorizing a piece of code, and at some point I have the following setup:

    register __m128i a = { 99,99,99,99,99,99,99,99 };
    register __m128i b = { 100,50,119,30,99,40,50,20 };

I am packing shorts in these registers, which is why I have 8 values per register. I would like to subtract from the i'th element of b the corresponding value in a, but only if the i'th value of b is greater than or equal to the value in a (in this case, a is filled with the constant 99). To this end, I first use a greater-or-equal operation between b and a, which yields, for this example:

    register __m128i c = { 1,0,1,0,1,0,0,0 };

To complete the operation, I'd like to use a multiply-and-subtract, i.e. store in b the result of the operation b -= a*c. The result would then be:

    b = { 1,50,20,30,0,40,50,20 }

Is there any operation that does such a thing? What I found were fused operations for Haswell, but I am currently working on Sandy Bridge. Also, if someone has a better idea of how to do this, please let me know (e.g. a logical subtract: if there is a 1 in c then subtract, do nothing otherwise).
You want an SSE version of this code, right?

    if (b >= a) t = b - a;
    else t = b;
    b = t;

Since we want to avoid conditionals in the SSE version, we can get rid of the control flow like this (note that the mask is inverted: it is zero where the condition holds):

    uint16_t mask = (b >= a) - 1;
    uint16_t tmp = b - a;
    uint16_t d = (b & mask) | (tmp & ~mask);
    b = d;
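As a quick sanity check, the branchy and branch-free scalar forms can be compared on the values from the question. This is only a minimal sketch: the function names cond_sub_branch, cond_sub_mask and check_scalar_example are made up for illustration, and the explicit casts are there because C promotes uint16_t operands to int.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version with a branch: subtract a from b only when b >= a. */
static uint16_t cond_sub_branch(uint16_t b, uint16_t a) {
    return (b >= a) ? (uint16_t)(b - a) : b;
}

/* Branch-free version: build an all-zeros or all-ones mask, then select. */
static uint16_t cond_sub_mask(uint16_t b, uint16_t a) {
    uint16_t mask = (uint16_t)((b >= a) - 1);  /* 0x0000 if b >= a, else 0xFFFF */
    uint16_t tmp  = (uint16_t)(b - a);
    return (uint16_t)((b & mask) | (tmp & (uint16_t)~mask));
}

/* Checks both versions against the example from the question (a = 99). */
static void check_scalar_example(void) {
    const uint16_t b[8]      = {100, 50, 119, 30, 99, 40, 50, 20};
    const uint16_t expect[8] = {  1, 50,  20, 30,  0, 40, 50, 20};
    for (int i = 0; i < 8; ++i) {
        assert(cond_sub_branch(b[i], 99) == expect[i]);
        assert(cond_sub_mask(b[i], 99) == expect[i]);
    }
}
```

Calling check_scalar_example() from a main function exercises all eight lanes, including the b == a case (99), which must come out as 0.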
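I've checked the _mm_cmpgt_epi16 intrinsic, and it has the nice property that it returns either 0x0000 for false or 0xffff for true in each lane, instead of a single bit 0 or 1 (thereby eliminating the need for the first subtraction). It is a strict, signed greater-than, though, and SSE2 has no greater-or-equal compare for 16-bit integers, so the b >= a test is expressed as "not a > b". Therefore our SSE version might look like this:

    __m128i lt = _mm_cmpgt_epi16(a, b);   /* 0xffff where b < a */
    __m128i tmp = _mm_sub_epi16(b, a);
    __m128i d = _mm_or_si128(_mm_and_si128(lt, b), _mm_andnot_si128(lt, tmp));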
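The blend can be written out as a compilable function along these lines. It is a sketch under a few assumptions: the lanes are signed 16-bit values, only SSE2 is available, and the greater-or-equal test is obtained by computing a > b with _mm_cmpgt_epi16 and swapping the two blend inputs, because SSE2 offers only strict 16-bit compares. The names cond_sub_blend_epi16 and check_blend_example are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Per lane: result = (b >= a) ? b - a : b, for eight signed 16-bit values. */
static __m128i cond_sub_blend_epi16(__m128i b, __m128i a) {
    __m128i lt  = _mm_cmpgt_epi16(a, b);   /* 0xFFFF where b < a, else 0x0000 */
    __m128i tmp = _mm_sub_epi16(b, a);     /* b - a in every lane */
    /* Keep b where b < a, take b - a everywhere else. */
    return _mm_or_si128(_mm_and_si128(lt, b), _mm_andnot_si128(lt, tmp));
}

/* Runs the example from the question (a filled with 99). */
static void check_blend_example(void) {
    const int16_t bv[8]     = {100, 50, 119, 30, 99, 40, 50, 20};
    const int16_t expect[8] = {  1, 50,  20, 30,  0, 40, 50, 20};
    int16_t out[8];
    __m128i a = _mm_set1_epi16(99);
    __m128i b = _mm_loadu_si128((const __m128i *)bv);
    _mm_storeu_si128((__m128i *)out, cond_sub_blend_epi16(b, a));
    for (int i = 0; i < 8; ++i) {
        assert(out[i] == expect[i]);
    }
}
```

The blend form is the one to keep if the else branch ever needs to produce something other than the unchanged b.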
Edit: harold has mentioned a far less complicated answer. The above solution might still be helpful if you need to modify the else part of the if/else.

    uint16_t mask = ~((b >= a) - 1);
    uint16_t tmp = a & mask;
    b = b - tmp;

The SSE code would be:

    __m128i lt = _mm_cmpgt_epi16(a, b);   /* 0xffff where b < a; SSE2 only has a strict compare */
    __m128i t = _mm_sub_epi16(b, _mm_andnot_si128(lt, a));   /* subtract a only where b >= a */
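Applied to whole arrays, the mask-and-subtract idea might look like the following sketch. The function name cond_sub_array is hypothetical, the element count is assumed to be a multiple of 8, the values are assumed to be signed 16-bit, and unaligned loads and stores are used so the buffers need no special alignment. The b >= a condition is formed by clearing a in the lanes where a > b.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* In place: b[i] -= a[i] wherever b[i] >= a[i]. n must be a multiple of 8. */
static void cond_sub_array(int16_t *b, const int16_t *a, size_t n) {
    assert(n % 8 == 0);  /* this sketch has no scalar tail loop */
    for (size_t i = 0; i < n; i += 8) {
        __m128i vb  = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i va  = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i lt  = _mm_cmpgt_epi16(va, vb);   /* 0xFFFF where b < a */
        __m128i sub = _mm_andnot_si128(lt, va);  /* a where b >= a, else 0 */
        _mm_storeu_si128((__m128i *)(b + i), _mm_sub_epi16(vb, sub));
    }
}
```

With a filled with 99 and b = { 100,50,119,30,99,40,50,20 }, this leaves b = { 1,50,20,30,0,40,50,20 }, matching the result asked for in the question.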