According to the Bit Twiddling Hacks website, the operation

    unsigned int a;    // value to merge in non-masked bits
    unsigned int b;    // value to merge in masked bits
    unsigned int mask; // 1 where bits from b should be selected; 0 where from a.
    unsigned int r;    // result of (a & ~mask) | (b & mask) goes here
    r = a ^ ((a ^ b) & mask);

allows merging two bit sequences, a and b, according to a mask. I was wondering:
- Does this operation have a specific or usual name?
- Does any instruction set have a dedicated assembly instruction for this operation?
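As a quick sanity check of the identity quoted from Bit Twiddling Hacks, here is a minimal C sketch that writes the merge both ways. The function names `merge_and_or` and `merge_xor` are made up for illustration and do not come from the original page.

```c
#include <stdint.h>

/* Straightforward form: bits of b where mask is 1, bits of a elsewhere. */
uint32_t merge_and_or(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* XOR form from the question: (a ^ b) marks the differing bits,
   and only those selected by mask get flipped in a. */
uint32_t merge_xor(uint32_t a, uint32_t b, uint32_t mask)
{
    return a ^ ((a ^ b) & mask);
}
```

For example, with a = 0xAAAAAAAA, b = 0x55555555 and mask = 0x0000FFFF, both functions return 0xAAAA5555: the low half comes from b and the high half from a.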
In SSE/AVX programming, selectively copying from one vector to another based on a mask is called a blend. SSE4.1 added instructions like pblendvb xmm1, xmm2/m128, <xmm0>, where the implicit operand xmm0 controls which bytes of the src overwrite the corresponding bytes in the dst. (Without SSE4.1, you'd normally AND and ANDNOT the mask onto the two vectors, then OR them together; the XOR trick has less instruction-level parallelism, and probably requires at least as many mov instructions to copy registers.)
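To make the per-byte behaviour concrete, here is a scalar C model of a 16-byte variable blend, alongside the AND/ANDNOT/OR fallback. This is only a sketch of the semantics, not SIMD code: the function names are hypothetical, and it assumes the top bit of each mask byte is the selector in the pblendvb-style version.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of a pblendvb-style blend: for each byte, the top bit
   of the mask byte decides whether src overwrites dst. */
void blend_bytes(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    for (size_t i = 0; i < 16; i++) {
        if (mask[i] & 0x80)
            dst[i] = src[i];
    }
}

/* Fallback in the AND / ANDNOT / OR style: a purely bitwise select.
   It matches blend_bytes only when every mask byte is 0x00 or 0xFF. */
void blend_bytes_and_or(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    for (size_t i = 0; i < 16; i++)
        dst[i] = (uint8_t)((dst[i] & ~mask[i]) | (src[i] & mask[i]));
}
```

With masks made only of 0x00 and 0xFF bytes (the usual output of a SIMD compare), the two functions give the same result.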
There's also an immediate blend instruction, pblendw, where the mask is an 8-bit immediate instead of a register. There are also 32-bit and 64-bit immediate blends (blendps, blendpd, vpblendd) and variable blends (blendvps, blendvpd).
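For contrast with the variable blends, here is a scalar C model of an immediate word blend in the style of the 128-bit pblendw. The function name `blend_words_imm` is hypothetical, and this is a sketch of the semantics only: bit i of the immediate selects 16-bit element i.

```c
#include <stdint.h>

/* Scalar model of a pblendw-style blend on eight 16-bit elements:
   bit i of imm8 selects element i of src over dst. */
void blend_words_imm(uint16_t dst[8], const uint16_t src[8], uint8_t imm8)
{
    for (int i = 0; i < 8; i++) {
        if ((imm8 >> i) & 1)
            dst[i] = src[i];
    }
}
```

Because the selector is a compile-time constant here, this form only works when the blend pattern is known in advance; the variable blends take the selector from a register instead.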
IDK if other SIMD instruction sets (NEON, AltiVec, whatever MIPS calls theirs, etc.) call them "blends" or not.
SSE/AVX (or x86 integer instructions) don't provide anything better than the usual bitwise XOR/AND for doing bitwise (instead of element-wise) blends until AVX512F.
AVX512F can do the bitwise version of this (or any other bitwise ternary function) with a single vpternlogd or vpternlogq instruction. (The only difference between the d and q element sizes is if you use a mask register for merge-masking or zero-masking the destination, but that didn't stop Intel from making separate intrinsics even for the no-mask case: __m512i _mm512_ternarylogic_epi32 (__m512i a, __m512i b, __m512i c, int imm8), and the equivalent ..._epi64 version.)
The imm8 immediate byte is a truth table. Every bit of the destination is determined independently, from the corresponding bits of a, b, and c, by using them as a 3-bit index into the truth table, i.e. imm8[a:b:c].
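The truth-table lookup can be modelled in scalar C as below. This is a sketch of one 32-bit lane, not the instruction itself; `ternlog32` and `TERNLOG_MERGE` are hypothetical names, and the operand mapping for the merge (a, b, then the mask as c) is an assumption chosen for illustration. A handy way to derive the immediate is to evaluate the desired expression on the constants 0xF0, 0xCC and 0xAA standing in for a, b and c.

```c
#include <stdint.h>

/* Scalar model of a vpternlogd-style lookup on one 32-bit lane:
   at each bit position, the bits of a, b and c form a 3-bit index
   (a is the most significant) that selects one bit of imm8. */
uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;
    for (int bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << bit;
    }
    return r;
}

/* Truth table for (a & ~c) | (b & c), i.e. "c ? b : a", obtained by
   evaluating the expression on a = 0xF0, b = 0xCC, c = 0xAA. */
enum { TERNLOG_MERGE = ((0xF0 & ~0xAA) | (0xCC & 0xAA)) & 0xFF };
```

Under those assumptions `TERNLOG_MERGE` works out to 0xD8, and `ternlog32(a, b, mask, TERNLOG_MERGE)` gives the same result as `a ^ ((a ^ b) & mask)`.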
AVX512 will be fun to play with when it eventually appears in mainstream desktop/laptop CPUs, but that's probably a couple of years away still.