Consider the following code:
    #include <limits>
    #include <cstdint>

    using t = uint32_t; // or uint64_t

    t shift(t x, t y, t n)
    {
        return (x >> n) | (y << (std::numeric_limits<t>::digits - n));
    }
According to Godbolt, clang 3.8.1 generates the following assembly code at -O1, -O2 and -O3:
    shift(unsigned int, unsigned int, unsigned int):
            movb    %dl, %cl
            shrdl   %cl, %esi, %edi
            movl    %edi, %eax
            retq
while gcc 6.2 (even with -mtune=haswell) generates:
    shift(unsigned int, unsigned int, unsigned int):
            movl    $32, %ecx
            subl    %edx, %ecx
            sall    %cl, %esi
            movl    %edx, %ecx
            shrl    %cl, %edi
            movl    %esi, %eax
            orl     %edi, %eax
            ret
This seems far less optimized, since SHRD is fast on Intel Sandybridge and later. Is there any way to rewrite the function so that compilers (gcc in particular) optimize it better and favor the SHLD/SHRD assembly instructions?
Or are there any gcc -mtune or other options that would encourage gcc to tune better for modern Intel CPUs?
With -march=haswell, it emits BMI2 shlx / shrx, but still not shrd.
No, I can see no way to get gcc to use the SHRD instruction.
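If the instruction itself is needed regardless of what gcc chooses, one option is to emit it by hand. The following is only a minimal sketch, assuming GCC or Clang with GNU extended inline asm on x86. The name `shrd32` is hypothetical and only the 32-bit case is covered. On other targets it falls back to plain shifts so that the code still compiles.

```cpp
#include <cstdint>

// Hypothetical helper: returns the low 32 bits of ((y:x) >> n), i.e. x
// shifted right by n with the vacated high bits filled from the low bits of y.
// Assumes n is in 0..31.
inline std::uint32_t shrd32(std::uint32_t x, std::uint32_t y, std::uint32_t n)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    // AT&T operand order: count (%cl), source register, destination register.
    // The "c" constraint places n in ecx, so %cl holds the shift count.
    asm("shrdl %%cl, %1, %0"
        : "+r"(x)           // destination, read and written
        : "r"(y), "c"(n)    // source bits and shift count
        : "cc");            // SHRD modifies the flags
    return x;
#else
    // Portable fallback; n == 0 is handled separately because shifting a
    // 32-bit value by 32 is undefined behaviour in C++.
    return n ? (x >> n) | (y << (32 - n)) : x;
#endif
}
```

Inline asm is opaque to the optimizer, so the compiler can no longer constant-fold or reschedule around it. Whether this is a net win should be measured on the target CPU.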
You can manipulate the output gcc generates by changing the -mtune and -march options.
"Or are there any gcc -mtune or other options that would encourage gcc to tune better for modern Intel CPUs?"
Yes, you can get gcc to generate BMI2 code.
For example, x86-64 gcc 6.2 with -O3 -march=znver1 (AMD Zen)
generates the following (Haswell timings):
    code                  critical path latency   reciprocal throughput
    -------------------------------------------------------------------
    mov  eax, 32                    *                     0.25
    sub  eax, edx                   1                     0.25
    shlx eax, esi, eax              1                     0.5
    shrx esi, edi, edx              *                     0.5
    or   eax, esi                   1                     0.25
    ret
    total:                          3                     1.75
compared to clang 3.8.1:
    mov  cl, dl                     1                     0.25
    shrd edi, esi, cl               4                     2
    mov  eax, edi                   *                     0.25
    ret
    total                           5                     2.25
Given the dependency chain here, SHRD is slower on Haswell, tied on Sandybridge, and slower on Skylake.
The reciprocal throughput is also better for the shrx sequence.
So it depends: on post-BMI processors gcc produces better code, on pre-BMI processors clang wins.
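The distinction between critical-path latency and reciprocal throughput can be checked on a given machine with two loops: one where each call feeds the next, and one where the calls are independent. This is a rough sketch only; `chained`, `independent` and `ns_per_iter` are hypothetical helper names, and `shift` is the function from the question, which requires n in 1..31. Timings are noisy, and the optimizer is free to restructure either loop, so the generated assembly should be inspected before trusting the numbers.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>

using t = std::uint32_t;

// The function from the question, unchanged.
t shift(t x, t y, t n)
{
    return (x >> n) | (y << (std::numeric_limits<t>::digits - n));
}

// Latency-bound: every call consumes the result of the previous one.
t chained(t x, t y, t n, std::uint64_t iters)
{
    for (std::uint64_t i = 0; i < iters; ++i)
        x = shift(x, y, n);
    return x;
}

// Closer to throughput-bound: the shift inputs do not depend on earlier
// shift results; only the cheap xor accumulator is carried between iterations.
t independent(t x, t y, t n, std::uint64_t iters)
{
    t acc = 0;
    for (std::uint64_t i = 0; i < iters; ++i)
        acc ^= shift(x + static_cast<t>(i), y, n);
    return acc;
}

// Times f(iters) once and returns the average in nanoseconds per iteration.
template <class F>
double ns_per_iter(F f, std::uint64_t iters)
{
    auto start = std::chrono::steady_clock::now();
    volatile t sink = f(iters); // keep the result observable
    (void)sink;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count()
           / static_cast<double>(iters);
}
```

A call such as `ns_per_iter([](std::uint64_t k) { return chained(1u, 2u, 3u, k); }, 100000000)` gives a per-iteration figure for the dependent chain. Building the same file with gcc and with clang, and with and without -march=haswell, lets the two code sequences be compared directly.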
SHRD has wildly varying timings on different processors, so I can see why gcc is not overly fond of it.
Even with -Os (optimize for size), gcc still does not select SHRD.
*) Not part of the timing, because the instruction is either not on the critical path or turns into a zero-latency register rename.