linux - Big difference in overhead caused by instructions in straight-line code


I am trying to understand the overhead in blk_account_io_completion in the Linux block layer. Using perf annotate, I get the following snippet (abridged). Can anyone shed light on why the add and test instructions have such high overheads compared to the neighboring instructions executed along with them?

             :                      part_stat_add(cpu, part, sectors[rw], bytes >> 9);
        0.13 :        ffffffff813336eb:       movsxd r8,r8d
        0.00 :        ffffffff813336ee:       lea    rdx,[rax*8+0x0]
        0.00 :        ffffffff813336f6:       mov    rcx,qword ptr [rdi+0x210]
       72.04 :        ffffffff813336fd:       add    rcx,qword ptr [r8*8-0x7e2df6a0]
        0.22 :        ffffffff81333705:       add    qword ptr [rcx+rdx*1],rsi
        0.61 :        ffffffff81333709:       mov    eax,dword ptr [rdi+0x1f4]
       26.52 :        ffffffff8133370f:       test   eax,eax
        0.00 :        ffffffff81333711:       je     ffffffff81333733 <blk_account_io_completion+0x83>

One possible reason is that these instructions simply happen to be the ones the instruction pointer points to when a sample is taken. A typical x86 CPU can retire up to 4 instructions per cycle, but when it does and a sample is taken at that moment, the program counter points to only one instruction, not all four.
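The retirement explanation can be made concrete with a toy model. The sketch below is a simplification, not a description of any particular microarchitecture: it assumes a hypothetical retirement width of 4, a loop of independent single-cycle instructions (like the nops in the example below), exactly one group of 4 retiring every cycle, and a sample that records the next instruction to retire. Under those assumptions every sample is charged to the first instruction of a retirement group, so only every fourth instruction ever shows up in the profile.

```python
from collections import Counter

RETIRE_WIDTH = 4  # assumed retirement width (instructions per cycle)


def sample_histogram(num_instructions: int, num_cycles: int,
                     width: int = RETIRE_WIDTH) -> Counter:
    """Toy model: count which instruction index each per-cycle sample is charged to.

    The loop body is `num_instructions` independent, single-cycle instructions.
    Every cycle, `width` of them retire in program order, and a sample taken at
    the end of that cycle records the index of the next instruction to retire.
    """
    hits = Counter()
    next_to_retire = 0
    for _ in range(num_cycles):
        # Retire one group of `width` instructions, wrapping around the loop.
        next_to_retire = (next_to_retire + width) % num_instructions
        # The sample is charged to the instruction right after that group.
        hits[next_to_retire] += 1
    return hits


if __name__ == "__main__":
    # A 20-instruction loop body, roughly the size of the nop loop below.
    hist = sample_histogram(num_instructions=20, num_cycles=1000)
    for idx in range(20):
        print(f"{idx:2d}: {hist[idx]}")
```

With a 20-instruction body and a width of 4, the model charges all 1000 samples to indices 0, 4, 8, 12 and 16 (200 each) and none to the instructions in between, the same shape as the gaps in the nop profile.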

Here is an example (see below): a simple loop with a bunch of nop instructions. Note how the clock ticks are distributed in this profile, with gaps of exactly 3 instructions between the hot lines. You may be seeing a similar effect.

Alternatively, mov rcx,qword ptr [rdi+0x210] and mov eax,dword ptr [rdi+0x1f4] may be missing the cache, and the cycles spent on them are being attributed to the next instruction, which here is the add and the test that consume the loaded values.

           │    Disassembly of section .text:
           │
           │    00000000004004ed :
           │      push   %rbp
           │      mov    %rsp,%rbp
           │      movl   $0x0,-0x4(%rbp)
           │    ↓ jmp    25
     14.59 │ d:   nop
           │      nop
           │      nop
      0.03 │      nop
     14.58 │      nop
           │      nop
           │      nop
      0.08 │      nop
     13.89 │      nop
           │      nop
      0.01 │      nop
      0.08 │      nop
     13.99 │      nop
           │      nop
      0.01 │      nop
      0.05 │      nop
     13.92 │      nop
           │      nop
      0.01 │      nop
      0.07 │      nop
     14.44 │      addl   $0x1,-0x4(%rbp)
      0.33 │25:   cmpl   $0x3fffffff,-0x4(%rbp)
     13.90 │    ↑ jbe    d
           │      pop    %rbp
           │    ← retq
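The cache-miss explanation can be sketched the same way. This toy model assumes in-order retirement, one sample per cycle, and that a sample can only be recorded when something retires, in which case it is charged to the instruction right after the last one retired. The snippet from the question is treated as if it ran repeatedly in a loop, and all stall numbers are hypothetical: MISS stands in for a cache miss on the two loads from [rdi+...], and the add and test that consume the loaded values wait one extra cycle.

```python
from collections import Counter

# Hypothetical stall cycles (time spent at the head of the retirement queue).
# MISS is a made-up cache-miss latency, not a measured value.
MISS = 20

# The annotated snippet as (instruction, assumed stall cycles).
SNIPPET = [
    ("movsxd r8,r8d", 0),
    ("lea rdx,[rax*8+0x0]", 0),
    ("mov rcx,qword ptr [rdi+0x210]", MISS),
    ("add rcx,qword ptr [r8*8-0x7e2df6a0]", 1),  # waits for rcx
    ("add qword ptr [rcx+rdx*1],rsi", 0),
    ("mov eax,dword ptr [rdi+0x1f4]", MISS),
    ("test eax,eax", 1),  # waits for eax
    ("je ffffffff81333733", 0),
]


def skid_histogram(program, width=4, iterations=1000):
    """Toy model of sample skid with in-order retirement.

    Each cycle produces one sample, but a sample is only recorded when some
    instruction retires, and it is charged to the instruction right after
    the last one retired (the next one to retire).
    """
    stalls = [s for _, s in program]
    n = len(stalls)
    hits = Counter()
    for _ in range(iterations):
        i = 0
        while i < n:
            # The head instruction retires after its stall, together with up
            # to width-1 following instructions that have no stall of their own.
            j = i + 1
            while j < n and j - i < width and stalls[j] == 0:
                j += 1
            # stalls[i] waiting cycles plus 1 retirement cycle, all charged to
            # the next instruction to retire (wrapping to the loop start).
            hits[j % n] += stalls[i] + 1
            i = j
    return hits


if __name__ == "__main__":
    hist = skid_histogram(SNIPPET)
    total = sum(hist.values())
    for idx, (name, _) in enumerate(SNIPPET):
        print(f"{100 * hist[idx] / total:6.2f} : {name}")
```

With these made-up numbers the two loads get only a few percent of the samples, while the add and the test that follow them each end up with about 45%. The exact split depends entirely on the assumed latencies; the point is only that stall cycles are charged to the instruction after the one that stalled.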
