Lab 06: Vectorization Lab

Source Code:

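#include <stdio.h>
#include <stdlib.h>

int main(void){
    int a1[1000];
    int a2[1000];
    int a3[1000];
    int a, b, c, d, e;

    long n = 1000000;

    /* Fill both input arrays with random values in 0..99. */
    for(a = 0; a < 1000; a++){
        a1[a] = rand()%100;
        a2[a] = rand()%100;
    }
    /* Element-wise sum of the two arrays. */
    for(b = 0; b < 1000; b++){
        a3[b] = (a1[b] + a2[b]);
    }
    /* Add the constant n to every element. */
    for(c = 0; c < 1000; c++){
        a3[c] += n;
    }
    /* Print the results. */
    for(d = 0; d < 1000; d++){
        printf("a3[%d] = %d\n", d, a3[d]);
    }
    return 0;
}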

Compiler command:
gcc -ftree-vectorizer-verbose=6 -O3 lab6.c

Disassembly listing:
<main>

4004c0:       d1400bff        sub     sp, sp, #0x2, lsl #12
4004c4:       d13b83ff        sub     sp, sp, #0xee0
4004c8:       a9bd7bfd        stp     x29, x30, [sp,#-48]!
4004cc:       910003fd        mov     x29, sp
4004d0:       d281fa01        mov     x1, #0xfd0                      // #4048
4004d4:       a90153f3        stp     x19, x20, [sp,#16]
4004d8:       a9025bf5        stp     x21, x22, [sp,#32]
4004dc:       d2800013        mov     x19, #0x0                       // #0
4004e0:       9100c3b6        add     x22, x29, #0x30
4004e4:       52800c94        mov     w20, #0x64                      // #100
4004e8:       8b1d0035        add     x21, x1, x29
4004ec:       97ffffe5        bl      400480 <rand@plt>
4004f0:       1ad40c01        sdiv    w1, w0, w20
4004f4:       1b148020        msub    w0, w1, w20, w0
4004f8:       b8366a60        str     w0, [x19,x22]
4004fc:       97ffffe1        bl      400480 <rand@plt>
400500:       1ad40c01        sdiv    w1, w0, w20
400504:       1b148020        msub    w0, w1, w20, w0
400508:       b8356a60        str     w0, [x19,x21]
40050c:       91001273        add     x19, x19, #0x4
400510:       f13e827f        cmp     x19, #0xfa0
400514:       54fffec1        b.ne    4004ec <main+0x2c>
400518:       d283ee03        mov     x3, #0x1f70                     // #8048
40051c:       d2800000        mov     x0, #0x0                        // #0
400520:       8b0303b4        add     x20, x29, x3
400524:       9100c3a1        add     x1, x29, #0x30
400528:       913f43a3        add     x3, x29, #0xfd0
40052c:       8b000022        add     x2, x1, x0
400530:       8b000061        add     x1, x3, x0
400534:       4c407820        ld1     {v0.4s}, [x1]
400538:       4c407841        ld1     {v1.4s}, [x2]
40053c:       8b000281        add     x1, x20, x0
400540:       4ea08420        add     v0.4s, v1.4s, v0.4s
400544:       91004000        add     x0, x0, #0x10
400548:       4c007820        st1     {v0.4s}, [x1]
40054c:       f13e801f        cmp     x0, #0xfa0
400550:       54fffea1        b.ne    400524 <main+0x64>
400554:       d285e201        mov     x1, #0x2f10                     // #12048
400558:       10000342        adr     x2, 4005c0 <main+0x100>
40055c:       aa1403e0        mov     x0, x20
400560:       8b1d0021        add     x1, x1, x29
400564:       4c407841        ld1     {v1.4s}, [x2]
400568:       4c407800        ld1     {v0.4s}, [x0]
40056c:       4ea18400        add     v0.4s, v0.4s, v1.4s
400570:       4c9f7800        st1     {v0.4s}, [x0], #16
400574:       eb01001f        cmp     x0, x1
400578:       54ffff81        b.ne    400568 <main+0xa8>
40057c:       90000015        adrp    x21, 400000
400580:       d2800013        mov     x19, #0x0                       // #0
400584:       911f42b5        add     x21, x21, #0x7d0
400588:       b8737a82        ldr     w2, [x20,x19,lsl #2]
40058c:       2a1303e1        mov     w1, w19
400590:       aa1503e0        mov     x0, x21
400594:       97ffffc7        bl      4004b0 <printf@plt>
400598:       91000673        add     x19, x19, #0x1
40059c:       f10fa27f        cmp     x19, #0x3e8
4005a0:       54ffff41        b.ne    400588 <main+0xc8>
4005a4:       a94153f3        ldp     x19, x20, [sp,#16]
4005a8:       a9425bf5        ldp     x21, x22, [sp,#32]
4005ac:       a8c37bfd        ldp     x29, x30, [sp],#48
4005b0:       52800000        mov     w0, #0x0                        // #0
4005b4:       913b83ff        add     sp, sp, #0xee0
4005b8:       91400bff        add     sp, sp, #0x2, lsl #12
4005bc:       d65f03c0        ret
4005c0:       000f4240        .word   0x000f4240
4005c4:       000f4240        .word   0x000f4240
4005c8:       000f4240        .word   0x000f4240
4005cc:       000f4240        .word   0x000f4240

Reflection:

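It took me a while to understand how to write the code so that the compiler would vectorize it. After a couple of tries, I found that it only vectorized once I split the work into separate loops. My understanding is that the compiler looks at each loop on its own: the loops that call rand() or printf() stay scalar, while the loops that only do element-wise arithmetic on the arrays (the a3[b] = a1[b] + a2[b] loop and the a3[c] += n loop) are turned into vector code. In the listing this shows up as ld1, add and st1 instructions working on v0.4s and v1.4s.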

Volume-sampling-via-SIMD solution:
I have no solution yet.

One thought on “Lab 06: Vectorization Lab”

  1. Hi Lawrence — this doesn’t look vectorized at all to me! Use the -ftree-vectorizer-verbose=6 argument to GCC (along with -O3) to see the vectorization decisions, and adjust the code to get vectorization to take place. When vectorized, you’ll see reference to the vector registers (e.g., v0, v1, …) in the object code.
