interesting optimization

Seung Jae Lee
Hello, LLVMers.

I compiled a simple function twice, with one very slight modification between the two runs, and found something interesting.

I ran this first:
///////////////////////HL source code////////////////////////
unsigned foo() {

  unsigned i,j;
  unsigned sum = 0;
  for (i=0; i<10; i++)
  {
    sum +=  i;
    for (j=0; j<3; j++)
      sum += 2;
  }

  return sum;
}
////////////////////////////////////////////////////////////
(It returns 105 when executed.)
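As a quick check of that value: each of the 10 outer iterations adds i once, and the inner loop adds 2 three times, so the total can be worked out by hand.

```latex
\[
\text{sum} = \sum_{i=0}^{9} \bigl(i + 3 \cdot 2\bigr)
           = \sum_{i=0}^{9} i + 10 \cdot 6
           = 45 + 60
           = 105
\]
```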

The IR corresponding to the above is:
///////////////////////LLVM IR//////////////////////////////
define i32 @foo() nounwind  {
entry:
        br label %bb9.outer.us

bb9.outer.us:           ; preds = %bb9.outer.us, %entry
        %indvar42 = phi i32 [ 0, %entry ], [ %indvar.next48, %bb9.outer.us ]            ; <i32> [#uses=2]
        %sum.0.pn.ph.us = phi i32 [ 0, %entry ], [ %sum.1.lcssa.us, %bb9.outer.us ]             ; <i32> [#uses=1]
        %tmp = add i32 %indvar42, 6             ; <i32> [#uses=1]
        %sum.1.lcssa.us = add i32 %sum.0.pn.ph.us, %tmp         ; <i32> [#uses=2]
        %indvar.next48 = add i32 %indvar42, 1           ; <i32> [#uses=2]
        %exitcond49 = icmp eq i32 %indvar.next48, 10            ; <i1> [#uses=1]
        br i1 %exitcond49, label %bb21.split, label %bb9.outer.us

bb21.split:             ; preds = %bb9.outer.us
        ret i32 %sum.1.lcssa.us
}
////////////////////////////////////////////////////////////
As the IR above shows, only the inner loop is optimized away (it is replaced by directly adding 6); the outer loop is still there.
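To make the IR above easier to read, here is a minimal sketch of it written back as C by hand. The names `foo_folded` and `check_equivalence` are hypothetical and are not produced by LLVM, and `foo` is given a `(void)` parameter list. The do-while shape mirrors the single block `bb9.outer.us`, which adds `i + 6`, increments `i`, and exits when `i` equals 10.

```c
#include <assert.h>

/* Original function from the post. */
unsigned foo(void) {
    unsigned i, j;
    unsigned sum = 0;
    for (i = 0; i < 10; i++) {
        sum += i;
        for (j = 0; j < 3; j++)
            sum += 2;
    }
    return sum;
}

/* Hand-written C reading of the optimized IR (hypothetical name).
   The inner loop has become the constant 6; the outer loop remains. */
unsigned foo_folded(void) {
    unsigned sum = 0;
    unsigned i = 0;
    do {
        sum += i + 6;   /* %tmp = add %indvar42, 6; then added to the sum */
        i++;            /* %indvar.next48 = add %indvar42, 1 */
    } while (i != 10);  /* %exitcond49 = icmp eq %indvar.next48, 10 */
    return sum;
}

/* Small self-check: both versions must agree. */
void check_equivalence(void) {
    assert(foo() == foo_folded());
}
```

Calling `check_equivalence()` from a `main` should pass silently, since both functions compute 45 + 60.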

I then changed the HL source code like this:
(I only changed the type of 'sum' from 'unsigned' to 'float'.)
///////////////////////HL source code////////////////////////
unsigned foo() {

  unsigned i,j;
  float sum = 0;
  for (i=0; i<10; i++)
  {
    sum += i;
    for (j=0; j<3; j++)
      sum += 2;
  }

  return sum;
}
////////////////////////////////////////////////////////////
(I know this is not pretty, but it is just an experiment...)

The corresponding IR is:
(With this version the optimizer goes further and LLVM simply emits the return value.)
///////////////////////LLVM IR//////////////////////////////
define i32 @foo() nounwind  {
entry:
        ret i32 105
}
////////////////////////////////////////////////////////////
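The constant 105 is at least a valid result for the float version: every intermediate value is a small integer, and a `float` represents such integers exactly, so the final conversion back to `unsigned` loses nothing. The sketch below checks this by running the same loops with an integer shadow sum. The names `foo_float`, `float_sum_stays_exact` and `check_float_variant` are hypothetical. It says nothing about why the optimizer treats the two versions differently; it only shows that the two functions must return the same value.

```c
#include <assert.h>

/* Float variant from the post, renamed and given a (void) parameter list. */
unsigned foo_float(void) {
    unsigned i, j;
    float sum = 0;
    for (i = 0; i < 10; i++) {
        sum += i;
        for (j = 0; j < 3; j++)
            sum += 2;
    }
    return sum;  /* implicit float -> unsigned conversion, as in the post */
}

/* Hypothetical helper: repeats the loops with an integer shadow sum and
   returns 1 only if the float sum matches it after every single addition. */
int float_sum_stays_exact(void) {
    unsigned i, j, shadow = 0;
    float sum = 0;
    for (i = 0; i < 10; i++) {
        sum += i;
        shadow += i;
        if (sum != (float)shadow)
            return 0;
        for (j = 0; j < 3; j++) {
            sum += 2;
            shadow += 2;
            if (sum != (float)shadow)
                return 0;
        }
    }
    return 1;
}

/* Small self-check of both claims. */
void check_float_variant(void) {
    assert(float_sum_stays_exact());
    assert(foo_float() == 105u);
}
```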

This is quite interesting to me because the degree of optimization differs even though I changed only a data type.

Can anybody explain why this changes the optimization?

Thanks,
Seung

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev