Re: Soft-float

Re: Soft-float

Leo Romanoff
Hi,

I tried out the new soft-float support from the mainline.

Overall, it looks very nice and pretty clean. It is now extremely easy
to add soft-float support for your target: just do not call
addRegisterClass() for your FP types and they will be expanded into
libcalls.

But there are several minor things that would still be nice to have:

a) It is not possible to express that:
   - f32 and f64 are both illegal and therefore are mapped to integers
   - but only f64 is emulated on the target and there are no f32
arithmetic libcalls available (which is the case on our target)  

To make this possible, f32 should always be promoted to f64 first, and
then the f64 operation should be applied.

I see a small problem here with the current code, since f32 would have
to be promoted to f64, which is itself an illegal type. It might
eventually require some special-case handling. For example, what should
getTypeToTransformTo(f32) return? On the one hand, it is f64. On the
other hand, f64 is illegal.
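
As a rough illustration of the promotion described in (a), the sketch below performs an f32 addition using only f64 routines: extend both operands, add in f64, then round back to f32. The helper names (soft_extend_f32_to_f64, soft_add_f64, soft_trunc_f64_to_f32) are hypothetical stand-ins for a target's soft-float library, and they use host arithmetic only so that the example can run.

```cpp
#include <cassert>

// Hypothetical stand-ins for a target's f64-only soft-float routines.
// A real soft-float library would implement these with integer code;
// host floating point is used here only to keep the sketch runnable.
static double soft_extend_f32_to_f64(float x) { return static_cast<double>(x); }
static float soft_trunc_f64_to_f32(double x) { return static_cast<float>(x); }
static double soft_add_f64(double a, double b) { return a + b; }

// f32 add on a target that has no f32 arithmetic libcalls:
// promote both operands to f64, add there, round the result to f32.
static float add_f32_via_f64(float a, float b) {
    double wideA = soft_extend_f32_to_f64(a);
    double wideB = soft_extend_f32_to_f64(b);
    return soft_trunc_f64_to_f32(soft_add_f64(wideA, wideB));
}
```

The cost is two extra conversions per operand pair plus one truncation, which is why this would only make sense on targets that lack f32 routines.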
           
b) LLVM currently uses hard-wired library function names for FP
libcalls (and all other libcalls as well). It would be nice if they
were customizable, since some targets (e.g. some embedded systems) have
existing naming conventions that must be followed and cannot be
changed. For example, on our embedded target all libcalls for
soft-float and some integer operations have very target-specific names
for historical reasons.

The target-specific TargetLowering class could be extended to handle
this, and it would require only minor changes.
             
c) LLVM libcalls currently pass their parameters on the stack. But on
some embedded systems the FP support routines expect parameters in
specific registers.
   At the moment, SelectionDAGLegalize::ExpandLibCall() explicitly uses
CallingConv::C, but it could be made customizable by introducing a
special libcall calling convention or, even better, by allowing
target-specific lowering of libcalls. This can actually be combined
with the solution for (b): target-specific lowering can then take care
of both the names of the libcalls and how they handle their parameters
and return values.
             
d) Would it be possible with the current implementation of soft-float
support to map f32/f64 to integer types smaller than i32, e.g. to i16?
I have the impression that this is not necessarily the case, since it
would require f64 to be split into 4 parts.
   This question is more about a theoretical possibility. At the moment
my embedded target supports i32 registers. But some embedded systems
are still only 16-bit, which means that they would need something like
this.
   I'm wondering how easy or difficult it would be to support such a
mapping to any integer type?

My impression is that (b) and (c) are very easy to implement, but (a)
and (d) could be more challenging.

Evan, I guess you are the most qualified person to judge this,
since you implemented the new soft-float support.

What do you think about these extension proposals?

-Roman  


--- Roman Levenstein <[hidden email]> wrote:

> Date: Tue, 19 Dec 2006 22:13:08 +0100
> From: Roman Levenstein <[hidden email]>
> To: Chris Lattner <[hidden email]>
> Subject: Re: Soft-float
>
> Hi Chris,
>
> > BTW, in mainline CVS, the LLVM legalizer now supports expanding
> > floating  point types to integer types and inserting GCC-style
> > libcalls for soft-float.  Please ask on the mailing list if you
> > have any questions,
>
> Oh, nice to know! Thanks for this info! Actually, I didn't have too
> much time for LLVM hacking recently, since I had a lot of business
> trips. But just last week I got a working version of soft-floats using
> both approaches: a simple post-code-selection pass and my
> legalizer-based solution. The second one was much more complex to
> implement, not very "clean", and affected many different places in the
> CodeGen. I haven't done any performance comparisons yet to see whether
> the legalizer-based approach really brings any significant wins.
>
> I'll have a look at the mainline and compare it to my implementation
> to see the differences between them. I'll also try to formulate some
> questions and report about my experiences while changing the LLVM
> legalizer to support it.
>
> -Roman
>
>


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Soft-float

Chris Lattner
On Wed, 20 Dec 2006, Roman Levenstein wrote:
> Overall, it looks very nice and pretty clean. It is now extremely easy
> to add the soft-float support for your target. Just do not call
> addRegisterClass() for your FP types and they will be expanded into
> libcalls.

Great.

> a) It is not possible to express that:
>   - f32 and f64 are both illegal and therefore are mapped to integers
>   - but only f64 is emulated on the target and there are no f32
> arithmetic libcalls available (which is the case on our target)
>
> To make it possible, f32 should be always promoted first to f64 and
> then an f64 operation should be applied.

Ok.  This shouldn't be the default behavior (because it will generate
significantly less efficient code for targets that have both) but we
should support this.

> I see a small problem here with the current code, since f32 should be
> promoted to the illegal type f64. It might require some special-case
> handling eventually. For example, what should be the result
> getTypeToTransformTo(f32)? On the one hand, it is f64. On the other
> hand it is illegal.

I think getTypeToTransformTo(f32) should return f64.  That would be
recursively expanded to i64, then to 2x i32 if needed.

I haven't looked at the mechanics required to get this working, but it
shouldn't be too ugly.

> b) LLVM currently uses hard-wired library function names for
>  FP libcalls (and ALL other libcalls as well). It would be nice if
> they would be customizable, since some targets (e.g. some embedded
> systems) have some existing naming conventions that are to be followed
> and cannot be changed. For example, on our embedded target all libcalls
> for soft-float and some integer operations have very target-specific
> names for historical reasons.
>
> TargetLowering class specific for a target could be extended to handle
> that and this would require only minor changes.

Yes, TargetLowering would be a natural place to put this.  Patches welcome
:)

> c) LLVM libcalls currently pass their parameters on stack. But on some
> embedded systems FP support routines expect parameters on specific
> registers.
>   At the moment, SelectionDAGLegalize::ExpandLibCall() explicitly uses
> CallingConv::C, but it could be made customizable by introducing a
> special libcall calling convention or even better by allowing the
> target specific lowering of libcalls. Actually it can be combined with
> the solution for (b). In this case target-specific lowering can take
> care about names of libcalls and also how they handle their parameters
> and return values.

Yep, TargetLowering can have a pair for each libcall: a function name and
a calling convention to use.  Patches welcome :)
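
The "pair for each libcall" idea could look roughly like the sketch below. It is a standalone model, not LLVM API: Libcall, CallConv, LibcallInfo, LibcallTable and the target names such as "fp_add64" are all hypothetical. The default entries use GCC-style routine names with the C calling convention, and a target overrides only the entries it needs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical model: none of these types are part of LLVM.
enum class Libcall { ADD_F64, MUL_F64, SDIV_I32 };
enum class CallConv { C, TargetRegs };  // TargetRegs: arguments in fixed registers

// The pair kept for each libcall: routine name plus calling convention.
struct LibcallInfo {
    std::string name;
    CallConv cc;
};

class LibcallTable {
public:
    LibcallTable() {
        // Defaults: GCC-style names, C calling convention.
        table_[Libcall::ADD_F64] = {"__adddf3", CallConv::C};
        table_[Libcall::MUL_F64] = {"__muldf3", CallConv::C};
        table_[Libcall::SDIV_I32] = {"__divsi3", CallConv::C};
    }

    // A target overrides only the entries that differ from the defaults.
    void set(Libcall lc, std::string name, CallConv cc) {
        table_[lc] = {std::move(name), cc};
    }

    const LibcallInfo &get(Libcall lc) const { return table_.at(lc); }

private:
    std::map<Libcall, LibcallInfo> table_;
};

// Example target with its own routine name and register-based arguments.
inline LibcallTable makeExampleTargetTable() {
    LibcallTable t;
    t.set(Libcall::ADD_F64, "fp_add64", CallConv::TargetRegs);
    return t;
}
```

Code that emits a libcall would then look up the entry instead of hard-wiring the name and CallingConv::C.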

> d) Would it be possible with current implementation of soft-float
> support to map f32/f64 to integer types smaller than i32, e.g. to i16?
> I have the impression that it is not necessarily the case, since it
> would require that f64 is split into 4 parts.

Yes, this should be fine.

>   This question is more about a theoretical possibility. At the moment
> my embedded target supports i32 registers. But some embedded systems
> are still only 16bit, which means that they would need something like
> this.
>   I'm wondering, how easy or difficult would it be to support such a
> mapping to any integer type?

It should be transparently handled by the framework.  Basically, you'd
get:

f32 -> f64 -> i64 -> 2x i32 -> 4x i16

If you don't add a register class for i32 or i64, but you do have one for
i16, legalize will already 'expand' them for you.
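
The chain above can be modelled as a walk over a per-type "transform to" table. The sketch below is a toy model under the assumption of a 16-bit target that emulates only f64; Ty, transformTo and legalize are hypothetical names and do not correspond to the legalizer's actual code.

```cpp
#include <cassert>
#include <string>

// Hypothetical value types for a 16-bit soft-float target (not LLVM's MVT).
enum class Ty { f32, f64, i64, i32, i16 };

static const char *tyName(Ty t) {
    switch (t) {
    case Ty::f32: return "f32";
    case Ty::f64: return "f64";
    case Ty::i64: return "i64";
    case Ty::i32: return "i32";
    case Ty::i16: return "i16";
    }
    return "?";
}

static unsigned tyBits(Ty t) {
    switch (t) {
    case Ty::f32: case Ty::i32: return 32;
    case Ty::f64: case Ty::i64: return 64;
    case Ty::i16: return 16;
    }
    return 0;
}

// One legalization step; a legal type maps to itself.
static Ty transformTo(Ty t) {
    switch (t) {
    case Ty::f32: return Ty::f64;  // promote: only f64 is emulated
    case Ty::f64: return Ty::i64;  // soft-float: same bits, integer type
    case Ty::i64: return Ty::i32;  // expand into two halves
    case Ty::i32: return Ty::i16;  // expand again
    case Ty::i16: return Ty::i16;  // legal
    }
    return t;
}

struct Walk {
    std::string chain;  // e.g. "f32 -> f64 -> ..."
    Ty finalTy;         // first legal type reached
    unsigned parts;     // number of legal-typed pieces
};

// Follow transformTo() until a legal type is reached.
static Walk legalize(Ty t) {
    Walk w{tyName(t), t, 1};
    while (transformTo(t) != t) {
        Ty next = transformTo(t);
        if (tyBits(next) < tyBits(t))
            w.parts *= tyBits(t) / tyBits(next);  // each narrowing step splits the value
        w.chain += " -> ";
        w.chain += tyName(next);
        t = next;
    }
    w.finalTy = t;
    return w;
}
```

Under these assumptions, legalize(Ty::f32) walks f32, f64, i64, i32, i16 and reports four i16 parts.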

Note that we don't have any 16-bit targets in CVS, so there may be minor
bugs, but the framework is all there.  Duraid was working on a 16-bit port
at one point (and reported a few bugs, which were fixed) but it was never
contributed.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/

Re: Soft-float

Evan Cheng-2
>> d) Would it be possible with current implementation of soft-float
>> support to map f32/f64 to integer types smaller than i32, e.g. to
>> i16? I have the impression that it is not necessarily the case,
>> since it would require that f64 is split into 4 parts.
>
> Yes, this should be fine.
>
>>   This question is more about a theoretical possibility. At the
>> moment my embedded target supports i32 registers. But some embedded
>> systems are still only 16bit, which means that they would need
>> something like this.
>>   I'm wondering, how easy or difficult would it be to support such a
>> mapping to any integer type?
>
> It should be transparently handled by the framework.  Basically, you'd
> get:
>
> f32 -> f64 -> i64 -> 2x i32 -> 4x i16
>
> If you don't add a register class for i32 or i64, but you do have
> one for i16, legalize will already 'expand' them for you.
>

This will probably require a slightly more extensive patch to the
legalizer. The current mechanism assumes either 1->1 or 1->2
expansion. It also assumes that the results of expansion are of legal
types. That means you will have to either 1) modify ExpandOp() to
handle cases which need to be recursively expanded or 2) modify it to
return a vector of SDOperands. Solution one is what I would pursue.
It's not done simply because there isn't a need for it right now. :-)
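
To make the recursive option concrete, here is a minimal value-level sketch: a single 1->2 split is applied repeatedly until every piece fits the legal width. It operates on plain 64-bit integers rather than DAG nodes, the function names are hypothetical, and it assumes power-of-two widths of at most 64 bits and an IEEE-754 double on the host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Split a `bits`-wide value into pieces of at most `legalBits` bits by
// repeated 1->2 halving. Pieces are appended least-significant first.
// Assumes bits and legalBits are powers of two and bits <= 64.
static void expandValue(std::uint64_t value, unsigned bits, unsigned legalBits,
                        std::vector<std::uint64_t> &parts) {
    if (bits <= legalBits) {  // already legal: nothing left to split
        parts.push_back(value);
        return;
    }
    unsigned half = bits / 2;  // at most 32 here, so the shifts are well defined
    std::uint64_t mask = (std::uint64_t{1} << half) - 1;
    expandValue(value & mask, half, legalBits, parts);   // low half
    expandValue(value >> half, half, legalBits, parts);  // high half
}

// Reinterpret an f64 as its 64-bit pattern (the f64 -> i64 step).
static std::uint64_t bitsOfDouble(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}
```

With a legal width of 16, a 64-bit value goes through two levels of halving and ends up as four pieces, which is the case the current 1->1 / 1->2 assumption does not cover.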

Evan


Re: Soft-float

Leo Romanoff
> >> d) Would it be possible with current implementation of soft-float
> >> support to map f32/f64 to integer types smaller than i32, e.g. to
> >> i16? I have the impression that it is not necessarily the case,
> >> since it would require that f64 is split into 4 parts.
> >
> > Yes, this should be fine.
> >
> >>   This question is more about a theoretical possibility. At the
> >> moment my embedded target supports i32 registers. But some
> >> embedded systems are still only 16bit, which means that they would
> >> need something like this.
> >>   I'm wondering, how easy or difficult would it be to support such
> >> a mapping to any integer type?
> >
> > It should be transparently handled by the framework.  Basically,
> > you'd get:
> >
> > f32 -> f64 -> i64 -> 2x i32 -> 4x i16
> >
> > If you don't add a register class for i32 or i64, but you do have
> > one for i16, legalize will already 'expand' them for you.
> >
>
> This will probably require a slightly more extensive patch to
> legalizer. The current mechanism assumes either 1->1 or 1->2
> expansion.

Exactly. This is what I meant by "more challenging" ;) It is assumed
in several places that only 1->1 or 1->2 expansions take place. The
generic case is not handled yet.

> It also assumes the result of expansion are of legal  
> types.

Yes. And this is also a reason why it is not too obvious how to handle
f32->f64 promotion and later f64->i64 expansion on targets that support
only f64 soft-floats.

Chris Lattner wrote:
>That would be recursively expanded to i64, then to 2x i32 if needed.

I tried making getTypeToTransformTo(f32) return f64, even though f64
is an illegal type. But this recursive expansion does not take place with
the current legalizer implementation. Currently, it is assumed that the
result of getTypeToTransformTo() is a legal type. For example,
CreateRegForValue tries to create a register of such a promoted type
and fails in the above-mentioned case.
 

Evan wrote:
> That means, you will have to either 1) modify ExpandOp() to
> handle cases which need to be recursively expanded or 2) modify it to
> return a vector of SDOperand's. Solution one is what I would pursue.

Agreed. I also feel that some sort of recursive expansion is required.

I also have a feeling that getTypeToTransformTo(MVT::ValueType) should
probably also recurse until it finds a type T where
getTypeToTransformTo(T) == T, i.e. until it finds a legal type. This
would almost solve the issue with f32->f64 promotion where both FP
types are illegal. The only concern is that in this case
getTypeToTransformTo(MVT::f32) would return MVT::i64, and therefore the
information that it should first be promoted to f64 is lost. The
problem is that getTypeToTransformTo() is used for two "different"
goals: to tell which type to use for register mapping, and to tell
which type to use for promotions/expansions for the sake of "type
system correctness". Maybe it would even make sense to have two
different mappings because of this? One mapping would be used for
allocation of virtual registers and the like and would always return a
legal type; the other would be used just like getTypeToTransformTo() in
LegalizeOp(), ExpandOp() and PromoteOp() and could also return illegal
types?
 
> It's not done simply because there isn't a need for it right now. :-)

Since I have this need, I'll try to find a solution for this issue and
to provide a patch.

-Roman


Re: Soft-float

Evan Cheng-2

On Dec 20, 2006, at 2:06 PM, Roman Levenstein wrote:

>>
>> This will probably require a slightly more extensive patch to
>> legalizer. The current mechanism assumes either 1->1 or 1->2
>> expansion.
>
> Exactly. This is what I meant with "more challenging";) It is assumed
> at several places that 1->1 or 1->2 expansions are taking place. A
> generic case is not handled yet.
>
>> It also assumes the result of expansion are of legal
>> types.
>
> Yes. And this is also a reason why it is not too obvious how to handle
> f32->f64 promotion and later f64->i64 expansion on targets that
> support only f64 soft-floats.
>
> Chris Lattner wrote:
>> That would be recursively expanded to i64, then to 2x i32 if needed.
>
> I tried to set getTypeToTransformTo(f32) to return f64, even when f64
> is illegal type. But this recursive expansion does not take place with
> the current legalizer implementation. Currently, it is assumed that
> the result of getTypeToTransformTo() is a legal type. For example,
> CreateRegForValue tries to create a register of such a promoted type
> and fails in the above mentioned case.

All of these issues can be solved by adding the logic to recursively
expand operands. It shouldn't be too complicated.

>
>
> Evan wrote:
>> That means, you will have to either 1) modify ExpandOp() to
>> handle cases which need to be recursively expanded or 2) modify it to
>> return a vector of SDOperand's. Solution one is what I would pursue.
>
> Agreed. I also feel that some sort of recursive expansion is required.
>
> I also have a feeling that getTypeToTransformTo(MVT::ValueType) should
> probably also recurse until it finds a type T where
> getTypeToTransformTo(T) = T, i.e. it finds a legal type. This would
> almost solve the issue with f32->f64 promotion where both FP types are
> illegal. The only concern here is that in this case
> getTypeToTransformTo(MVT::f32) would return MVT::i64 and therefore the
> information about the fact that it should first be promoted to f64 is
> lost. The problem is that getTypeToTransformTo() is used for two
> "different" goals: to tell which type to use for register mapping and
> to tell which type to use for promotions/expansions for the sake of
> "type system correctness". Maybe it would even make sense to have two
> different mappings because of this? One mapping will be used for
> allocation of virtual registers and the like and would always return a
> legal type and the other will be used just as getTypeToTransformTo()
> in LegalizeOp(), ExpandOp() and PromoteOp() and can return also
> illegal types?

No need to change getTypeToTransformTo(). There is a getTypeToExpandTo()
that expands the type recursively until it finds a legal type.

>
>> It's not done simply because there isn't a need for it right now. :-)
>
> Since I have this need, I'll try to find a solution for this issue and
> to provide a patch.

Great! There are a few spots where ExpandOp() is called recursively.
It would be nice to remove those and use the general expansion
facility instead.

Evan


Possible bug in the linear scan register allocator

Leo Romanoff
Hi,

I was working on extending the soft-float support to handle expansion of
i64 and f64 into i16, i.e. on supporting the expansion of long values
of illegal types into more than two parts. For that I modified
SelectionDAGLowering::getValue() and several other functions.

This seems to work now on my test cases, but while testing it I ran
into a different problem. I have the impression that I found a bug in
the linear-scan register allocator (the local and simple allocators work
just fine on the same code). The problem is that linear scan loops
forever under certain conditions. Maybe I am overlooking something in my
target-specific code, but I think it is wrong in any case for a register
allocator to loop forever.

I attach part of the log file produced with "llc -debug", where you
can see the register-allocation-related bits. The register naming
convention is:
 rbN - for 8bit registers
 rwN - for 16bit registers
 rxN - for 32bit registers

I also attach the C source file and the LLVM assembler file for your
convenience.
 
My personal understanding of what is going on is that it is due to
incorrect joining of live intervals. If I disable interval joining by
using --join-liveintervals=false, everything works fine.
According to the log file, what happens during joining is the
following:
 1) the intervals of some fixed registers are merged with the intervals
of some virtual registers
 2) later there is a need to spill one of the allocated registers, but
since all joined intervals are now FIXED intervals due to (1), they
cannot be spilled. Therefore, the register allocator loops forever.

I would be grateful if someone could confirm that this is a bug. And
of course, it would be very nice if one of the RegAlloc gurus could fix
it ;)

-Roman

********** COMPUTING LIVE INTERVALS **********
********** Function: atof
entry:
0 %reg1026 = MOV16rm <fi#-4>, 1, %NOREG, 0
MOV16rm %reg1026<d> <fi#-4> 1 %mreg(0) 0
                register: %reg1026 +[2,32:0) +[136,142:0)
4 %reg1024 = MOV16rm <fi#-1>, 1, %NOREG, 0
MOV16rm %reg1024<d> <fi#-1> 1 %mreg(0) 0
                register: %reg1024 +[6,32:0) +[32,34:0)
8 %reg1027 = MOV16rm <fi#-3>, 1, %NOREG, 0
MOV16rm %reg1027<d> <fi#-3> 1 %mreg(0) 0
                register: %reg1027 +[10,32:0) +[136,146:0)
12 %reg1028 = MOV16rm <fi#-5>, 1, %NOREG, 0
MOV16rm %reg1028<d> <fi#-5> 1 %mreg(0) 0
                register: %reg1028 +[14,32:0) +[136,150:0)
16 %reg1025 = MOV16rm <fi#-2>, 1, %NOREG, 0
MOV16rm %reg1025<d> <fi#-2> 1 %mreg(0) 0
                register: %reg1025 +[18,32:0) +[136,138:0)
20 %reg1033 = MOV8rm %reg1024, 1, %NOREG, 0
MOV8rm %reg1033<d> %reg1024 1 %mreg(0) 0
                register: %reg1033 +[22,26:0)
24 CMP8ri %reg1033<kill>, 0
CMP8ri %reg1033 0
28 JPCC mbb<bb7,0x8771350>, 4
JPCC <mbb:bb7@0x8771350> 4
bb.preheader:
32 %reg1029 = MOV16rr %reg1024<kill>
MOV16rr %reg1029<d> %reg1024
                register: %reg1029 +[34,42:0)
36 %reg1029 = INC16r %reg1029
INC16r %reg1029<d> %reg1029
                register: %reg1029 replace range with [34,38:1)RESULT: %reg1029,0 = [34,38:1)[38,42:0)  0@? 1@34
40 %reg1042 = MOV16rr %reg1029<kill>
MOV16rr %reg1042<d> %reg1029
                register: %reg1042 +[42,44:0) +[44,46:0)
bb:
44 %reg1030 = MOV16rr %reg1042<kill>
MOV16rr %reg1030<d> %reg1042
                register: %reg1030 +[46,54:0)
48 %reg1031 = MOV8rm %reg1030, 1, %NOREG, 0
MOV8rm %reg1031<d> %reg1030 1 %mreg(0) 0
                register: %reg1031 +[50,68:0) +[76,78:0)
52 %reg1032 = MOV16rr %reg1030<kill>
MOV16rr %reg1032<d> %reg1030
                register: %reg1032 +[54,68:0) +[68,70:0)
56 %reg1032 = INC16r %reg1032
INC16r %reg1032<d> %reg1032
                register: %reg1032 replace range with [54,58:1)RESULT: %reg1032,0 = [54,58:1)[58,70:0)  0@? 1@54
60 CMP8ri %reg1031, 0
CMP8ri %reg1031 0
64 JPCC mbb<bb7.loopexit,0x8771310>, 4
JPCC <mbb:bb7.loopexit@0x8771310> 4
bb.bb_crit_edge:
68 %reg1042 = MOV16rr %reg1032<kill>
MOV16rr %reg1042<d> %reg1032
                register: %reg1042Removing [44,46] from: %reg1042,0 = [42,46:0)  0@42
RESULT: %reg1042,0 = [42,44:0)  0@42 replace range with [44,46:1)RESULT: %reg1042,0 = [42,44:0)[44,46:1)  0@42 1@? +[70,76:2)
72 JP mbb<bb,0x8771290>
JP <mbb:bb@0x8771290>
bb7.loopexit:
76 %reg1034 = MOVSX16rr8 %reg1031<kill>
MOVSX16rr8 %reg1034<d> %reg1031
                register: %reg1034 +[78,82:0)
80 %reg1035 = MOV16rr %reg1034<kill>
MOV16rr %reg1035<d> %reg1034
                register: %reg1035 +[82,102:0)
84 %reg1035 = SHL16ri %reg1035, 3
SHL16ri %reg1035<d> %reg1035 3
                register: %reg1035 replace range with [82,86:1)RESULT: %reg1035,0 = [82,86:1)[86,102:0)  0@? 1@82
88 %reg1036 = MOV16ri <ga:da>
MOV16ri %reg1036<d> <ga:da>
                register: %reg1036 +[90,98:0)
92 %reg1037 = MOV16rm %reg1035, 1, %reg1036, 0
MOV16rm %reg1037<d> %reg1035 1 %reg1036 0
                register: %reg1037 +[94,118:0)
96 %reg1038 = MOV16rr %reg1036<kill>
MOV16rr %reg1038<d> %reg1036
                register: %reg1038 +[98,114:0)
100 %reg1038 = ADDrr16 %reg1038, %reg1035<kill>
ADDrr16 %reg1038<d> %reg1038 %reg1035
                register: %reg1038 replace range with [98,102:1)RESULT: %reg1038,0 = [98,102:1)[102,114:0)  0@? 1@98
104 %reg1039 = MOV16rm %reg1038, 1, %NOREG, 6
MOV16rm %reg1039<d> %reg1038 1 %mreg(0) 6
                register: %reg1039 +[106,130:0)
108 %reg1040 = MOV16rm %reg1038, 1, %NOREG, 4
MOV16rm %reg1040<d> %reg1038 1 %mreg(0) 4
                register: %reg1040 +[110,126:0)
112 %reg1041 = MOV16rm %reg1038<kill>, 1, %NOREG, 2
MOV16rm %reg1041<d> %reg1038 1 %mreg(0) 2
                register: %reg1041 +[114,122:0)
116 %rw0 = MOV16rr %reg1037<kill>
MOV16rr %mreg(28)<d> %reg1037
                register: rw0 killed +[118,134:0)
                register: rx0 killed +[118,134:0)
                register: S0 killed +[118,134:0)
                register: rb0 killed +[118,134:0)
                register: rb4 killed +[118,134:0)
120 %rw1 = MOV16rr %reg1041<kill>
MOV16rr %mreg(3)<d> %reg1041
                register: rw1 killed +[122,134:0)
                register: rx1 killed +[122,134:0)
                register: S1 killed +[122,134:0)
                register: rb1 killed +[122,134:0)
                register: rb5 killed +[122,134:0)
124 %rw2 = MOV16rr %reg1040<kill>
MOV16rr %mreg(10)<d> %reg1040
                register: rw2 killed +[126,134:0)
                register: rx2 killed +[126,134:0)
                register: S2 killed +[126,134:0)
                register: rb2 killed +[126,134:0)
                register: rb6 killed +[126,134:0)
128 %rw3 = MOV16rr %reg1039<kill>
MOV16rr %mreg(13)<d> %reg1039
                register: rw3 killed +[130,134:0)
                register: rx3 killed +[130,134:0)
                register: S3 killed +[130,134:0)
                register: rb3 killed +[130,134:0)
                register: rb7 killed +[130,134:0)
132 RET %rx0<imp-use,kill>, %rx1<imp-use,kill>, %rx2<imp-use,kill>, %rx3<imp-use,kill>
RET %mreg(39) %mreg(29) %mreg(31) %mreg(33)
bb7:
136 %rw0 = MOV16rr %reg1025<kill>
MOV16rr %mreg(28)<d> %reg1025
                register: rw0 killed +[138,154:1)
                register: rx0 killed +[138,154:1)
                register: S0 killed +[138,154:1)
                register: rb0 killed +[138,154:1)
                register: rb4 killed +[138,154:1)
140 %rw1 = MOV16rr %reg1026<kill>
MOV16rr %mreg(3)<d> %reg1026
                register: rw1 killed +[142,154:1)
                register: rx1 killed +[142,154:1)
                register: S1 killed +[142,154:1)
                register: rb1 killed +[142,154:1)
                register: rb5 killed +[142,154:1)
144 %rw2 = MOV16rr %reg1027<kill>
MOV16rr %mreg(10)<d> %reg1027
                register: rw2 killed +[146,154:1)
                register: rx2 killed +[146,154:1)
                register: S2 killed +[146,154:1)
                register: rb2 killed +[146,154:1)
                register: rb6 killed +[146,154:1)
148 %rw3 = MOV16rr %reg1028<kill>
MOV16rr %mreg(13)<d> %reg1028
                register: rw3 killed +[150,154:1)
                register: rx3 killed +[150,154:1)
                register: S3 killed +[150,154:1)
                register: rb3 killed +[150,154:1)
                register: rb7 killed +[150,154:1)
152 RET %rx0<imp-use,kill>, %rx1<imp-use,kill>, %rx2<imp-use,kill>, %rx3<imp-use,kill>
RET %mreg(39) %mreg(29) %mreg(31) %mreg(33)
********** INTERVALS **********
rb0,inf = [118,134:0)[138,154:1)  0@? 1@?
rb5,inf = [122,134:0)[142,154:1)  0@? 1@?
rw1,inf = [122,134:0)[142,154:1)  0@122 1@142
rb1,inf = [122,134:0)[142,154:1)  0@? 1@?
rb6,inf = [126,134:0)[146,154:1)  0@? 1@?
rw2,inf = [126,134:0)[146,154:1)  0@126 1@146
rb2,inf = [126,134:0)[146,154:1)  0@? 1@?
rb7,inf = [130,134:0)[150,154:1)  0@? 1@?
rw3,inf = [130,134:0)[150,154:1)  0@130 1@150
rb3,inf = [130,134:0)[150,154:1)  0@? 1@?
S0,inf = [118,134:0)[138,154:1)  0@? 1@?
S1,inf = [122,134:0)[142,154:1)  0@? 1@?
S2,inf = [126,134:0)[146,154:1)  0@? 1@?
S3,inf = [130,134:0)[150,154:1)  0@? 1@?
rb4,inf = [118,134:0)[138,154:1)  0@? 1@?
rw0,inf = [118,134:0)[138,154:1)  0@118 1@138
rx1,inf = [122,134:0)[142,154:1)  0@? 1@?
rx2,inf = [126,134:0)[146,154:1)  0@? 1@?
rx3,inf = [130,134:0)[150,154:1)  0@? 1@?
rx0,inf = [118,134:0)[138,154:1)  0@? 1@?
%reg1024,0 = [6,34:0)  0@?
%reg1025,0 = [18,32:0)[136,138:0)  0@?
%reg1026,0 = [2,32:0)[136,142:0)  0@?
%reg1027,0 = [10,32:0)[136,146:0)  0@?
%reg1028,0 = [14,32:0)[136,150:0)  0@?
%reg1029,0 = [34,38:1)[38,42:0)  0@? 1@34
%reg1030,0 = [46,54:0)  0@46
%reg1031,0 = [50,68:0)[76,78:0)  0@?
%reg1032,0 = [54,58:1)[58,70:0)  0@? 1@54
%reg1033,0 = [22,26:0)  0@?
%reg1034,0 = [78,82:0)  0@?
%reg1035,0 = [82,86:1)[86,102:0)  0@? 1@82
%reg1036,0 = [90,98:0)  0@?
%reg1037,0 = [94,118:0)  0@?
%reg1038,0 = [98,102:1)[102,114:0)  0@? 1@98
%reg1039,0 = [106,130:0)  0@?
%reg1040,0 = [110,126:0)  0@?
%reg1041,0 = [114,122:0)  0@?
%reg1042,0 = [42,44:0)[44,46:1)[70,76:2)  0@42 1@? 2@70
********** JOINING INTERVALS ***********
bb:
44 %reg1030 = MOV16rr %reg1042<kill>
MOV16rr %reg1030<d> %reg1042
                Inspecting %reg1042,0 = [42,44:0)[44,46:1)[70,76:2)  0@42 1@? 2@70 and %reg1030,0 = [46,54:0)  0@46:
                Joined.  Result = %reg1042,0 = [42,44:1)[44,54:0)[70,76:2)  0@? 1@42 2@70
52 %reg1032 = MOV16rr %reg1030<kill>
MOV16rr %reg1032<d> %reg1030
                Inspecting %reg1042,0 = [42,44:1)[44,54:0)[70,76:2)  0@? 1@42 2@70 and %reg1032,0 = [54,58:1)[58,70:0)  0@? 1@54:
                Joined.  Result = %reg1042,0 = [42,44:2)[44,58:1)[58,76:0)  0@? 1@? 2@42
bb.bb_crit_edge:
68 %reg1042 = MOV16rr %reg1032<kill>
MOV16rr %reg1042<d> %reg1032
        Copy already coallesced.
entry:
bb.preheader:
32 %reg1029 = MOV16rr %reg1024<kill>
MOV16rr %reg1029<d> %reg1024
                Inspecting %reg1024,0 = [6,34:0)  0@? and %reg1029,0 = [34,38:1)[38,42:0)  0@? 1@34:
                Joined.  Result = %reg1029,0 = [6,38:1)[38,42:0)  0@? 1@?
40 %reg1042 = MOV16rr %reg1029<kill>
MOV16rr %reg1042<d> %reg1029
                Inspecting %reg1029,0 = [6,38:1)[38,42:0)  0@? 1@? and %reg1042,0 = [42,44:2)[44,58:1)[58,76:0)  0@? 1@? 2@42:
                Joined.  Result = %reg1042,0 = [6,38:3)[38,44:2)[44,58:1)[58,76:0)  0@? 1@? 2@? 3@?
bb7.loopexit:
80 %reg1035 = MOV16rr %reg1034<kill>
MOV16rr %reg1035<d> %reg1034
                Inspecting %reg1034,0 = [78,82:0)  0@? and %reg1035,0 = [82,86:1)[86,102:0)  0@? 1@82:
                Joined.  Result = %reg1035,0 = [78,86:1)[86,102:0)  0@? 1@?
96 %reg1038 = MOV16rr %reg1036<kill>
MOV16rr %reg1038<d> %reg1036
                Inspecting %reg1036,0 = [90,98:0)  0@? and %reg1038,0 = [98,102:1)[102,114:0)  0@? 1@98:
                Joined.  Result = %reg1038,0 = [90,102:1)[102,114:0)  0@? 1@?
116 %rw0 = MOV16rr %reg1037<kill>
MOV16rr %mreg(28)<d> %reg1037
                Inspecting %reg1037,0 = [94,118:0)  0@? and rw0,inf = [118,134:0)[138,154:1)  0@118 1@138:
                Joined.  Result = rw0,inf = [94,134:0)[138,154:1)  0@? 1@138
120 %rw1 = MOV16rr %reg1041<kill>
MOV16rr %mreg(3)<d> %reg1041
                Inspecting %reg1041,0 = [114,122:0)  0@? and rw1,inf = [122,134:0)[142,154:1)  0@122 1@142:
                Joined.  Result = rw1,inf = [114,134:0)[142,154:1)  0@? 1@142
124 %rw2 = MOV16rr %reg1040<kill>
MOV16rr %mreg(10)<d> %reg1040
                Inspecting %reg1040,0 = [110,126:0)  0@? and rw2,inf = [126,134:0)[146,154:1)  0@126 1@146:
                Joined.  Result = rw2,inf = [110,134:0)[146,154:1)  0@? 1@146
128 %rw3 = MOV16rr %reg1039<kill>
MOV16rr %mreg(13)<d> %reg1039
                Inspecting %reg1039,0 = [106,130:0)  0@? and rw3,inf = [130,134:0)[150,154:1)  0@130 1@150:
                Joined.  Result = rw3,inf = [106,134:0)[150,154:1)  0@? 1@150
bb7:
136 %rw0 = MOV16rr %reg1025<kill>
MOV16rr %mreg(28)<d> %reg1025
                Inspecting %reg1025,0 = [18,32:0)[136,138:0)  0@? and rw0,inf = [94,134:0)[138,154:1)  0@? 1@138:
                Joined.  Result = rw0,inf = [18,32:1)[94,134:0)[136,154:1)  0@? 1@?
140 %rw1 = MOV16rr %reg1026<kill>
MOV16rr %mreg(3)<d> %reg1026
                Inspecting %reg1026,0 = [2,32:0)[136,142:0)  0@? and rw1,inf = [114,134:0)[142,154:1)  0@? 1@142:
                Joined.  Result = rw1,inf = [2,32:1)[114,134:0)[136,154:1)  0@? 1@?
144 %rw2 = MOV16rr %reg1027<kill>
MOV16rr %mreg(10)<d> %reg1027
                Inspecting %reg1027,0 = [10,32:0)[136,146:0)  0@? and rw2,inf = [110,134:0)[146,154:1)  0@? 1@146:
                Joined.  Result = rw2,inf = [10,32:1)[110,134:0)[136,154:1)  0@? 1@?
148 %rw3 = MOV16rr %reg1028<kill>
MOV16rr %mreg(13)<d> %reg1028
                Inspecting %reg1028,0 = [14,32:0)[136,150:0)  0@? and rw3,inf = [106,134:0)[150,154:1)  0@? 1@150:
                Joined.  Result = rw3,inf = [14,32:1)[106,134:0)[136,154:1)  0@? 1@?
*** Register mapping ***
  reg 1024 -> %reg1029
  reg 1025 -> rw0
  reg 1026 -> rw1
  reg 1027 -> rw2
  reg 1028 -> rw3
  reg 1029 -> %reg1042
  reg 1030 -> %reg1042
  reg 1032 -> %reg1042
  reg 1034 -> %reg1035
  reg 1036 -> %reg1038
  reg 1037 -> rw0
  reg 1039 -> rw3
  reg 1040 -> rw2
  reg 1041 -> rw1
********** INTERVALS **********
rb0,inf = [18,32:3)[94,118:2)[118,134:0)[136,138:3)[138,154:1)  0@? 1@? 2@? 3@?
rb5,inf = [2,32:3)[114,122:2)[122,134:0)[136,142:3)[142,154:1)  0@? 1@? 2@? 3@?
rw1,inf = [2,32:1)[114,134:0)[136,154:1)  0@? 1@?
rb1,inf = [2,32:3)[114,122:2)[122,134:0)[136,142:3)[142,154:1)  0@? 1@? 2@? 3@?
rb6,inf = [10,32:3)[110,126:2)[126,134:0)[136,146:3)[146,154:1)  0@? 1@? 2@? 3@?
rw2,inf = [10,32:1)[110,134:0)[136,154:1)  0@? 1@?
rb2,inf = [10,32:3)[110,126:2)[126,134:0)[136,146:3)[146,154:1)  0@? 1@? 2@? 3@?
rb7,inf = [14,32:3)[106,130:2)[130,134:0)[136,150:3)[150,154:1)  0@? 1@? 2@? 3@?
rw3,inf = [14,32:1)[106,134:0)[136,154:1)  0@? 1@?
rb3,inf = [14,32:3)[106,130:2)[130,134:0)[136,150:3)[150,154:1)  0@? 1@? 2@? 3@?
S0,inf = [18,32:3)[94,118:2)[118,134:0)[136,138:3)[138,154:1)  0@? 1@? 2@? 3@?
S1,inf = [2,32:3)[114,122:2)[122,134:0)[136,142:3)[142,154:1)  0@? 1@? 2@? 3@?
S2,inf = [10,32:3)[110,126:2)[126,134:0)[136,146:3)[146,154:1)  0@? 1@? 2@? 3@?
S3,inf = [14,32:3)[106,130:2)[130,134:0)[136,150:3)[150,154:1)  0@? 1@? 2@? 3@?
rb4,inf = [18,32:3)[94,118:2)[118,134:0)[136,138:3)[138,154:1)  0@? 1@? 2@? 3@?
rw0,inf = [18,32:1)[94,134:0)[136,154:1)  0@? 1@?
rx1,inf = [2,32:3)[114,122:2)[122,134:0)[136,142:3)[142,154:1)  0@? 1@? 2@? 3@?
rx2,inf = [10,32:3)[110,126:2)[126,134:0)[136,146:3)[146,154:1)  0@? 1@? 2@? 3@?
rx3,inf = [14,32:3)[106,130:2)[130,134:0)[136,150:3)[150,154:1)  0@? 1@? 2@? 3@?
rx0,inf = [18,32:3)[94,118:2)[118,134:0)[136,138:3)[138,154:1)  0@? 1@? 2@? 3@?
%reg1031,1.05 = [50,68:0)[76,78:0)  0@?
%reg1033,inf = [22,26:0)  0@?
%reg1035,0.208333 = [78,86:1)[86,102:0)  0@? 1@?
%reg1038,0.291667 = [90,102:1)[102,114:0)  0@? 1@?
%reg1042,0.485714 = [6,38:3)[38,44:2)[44,58:1)[58,76:0)  0@? 1@? 2@? 3@?
********** MACHINEINSTRS **********
entry:
0 %rw1 = MOV16rm <fi#-4>, 1, %NOREG, 0
MOV16rm %mreg(3)<d> <fi#-4> 1 %mreg(0) 0
4 %reg1042 = MOV16rm <fi#-1>, 1, %NOREG, 0
MOV16rm %reg1042<d> <fi#-1> 1 %mreg(0) 0
8 %rw2 = MOV16rm <fi#-3>, 1, %NOREG, 0
MOV16rm %mreg(10)<d> <fi#-3> 1 %mreg(0) 0
12 %rw3 = MOV16rm <fi#-5>, 1, %NOREG, 0
MOV16rm %mreg(13)<d> <fi#-5> 1 %mreg(0) 0
16 %rw0 = MOV16rm <fi#-2>, 1, %NOREG, 0
MOV16rm %mreg(28)<d> <fi#-2> 1 %mreg(0) 0
20 %reg1033 = MOV8rm %reg1042, 1, %NOREG, 0
MOV8rm %reg1033<d> %reg1042 1 %mreg(0) 0
24 CMP8ri %reg1033<kill>, 0
CMP8ri %reg1033 0
28 JPCC mbb<bb7,0x8771350>, 4
JPCC <mbb:bb7@0x8771350> 4
bb.preheader:
36 %reg1042 = INC16r %reg1042
INC16r %reg1042<d> %reg1042
bb:
48 %reg1031 = MOV8rm %reg1042, 1, %NOREG, 0
MOV8rm %reg1031<d> %reg1042 1 %mreg(0) 0
56 %reg1042 = INC16r %reg1042
INC16r %reg1042<d> %reg1042
60 CMP8ri %reg1031, 0
CMP8ri %reg1031 0
64 JPCC mbb<bb7.loopexit,0x8771310>, 4
JPCC <mbb:bb7.loopexit@0x8771310> 4
bb.bb_crit_edge:
72 JP mbb<bb,0x8771290>
JP <mbb:bb@0x8771290>
bb7.loopexit:
76 %reg1035 = MOVSX16rr8 %reg1031<kill>
MOVSX16rr8 %reg1035<d> %reg1031
84 %reg1035 = SHL16ri %reg1035, 3
SHL16ri %reg1035<d> %reg1035 3
88 %reg1038 = MOV16ri <ga:da>
MOV16ri %reg1038<d> <ga:da>
92 %rw0 = MOV16rm %reg1035, 1, %reg1038, 0
MOV16rm %mreg(28)<d> %reg1035 1 %reg1038 0
100 %reg1038 = ADDrr16 %reg1038, %reg1035<kill>
ADDrr16 %reg1038<d> %reg1038 %reg1035
104 %rw3 = MOV16rm %reg1038, 1, %NOREG, 6
MOV16rm %mreg(13)<d> %reg1038 1 %mreg(0) 6
108 %rw2 = MOV16rm %reg1038, 1, %NOREG, 4
MOV16rm %mreg(10)<d> %reg1038 1 %mreg(0) 4
112 %rw1 = MOV16rm %reg1038<kill>, 1, %NOREG, 2
MOV16rm %mreg(3)<d> %reg1038 1 %mreg(0) 2
132 RET %rx0<imp-use,kill>, %rx1<imp-use,kill>, %rx2<imp-use,kill>, %rx3<imp-use,kill>
RET %mreg(39) %mreg(29) %mreg(31) %mreg(33)
bb7:
152 RET %rx0<imp-use,kill>, %rx1<imp-use,kill>, %rx2<imp-use,kill>, %rx3<imp-use,kill>
RET %mreg(39) %mreg(29) %mreg(31) %mreg(33)
********** LINEAR SCAN **********
********** Function: atof
fixed intervals:
        %reg1,inf = [18,32:3)[94,118:2)[118,134:0)[136,138:3)[138,154:1)  0@? 1@? 2@? 3@? -> rb0
        %reg2,inf = [2,32:3)[114,122:2)[122,134:0)[136,142:3)[142,154:1)  0@? 1@? 2@? 3@? -> rb5
        %reg3,inf = [2,32:1)[114,134:0)[136,154:1)  0@? 1@? -> rw1
        %reg4,inf = [2,32:3)[114,122:2)[122,134:0)[136,142:3)[142,154:1)  0@? 1@? 2@? 3@? -> rb1
        %reg5,inf = [10,32:3)[110,126:2)[126,134:0)[136,146:3)[146,154:1)  0@? 1@? 2@? 3@? -> rb6
        %reg10,inf = [10,32:1)[110,134:0)[136,154:1)  0@? 1@? -> rw2
        %reg11,inf = [10,32:3)[110,126:2)[126,134:0)[136,146:3)[146,154:1)  0@? 1@? 2@? 3@? -> rb2
        %reg12,inf = [14,32:3)[106,130:2)[130,134:0)[136,150:3)[150,154:1)  0@? 1@? 2@? 3@? -> rb7
        %reg13,inf = [14,32:1)[106,134:0)[136,154:1)  0@? 1@? -> rw3
        %reg17,inf = [14,32:3)[106,130:2)[130,134:0)[136,150:3)[150,154:1)  0@? 1@? 2@? 3@? -> rb3
        %reg18,inf = [18,32:3)[94,118:2)[118,134:0)[136,138:3)[138,154:1)  0@? 1@? 2@? 3@? -> S0
        %reg19,inf = [2,32:3)[114,122:2)[122,134:0)[136,142:3)[142,154:1)  0@? 1@? 2@? 3@? -> S1
        %reg20,inf = [10,32:3)[110,126:2)[126,134:0)[136,146:3)[146,154:1)  0@? 1@? 2@? 3@? -> S2
        %reg21,inf = [14,32:3)[106,130:2)[130,134:0)[136,150:3)[150,154:1)  0@? 1@? 2@? 3@? -> S3
        %reg27,inf = [18,32:3)[94,118:2)[118,134:0)[136,138:3)[138,154:1)  0@? 1@? 2@? 3@? -> rb4
        %reg28,inf = [18,32:1)[94,134:0)[136,154:1)  0@? 1@? -> rw0
        %reg29,inf = [2,32:3)[114,122:2)[122,134:0)[136,142:3)[142,154:1)  0@? 1@? 2@? 3@? -> rx1
        %reg31,inf = [10,32:3)[110,126:2)[126,134:0)[136,146:3)[146,154:1)  0@? 1@? 2@? 3@? -> rx2
        %reg33,inf = [14,32:3)[106,130:2)[130,134:0)[136,150:3)[150,154:1)  0@? 1@? 2@? 3@? -> rx3
        %reg39,inf = [18,32:3)[94,118:2)[118,134:0)[136,138:3)[138,154:1)  0@? 1@? 2@? 3@? -> rx0
active intervals:
inactive intervals:

*** CURRENT ***: %reg1042,0.485714 = [6,38:3)[38,44:2)[44,58:1)[58,76:0)  0@? 1@? 2@? 3@?
        processing active intervals:
        processing inactive intervals:
        allocating current interval: rw8
active intervals:
        %reg1042,0.485714 = [6,38:3)[38,44:2)[44,58:1)[58,76:0)  0@? 1@? 2@? 3@? -> rw8
inactive intervals:

*** CURRENT ***: %reg1033,inf = [22,26:0)  0@?
        processing active intervals:
        processing inactive intervals:
        allocating current interval: no free registers
        assigning stack slot at interval %reg1033,inf = [22,26:0)  0@?:
                register with min weight: rb0 (inf)
                rolling back to: 22
active intervals:
        %reg1042,0.485714 = [6,38:3)[38,44:2)[44,58:1)[58,76:0)  0@? 1@? 2@? 3@? -> rw8
inactive intervals:

*** CURRENT ***: %reg1033,inf = [22,26:0)  0@?
        processing active intervals:
        processing inactive intervals:
        allocating current interval: no free registers
        assigning stack slot at interval %reg1033,inf = [22,26:0)  0@?:
                register with min weight: rb0 (inf)
                rolling back to: 22
active intervals:
        %reg1042,0.485714 = [6,38:3)[38,44:2)[44,58:1)[58,76:0)  0@? 1@? 2@? 3@? -> rw8
inactive intervals:

*** CURRENT ***: %reg1033,inf = [22,26:0)  0@?
        processing active intervals:
        processing inactive intervals:
        allocating current interval: no free registers
        assigning stack slot at interval %reg1033,inf = [22,26:0)  0@?:
                register with min weight: rb0 (inf)
                rolling back to: 22
active intervals:
        %reg1042,0.485714 = [6,38:3)[38,44:2)[44,58:1)[58,76:0)  0@? 1@? 2@? 3@? -> rw8
inactive intervals:

and this entry "*** CURRENT ***: %reg1033,inf = [22,26:0)  0@?" repeats
forever.

The C test case and the corresponding LLVM assembly are:

extern double da[];
double atof(char* s, double a)
{
        while (*s++) {
                a = da[*s];
        }
        return a;
}


; ModuleID = '<stdin>'
target datalayout = "e-p:32:32"
target endian = little
target pointersize = 32
target triple = "i686-pc-linux-gnu"
%da = external global [0 x double] ; <[0 x double]*> [#uses=1]

implementation   ; Functions:

double %atof(sbyte* %s, double %a) {
entry:
        %tmp411 = load sbyte* %s ; <sbyte> [#uses=1]
        %tmp12 = seteq sbyte %tmp411, 0 ; <bool> [#uses=1]
        br bool %tmp12, label %bb7, label %bb

bb: ; preds = %bb, %entry
        %indvar = phi uint [ 0, %entry ], [ %indvar.next, %bb ] ; <uint> [#uses=2]
        %s.pn.rec = bitcast uint %indvar to int ; <int> [#uses=1]
        %tmp6.0.rec = add int %s.pn.rec, 1 ; <int> [#uses=1]
        %tmp6.0 = getelementptr sbyte* %s, int %tmp6.0.rec ; <sbyte*> [#uses=1]
        %tmp4 = load sbyte* %tmp6.0 ; <sbyte> [#uses=2]
        %tmp = seteq sbyte %tmp4, 0 ; <bool> [#uses=1]
        %indvar.next = add uint %indvar, 1 ; <uint> [#uses=1]
        br bool %tmp, label %bb7.loopexit, label %bb

bb7.loopexit: ; preds = %bb
        %tmp = sext sbyte %tmp4 to int ; <int> [#uses=1]
        %tmp1 = getelementptr [0 x double]* %da, int 0, int %tmp ; <double*> [#uses=1]
        %tmp = load double* %tmp1 ; <double> [#uses=1]
        ret double %tmp

bb7: ; preds = %entry
        ret double %a
}

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Possible bug in the linear scan register allocator

Chris Lattner
On Thu, 21 Dec 2006, Roman Levenstein wrote:

> following:
> 1) some of the fixed registers intervals are merged with some virtual
> registers intervals
> 2) later there is a need to spill one of the allocated registers, but
> since all joined intervals are FIXED intervals now due to (1), they
> cannot be spilled. Therefore, the register allocator loops for ever.
>
> I would be grateful, if someone would confirm that this is a bug. And
> of course, it would be very nice if one of the RegAlloc Gurus could fix
> it ;)

This is likely a bug, probably PR711.

Unfortunately, this isn't super easy to fix, I don't have plans to do so
in the near future...

-Chris

> --- Evan Cheng <[hidden email]> wrote:
>>
>> On Dec 20, 2006, at 2:06 PM, Roman Levenstein wrote:
>>
>>>>
>>>> This will probably require a slightly more extensive patch to
>>>> legalizer. The current mechanism assumes either 1->1 or 1->2
>>>> expansion.
>>>
>>> Exactly. This is what I meant by "more challenging" ;) It is assumed
>>> in several places that 1->1 or 1->2 expansions are taking place. A
>>> generic case is not handled yet.
>>>
>>>> It also assumes the result of expansion are of legal
>>>> types.
>>>
>>> Yes. And this is also a reason why it is not too obvious how to handle
>>> f32->f64 promotion and later f64->i64 expansion on targets that support
>>> only f64 soft-floats.
>>>
>>> Chris Lattner wrote:
>>>> That would be recursively expanded to i64, then to 2x i32 if needed.
>>>
>>> I tried to set getTypeToTransformTo(f32) to return f64, even when f64
>>> is an illegal type. But this recursive expansion does not take place
>>> with the current legalizer implementation. Currently, it is assumed
>>> that the result of getTypeToTransformTo() is a legal type. For example,
>>> CreateRegForValue tries to create a register of such a promoted type
>>> and fails in the above-mentioned case.
>>
>> All of the issues can be solved by adding the logic to recursively
>> expand operands. They shouldn't be too complicated.
>>
>>>
>>>
>>> Evan wrote:
>>>> That means, you will have to either 1) modify ExpandOp() to
>>>> handle cases which need to be recursively expanded or 2) modify it to
>>>> return a vector of SDOperand's. Solution one is what I would pursue.
>>>
>>> Agreed. I also feel that some sort of recursive expansion is required.
>>>
>>> I also have a feeling that getTypeToTransformTo(MVT::ValueType) should
>>> probably also recurse until it finds a type T where
>>> getTypeToTransformTo(T) = T, i.e. it finds a legal type. This would
>>> almost solve the issue with f32->f64 promotion where both FP types are
>>> illegal. The only concern here is that in this case
>>> getTypeToTransformTo(MVT::f32) would return MVT::i64 and therefore the
>>> information about the fact that it should first be promoted to f64 is
>>> lost. The problem is that getTypeToTransformTo() is used for two
>>> "different" goals: to tell which type to use for register mapping and
>>> to tell which type to use for promotions/expansions for the sake of
>>> "type system correctness". Maybe it would even make sense to have two
>>> different mappings because of this? One mapping would be used for
>>> allocation of virtual registers and the like and would always return a
>>> legal type, and the other would be used just as getTypeToTransformTo()
>>> is in LegalizeOp(), ExpandOp() and PromoteOp() and could also return
>>> illegal types?
>>
>> No need to change getTypeToTransformTo(). There is a getTypeToExpandTo()
>> that expands the type recursively until it finds a legal type.
>>
>>>
>>>> It's not done simply because there isn't a need for it right now. :-)
>>>
>>> Since I have this need, I'll try to find a solution for this issue and
>>> to provide a patch.
>>
>> Great! There are a few spots where ExpandOp() is called recursively.
>> It would be nice to remove those and use the general expansion
>> facility instead.
>>
>> Evan
>>
>>>
>>> -Roman
>>>

-Chris
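
The idea discussed in the quoted text, following the one-step type mapping until it reaches a type that maps to itself, can be sketched with a toy model. This is not LLVM code: the `VT` enum, `TransformTable`, `exampleTable`, `transformChain` and `legalTypeFor` are hypothetical names, and the table assumes a target where f32 is promoted to f64, f64 is expanded to i64, and i64 is legal. Keeping the whole chain, rather than only the final type, preserves the intermediate f32->f64 promotion step that a plain fixed-point lookup would lose.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy value types; a stand-in for MVT::ValueType, not the real enum.
enum class VT { i64, f32, f64 };

// Hypothetical one-step mapping: a legal type maps to itself.
using TransformTable = std::map<VT, VT>;

// Assumed target: i64 is legal, f64 is emulated via i64, f32 has no libcalls.
TransformTable exampleTable() {
  TransformTable t;
  t[VT::i64] = VT::i64;  // legal
  t[VT::f64] = VT::i64;  // soft-float: expand to an integer type
  t[VT::f32] = VT::f64;  // promote, even though f64 itself is illegal
  return t;
}

// Follow the one-step mapping until a type maps to itself, recording every
// intermediate type along the way.
std::vector<VT> transformChain(const TransformTable &table, VT start) {
  std::vector<VT> chain(1, start);
  VT cur = start;
  while (table.at(cur) != cur) {
    cur = table.at(cur);
    chain.push_back(cur);
    // A chain longer than the table can only come from a cycle.
    assert(chain.size() <= table.size() && "cycle in transform table");
  }
  return chain;
}

// The type to use for register mapping is the last element of the chain.
VT legalTypeFor(const TransformTable &table, VT start) {
  return transformChain(table, start).back();
}
```

With this table, `legalTypeFor(exampleTable(), VT::f32)` yields `VT::i64`, while `transformChain` still reports f32, f64, i64 in order, so both of the "goals" mentioned above can be served from one mapping.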

--
http://nondot.org/sabre/
http://llvm.org/

Re: Possible bug in the linear scan register allocator

Leo Romanoff

--- Chris Lattner <[hidden email]> wrote:

> On Thu, 21 Dec 2006, Roman Levenstein wrote:
> > following:
> > 1) some of the fixed registers intervals are merged with some virtual
> > registers intervals
> > 2) later there is a need to spill one of the allocated registers, but
> > since all joined intervals are FIXED intervals now due to (1), they
> > cannot be spilled. Therefore, the register allocator loops for ever.
> >
> > I would be grateful, if someone would confirm that this is a bug. And
> > of course, it would be very nice if one of the RegAlloc Gurus could fix
> > it ;)
>
> This is likely a bug, probably PR711.
>
> Unfortunately, this isn't super easy to fix, I don't have plans to do
> so in the near future...

OK. I looked at PR711 at http://llvm.org/bugs/show_bug.cgi?id=711
Indeed, it sounds like the same bug.

Two questions:
1) At the very least, it would be better if LLVM crashed on an assertion
instead of running forever in such situations. I think this can be
easily detected, since this is a case where nothing can be spilled.

2) You write in PR711:
>This is due to the coalescer coalescing virtregs with both EAX and
>EDX, which makes them unavailable to satisfy spills, causing the RA to
>run out of registers.  We want to coalesce physregs when possible,
>but we cannot pin them in the spiller:
>we have to be able to uncoalesce them.

First of all, I totally agree with "we have to be able to uncoalesce
them". Linear scan already backtracks in some situations. I wonder
whether it would be very difficult to implement the following logic:
when the allocator detects that it is running forever (i.e. nothing can
be spilled for some reason), it backtracks completely, redoes coalescing
while ignoring some selected virtual/physical registers (or any attempts
to coalesce with physregs), and tries to allocate the whole function
again. Alternatively, maybe it would be possible to rerun the
linear-scan pass on a given function without interval joining? This
would probably produce worse code, but it is better than crashing or
looping forever.

> this isn't super easy to fix
OK. Do you see any further problems that I have not mentioned above?
Can you elaborate a bit?

-Roman



Re: Possible bug in the linear scan register allocator

Chris Lattner
On Fri, 22 Dec 2006, Roman Levenstein wrote:

>> This is likely a bug, probably PR711.
>>
>> Unfortunately, this isn't super easy to fix, I don't have plans to do
>> so in the near future...
>
> OK. I looked at PR711 at http://llvm.org/bugs/show_bug.cgi?id=711
> Indeed, it sounds like the same bug.
>
> Two questions:
> 1) At the very least, it would be better if LLVM crashed on an assertion
> instead of running forever in such situations. I think this can be
> easily detected, since this is a case where nothing can be spilled.

Yes, this is certainly desirable.
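
The check suggested in point 1 can be sketched as a toy model. This is not the actual linear-scan allocator code: `Interval`, `pickSpillCandidate`, `isStuck` and `assertCanMakeProgress` are hypothetical names. The assumption, taken from the debug output earlier in the thread, is that an interval with infinite weight must not be spilled; if the current interval and every conflicting interval have infinite weight, retrying cannot make progress.

```cpp
#include <cassert>
#include <limits>
#include <string>
#include <vector>

const float kInf = std::numeric_limits<float>::infinity();

struct Interval {
  std::string reg;  // register name, e.g. "rb0" or "%reg1033"
  float weight;     // spill weight; kInf marks an interval that must not be spilled
};

// Index of the cheapest spillable interval, or -1 if every weight is infinite.
int pickSpillCandidate(const std::vector<Interval> &conflicts) {
  int best = -1;
  for (int i = 0; i < static_cast<int>(conflicts.size()); ++i) {
    if (conflicts[i].weight == kInf) continue;  // pinned, cannot be spilled
    if (best < 0 || conflicts[i].weight < conflicts[best].weight) best = i;
  }
  return best;
}

// True when neither the current interval nor any conflicting one can be spilled.
bool isStuck(const Interval &current, const std::vector<Interval> &conflicts) {
  return current.weight == kInf && pickSpillCandidate(conflicts) < 0;
}

// Intended to be called before rolling back and retrying: it aborts with a
// message instead of letting the caller loop forever.
void assertCanMakeProgress(const Interval &current,
                           const std::vector<Interval> &conflicts) {
  assert(!isStuck(current, conflicts) &&
         "no spillable interval: allocation cannot make progress");
}
```

In the log above, the current interval %reg1033 has weight inf and the register with minimum weight, rb0, is also inf, which is the situation `isStuck` is meant to flag.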

> 2) You write in PR711:
>> This is due to the coalescer coalescing virtregs with both EAX and
>> EDX, which makes them unavailable to satisfy spills, causing the RA to
>> run out of registers.  We want to coalesce physregs when possible,
>> but we cannot pin them in the spiller:
>> we have to be able to uncoalesce them.
>
> First of all, I totally agree with "we have to be able to uncoalesce
> them". Linear scan already backtracks in some situations. I wonder
> whether it would be very difficult to implement the following logic:
> when the allocator detects that it is running forever (i.e. nothing can
> be spilled for some reason), it backtracks completely, redoes coalescing
> while ignoring some selected virtual/physical registers (or any attempts
> to coalesce with physregs), and tries to allocate the whole function
> again. Alternatively, maybe it would be possible to rerun the
> linear-scan pass on a given function without interval joining? This
> would probably produce worse code, but it is better than crashing or
> looping forever.

Without being able to detect this situation, we can't recover from it.
Because coalescing and linscan are two different passes, we can't really
undo both of them.  Once coalescing is done, it's done.

>> this isn't super easy to fix
> OK. Do you see any further problems that I have not mentioned above?
> Can you elaborate a bit?

Fixing this properly requires changing the way we represent coalesced
registers and how the RA acts on them.  In particular, right now, when we
coalesce a physreg with a virtreg, we completely lose information about
what the code looked like before the coalescing occurred.  Instead of
treating coalescing like this, it would be better to treat coalescing
with a physreg as a hint that we would prefer the virtreg to be allocated
to the physreg, not as a command.
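
The "hint, not a command" idea can be illustrated with a toy allocator. This is only a sketch under simplifying assumptions, not how LLVM represents live intervals: `Range`, `VirtInterval`, `overlaps`, `isFree` and `allocateWithHints` are hypothetical names, each interval is a single half-open range, and physical registers are plain indices. A virtual register that would have been joined with a physreg keeps its own interval and only records the physreg as a preference, so the allocator can still pick another register, or spill, when the preferred one is busy.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Range { int start, end; };  // half-open [start, end)

struct VirtInterval {
  std::string name;
  Range range;
  int hint;      // preferred physreg index, or -1 for no preference
  int assigned;  // result: physreg index, or -1 if it has to be spilled
};

bool overlaps(const Range &a, const Range &b) {
  return a.start < b.end && b.start < a.end;
}

bool isFree(const std::vector<Range> &busy, const Range &r) {
  for (const Range &b : busy)
    if (overlaps(b, r)) return false;
  return true;
}

// busy[p] lists the ranges where physreg p is already occupied (fixed uses).
// Intervals are processed in the order given.
void allocateWithHints(std::vector<std::vector<Range> > &busy,
                       std::vector<VirtInterval> &intervals) {
  const int numRegs = static_cast<int>(busy.size());
  for (VirtInterval &vi : intervals) {
    assert(vi.range.start < vi.range.end && "empty interval");
    vi.assigned = -1;
    // Try the hinted register first; it is a preference, not a requirement.
    if (vi.hint >= 0 && vi.hint < numRegs && isFree(busy[vi.hint], vi.range))
      vi.assigned = vi.hint;
    // Otherwise fall back to any register that is free over the whole range.
    for (int p = 0; vi.assigned < 0 && p < numRegs; ++p)
      if (isFree(busy[p], vi.range)) vi.assigned = p;
    if (vi.assigned >= 0) busy[vi.assigned].push_back(vi.range);
  }
}
```

Because the virtual interval is never merged into the fixed one, it keeps a finite spill weight and remains a legal spill candidate, which is the property the joined intervals in the log above have lost.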

-Chris

--
http://nondot.org/sabre/
http://llvm.org/