[llvm-dev] Suboptimal code generated by clang+llc in quite a common scenario (?)


[llvm-dev] Suboptimal code generated by clang+llc in quite a common scenario (?)

Joan Lluch via llvm-dev
I found something that I don't quite understand when compiling a common piece of code with the -Os flag.
I found it while testing my own backend, but then I dug deeper and found that at least the x86 backend is affected as well. This is the code in question:

char pp[3];
char *scscx = pp;
int tst( char i, char j, char k )
{
  scscx[0] = i;
  scscx[1] = j; 
  scscx[2] = k;
  return 0;
}

The above gets compiled for the x86 architecture like this:

; Function Attrs: nofree norecurse nounwind optsize uwtable
define i32 @tst(i8 signext %i, i8 signext %j, i8 signext %k) local_unnamed_addr #1 {
entry:
  %0 = load i8*, i8** @scscx, align 8, !tbaa !11
  store i8 %i, i8* %0, align 1, !tbaa !13
  %1 = load i8*, i8** @scscx, align 8, !tbaa !11
  %arrayidx1 = getelementptr inbounds i8, i8* %1, i64 1
  store i8 %j, i8* %arrayidx1, align 1, !tbaa !13
  %2 = load i8*, i8** @scscx, align 8, !tbaa !11
  %arrayidx2 = getelementptr inbounds i8, i8* %2, i64 2
  store i8 %k, i8* %arrayidx2, align 1, !tbaa !13
  ret i32 0
}

According to that, the variable ‘scscx’ is loaded three times despite never being modified. The resulting assembly code is this:

.globl _tst
_tst:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset %ebp, -8
movl %esp, %ebp
.cfi_def_cfa_register %ebp
pushl %esi
.cfi_offset %esi, -12
movb 16(%ebp), %al
movb 12(%ebp), %cl
movb 8(%ebp), %dl
movl _scscx, %esi
movb %dl, (%esi)
movl _scscx, %edx
movb %cl, 1(%edx)
movl _scscx, %ecx
movb %al, 2(%ecx)
xorl %eax, %eax
popl %esi
popl %ebp
retl
.cfi_endproc

.comm _pp,3,0
.section __DATA,__data
.globl _scscx
.p2align 3
_scscx:
.long _pp


Again, _scscx is loaded three times instead of being loaded once into a register and reused, which is suboptimal.


NOW, if I replace the original code by this:

int pp[3];
int *scscx = pp;
int tst( int i, int j, int k )
{
  scscx[0] = i;
  scscx[1] = j; 
  scscx[2] = k;
  return 0;
}

I get the following:


; Function Attrs: nofree norecurse nounwind optsize uwtable
define i32 @tst(i32 %i, i32 %j, i32 %k) local_unnamed_addr #1 {
entry:
  %0 = load i32*, i32** @scscx, align 8, !tbaa !11
  store i32 %i, i32* %0, align 4, !tbaa !13
  %arrayidx1 = getelementptr inbounds i32, i32* %0, i64 1
  store i32 %j, i32* %arrayidx1, align 4, !tbaa !13
  %arrayidx2 = getelementptr inbounds i32, i32* %0, i64 2
  store i32 %k, i32* %arrayidx2, align 4, !tbaa !13
  ret i32 0
}


.globl _tst
_tst:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset %ebp, -8
movl %esp, %ebp
.cfi_def_cfa_register %ebp
pushl %esi
.cfi_offset %esi, -12
movl 16(%ebp), %eax
movl 12(%ebp), %ecx
movl 8(%ebp), %edx
movl _scscx, %esi
movl %edx, (%esi)
movl %ecx, 4(%esi)
movl %eax, 8(%esi)
xorl %eax, %eax
popl %esi
popl %ebp
retl
.cfi_endproc

.comm _pp,12,2
.section __DATA,__data
.globl _scscx
.p2align 3
_scscx:
.long _pp


In this case the compiler loads _scscx into a register once and reuses its value instead of loading the variable multiple times. This results in cleaner and more optimal code, especially when compared with the first case.

I would like to understand why this happens, and whether there’s a way (or a workaround) to improve it.

Should I file a bug report for that?

Thanks.

Joan










Re: [llvm-dev] Suboptimal code generated by clang+llc in quite a common scenario (?)

Michael Kruse via llvm-dev
Hi,

A char* such as scscx is a universal pointer and may point to anything, including itself. That is, scscx might point to itself:

scscx = (char*)&scscx;

such that

scscx[0] = ...

changes the address scscx points to. A pointer to int, in contrast, is only allowed to point to ints in memory; it is not a universal pointer. In particular, when accessing memory through it the compiler can assume that it is not aliasing an object of type char*, such as scscx itself.
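
To illustrate (a hypothetical fragment, not from the original program), code elsewhere could legally do this, and the compiler compiling tst() has to allow for it:

char *scscx;

void setup(void)
{
  scscx = (char *)&scscx;   /* legal: a char* may point at any object, even itself */
}

void store2(char i)
{
  scscx[0] = i;   /* if setup() has run, this store overwrites part of scscx itself, */
  scscx[1] = i;   /* so the compiler must reload scscx before the second store       */
}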

For more details, see e.g. Wikipedia [1] or Stackoverflow [2]

[1] https://en.wikipedia.org/wiki/Pointer_aliasing#Aliasing_and_re-ordering
[2] https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule

Michael



Re: [llvm-dev] Suboptimal code generated by clang+llc in quite a common scenario (?)

Alex Brachet-Mialot via llvm-dev
This might not be the workaround you want because it is only available in C, but you can use restrict to allow such optimizations.
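
For example (a sketch of the idea, C only), the declaration in the original snippet could be qualified like this:

char pp[3];
char *restrict scscx = pp;   /* intent: nothing else aliases what scscx points to */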


Alex




Re: [llvm-dev] Suboptimal code generated by clang+llc in quite a common scenario (?)

Tim Northover via llvm-dev
On Thu, 8 Aug 2019 at 17:08, Alex Brachet-Mialot via llvm-dev
<[hidden email]> wrote:
> This might not be the workaround you want because it is only available in C, but you can use restrict to allow such optimizations.

It also doesn't work in Clang unfortunately. We can only represent
restrict on function arguments at the moment.

Cheers.

Tim.

Re: [llvm-dev] Suboptimal code generated by clang+llc in quite a common scenario (?)

Joan Lluch via llvm-dev
Hi Tim and Alex

Thanks for your replies.

So just to make it clear for me: does this imply that there’s indeed no way on the current version to tell the compiler or Clang to optimize this?

Thanks,

Joan




Re: [llvm-dev] Suboptimal code generated by clang+llc in quite a common scenario (?)

Jakub (Kuba) Kuderski via llvm-dev
You can add __builtin_assume(scscx == pp); if that's your invariant -- this makes the redundant loads go away with clang from trunk.
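
Applied to the original snippet, that would look roughly like this (a sketch of the workaround):

char pp[3];
char *scscx = pp;

int tst(char i, char j, char k)
{
  __builtin_assume(scscx == pp);  /* promise: scscx points at pp, so the stores   */
                                  /* below cannot modify the pointer scscx itself */
  scscx[0] = i;
  scscx[1] = j;
  scscx[2] = k;
  return 0;
}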



--
Jakub Kuderski


Re: [llvm-dev] Suboptimal code generated by clang+llc in quite a common scenario (?)

Danila via llvm-dev

Making the pointer const appears to be enough too.
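
That is, a sketch of that variant of the original example:

char pp[3];
char * const scscx = pp;   /* the pointer itself may never be modified, so the */
                           /* compiler can load it once (or fold it to pp)     */

int tst(char i, char j, char k)
{
  scscx[0] = i;
  scscx[1] = j;
  scscx[2] = k;
  return 0;
}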

 

--

Danila

 




[llvm-dev] Spills on second bank of registers

Joan Lluch via llvm-dev
Hi Tim,

I wonder if you could help me with the following, even if just giving some pointers about where to look. I previously posted a similar question in the mailing list, but unfortunately I have not received a reply. This is the subject:

I want to reduce the number of register spills to the stack that are created around storeRegToStackSlot and loadRegFromStackSlot 

In order to do so, I can use free registers from a second set of registers. The idea is that these registers would be used as spill locations instead of stack slots. These registers may be free when they are not used by any instruction in a given function.

I assumed that, by default, LLVM would figure out that there are free registers (albeit from a different register class) and would use them as temporaries instead of creating stack spills of the regular register set. However, this is not the case: the stack spills are created anyway, even though there are unused registers in the second register bank.

I am unsure about why this happens or how to correct/implement this, or where to look. Any pointers would be appreciated.

Thanks,

Joan


Re: [llvm-dev] Spills on second bank of registers

Tim Northover via llvm-dev
Hi Joan,

On Tue, 20 Aug 2019 at 08:29, Joan Lluch <[hidden email]> wrote:
> I assumed that, by default, LLVM would figure out that there are free registers (albeit from a different register class) and would use them as temporaries instead of creating stack spills of the regular register set. However, this is not the case: the stack spills are created anyway, even though there are unused registers in the second register bank.

As far as I know LLVM doesn't have this ability. The ARM CPUs that can
only run Thumb1 code (e.g. Cortex-M0) could theoretically benefit from
such a scheme too, but no-one has so far thought it worth
implementing.

I'm afraid I've not really looked into the details of how LLVM
allocates stack slots, but that would be the obvious place to add the
ability generically. It would have to be a new kind of slot that's
(probably) volatile across function calls, unlike normal ones.

Alternatively, you could always try a MachineFunctionPass to peephole it.
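
A minimal skeleton of that approach (a sketch for a hypothetical target, not code from any in-tree backend; the pass name and logic are placeholders):

#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionPass.h"

using namespace llvm;

namespace {
struct SpillToAltBank : public MachineFunctionPass {
  static char ID;
  SpillToAltBank() : MachineFunctionPass(ID) {}

  bool runOnMachineFunction(MachineFunction &MF) override {
    bool Changed = false;
    // Walk the function looking for frame-index stores/loads created for
    // spills, and rewrite them into copies to/from alternate-bank registers
    // that are otherwise unused in this function.
    return Changed;
  }
};
} // end anonymous namespace

char SpillToAltBank::ID = 0;

The pass would still have to be registered and added to the target's pass pipeline, and it only helps if you can prove the chosen alternate-bank register is genuinely free across the spill's live range.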

Cheers.

Tim.

Re: [llvm-dev] Spills on second bank of registers

Hal Finkel via llvm-dev

On 8/21/19 4:02 AM, Tim Northover via llvm-dev wrote:
> Hi Joan,
>
> On Tue, 20 Aug 2019 at 08:29, Joan Lluch <[hidden email]> wrote:
>> I assumed that, by default, LLVM would figure out that there are free registers (albeit from a different register class) and would use them as temporaries instead of creating stack spills of the regular register set. However, this is not the case: the stack spills are created anyway, even though there are unused registers in the second register bank.
> As far as I know LLVM doesn't have this ability.


I don't think that's quite right; however, you need to have a register class containing both relevant classes, and you need to implement *RegisterInfo::getLargestLegalSuperClass. A good example to look at is PowerPC's SPILLTOVSRRC register class:

// Allow spilling GPR's into caller-saved VSR's.
def SPILLTOVSRRC : RegisterClass<"PPC", [i64, f64], 64, (add G8RC, (sub VSFRC,
                                 (sequence "VF%u", 31, 20),
                                 (sequence "F%u", 31, 14)))>;

which is referenced in PPCRegisterInfo::getLargestLegalSuperClass, and
in general, I think that if you grep for SPILLTOVSRR in
lib/Target/PowerPC you'll see everything.
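
For a hypothetical out-of-tree target, the hook override could look roughly like this (a sketch mirroring the PPC pattern; names such as MyTarget::SPILLRC are assumptions for illustration, not real in-tree identifiers):

const TargetRegisterClass *
MyTargetRegisterInfo::getLargestLegalSuperClass(const TargetRegisterClass *RC,
                                                const MachineFunction &MF) const {
  // Advertise a super-class spanning both banks for the general-purpose
  // class, so the allocator may keep spilled GPR values in the other bank.
  if (MyTarget::GPRRegClass.hasSubClassEq(RC))
    return &MyTarget::SPILLRCRegClass;   // hypothetical class defined in the .td file
  return TargetRegisterInfo::getLargestLegalSuperClass(RC, MF);
}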

  -Hal



--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
