[llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

[llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
Hi all,

The MachineOutliner has come a long way since the original incarnation presented at the 2016 LLVM Developers' Meeting [1]. In particular, we've been pushing a lot on the AArch64 target for the MachineOutliner. It's mature enough at this point that we'd like to take things a step further and turn it on by default in AArch64 under -Oz. Since the primary goal of -Oz is "make it as small as possible", the outliner is a good addition to the -Oz pass pipeline.

For a detailed description of the MachineOutliner, see the original RFC [2].

Comparing -Oz to -Oz + outlining on the latest trunk compiler, we've observed:

* A geomean ~4.4% text size reduction of the CTMark tests (min = 0.3% on tramp3d-v4, max = 15.4% on kc)

* A geomean compile-time overhead of ~1.1% (min = 0.2% on 7zip, max = 2.2% on sqlite3)

We perform regular testing to ensure the outliner produces correct AArch64 code at -Oz. Tests include the LLVM test suite and standard external test suites such as SPEC. All tests compile and execute. We've also been making sure that the outliner produces debuggable code. Users are still guaranteed to have sane backtraces in the presence of outlined functions.

Added exposure to various programs would help the outlining algorithm mature further. This, in turn, will help the overall outlining project. For example, there have been a few discussions on implementing an IR-level outlining pass [3, 4]. Ultimately, the goal is to create a shared outlining interface. This interface would allow the outliner to exist at any level of representation [4]. The general outlining algorithm will be part of the shared interface. Thus, in the spirit of incremental improvement, it makes sense to begin "stress-testing" it sooner rather than later.

There are a few patches necessary to facilitate this. They are available in the patches section of this email. I’ll summarize what they do here for the sake of discussion though.

The first patch is one that teaches the backend about size optimization levels. This is comparable to what's done in the inliner. Today, the only way to tell if something is optimizing for size is by looking at function attributes. This is fine for function passes, but insufficient for module passes like the MachineOutliner. The function attribute approach forces the outliner to iterate over every function in the module before deciding to take action. If -Oz isn't passed in, then the outliner will not find any functions worth outlining from. This would incur unnecessary compile-time overhead. Thus, we decided the best course of action is to teach the backend about size options.
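To make the trade-off above concrete, here is a minimal sketch in C. It is not LLVM code: the `Function` and `Module` types, the `SizeLevel` enum, and both function names are hypothetical stand-ins. The attribute-based check has to walk the module's functions before it can decide anything, while the flag-based check is a single comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a module and its functions (not LLVM types). */
typedef struct {
    const char *name;
    bool minsize;            /* models the per-function minsize attribute */
} Function;

typedef struct {
    const Function *functions;
    size_t count;
} Module;

/* Attribute-based check: scan the functions until one marked minsize
 * is found. In the worst case every function is visited. */
bool module_has_minsize(const Module *m) {
    assert(m != NULL);
    for (size_t i = 0; i < m->count; ++i)
        if (m->functions[i].minsize)
            return true;
    return false;
}

/* Hypothetical global size level, as a backend might receive it. */
typedef enum { SIZE_NONE, SIZE_OS, SIZE_OZ } SizeLevel;

/* Flag-based check: one comparison, no walk over the module. */
bool outliner_enabled_by_flag(SizeLevel level) {
    return level == SIZE_OZ;
}
```

The flag-based version is cheaper to evaluate, but it makes a single decision for the whole module.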

The second patch teaches llc to handle -Oz and -Os.

The third patch teaches targets about the outliner. A target will be able to specify if and when it wants outlining on by default. The patch also adds a flag to disable the MachineOutliner for users who don’t want outlining behaviour when it is enabled by default.
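As a rough illustration of that decision, the following C sketch models a per-target hook plus a user override. All names here (`OptLevel`, `OutlineByDefaultFn`, `should_run_outliner`, and the two example hooks) are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OPT_O0, OPT_O2, OPT_O3, OPT_OS, OPT_OZ } OptLevel;

/* Hypothetical per-target hook: does this target want outlining on by
 * default at the given optimization level? */
typedef bool (*OutlineByDefaultFn)(OptLevel level);

/* Models a target that opts in only under -Oz, as proposed for AArch64. */
bool aarch64_outline_by_default(OptLevel level) {
    return level == OPT_OZ;
}

/* Models a target that never opts in. */
bool never_outline_by_default(OptLevel level) {
    (void)level;
    return false;
}

/* The user's disable request wins over the target default. */
bool should_run_outliner(OutlineByDefaultFn target_hook, OptLevel level,
                         bool user_disabled) {
    assert(target_hook != NULL);
    if (user_disabled)
        return false;
    return target_hook(level);
}
```

With this shape, a target that has not opted in is unaffected, and a user can always turn the default off.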

The final patch teaches clang to pass the new size information down to the backend. This allows us to run something like clang -Oz … foo.c and have the outliner run.

Thanks for taking the time to read this!
Jessica

*** Patches ***

1. Teaching the backend about -Oz/-Os: https://reviews.llvm.org/D45914
2. Teaching llc about -Oz/-Os: https://reviews.llvm.org/D45915
3. Teaching the target about the outliner and enabling it by default under AArch64: https://reviews.llvm.org/D45916
4. Teaching clang to pass -Oz/-Os down to the backend: https://reviews.llvm.org/D45917


*** References ***
[1] Reducing Code Size Using Outlining (https://www.youtube.com/watch?v=yorld-WSOeU)

[2] Original RFC (http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html)

[3] [RFC] Add IR level interprocedural outliner for code size. (http://lists.llvm.org/pipermail/llvm-dev/2017-July/115666.html)

[4] [RFC] PT.2 Add IR level interprocedural outliner for code size. (http://lists.llvm.org/pipermail/llvm-dev/2017-September/117153.html)

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
Teaching the backend about size optimization sounds great, even without the exciting work on the MachineOutliner. It would strip some nasty hacks from an out-of-tree backend that cares about code size :)

Thank you


Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
On 4/20/2018 7:06 PM, Jessica Paquette via llvm-dev wrote:
> We perform regular testing to ensure the outliner produces correct AArch64 code at -Oz. Tests include the LLVM test suite and standard external test suites such as SPEC. All tests compile and execute. We've also been making sure that the outliner produces debuggable code. Users are still guaranteed to have sane backtraces in the presence of outlined functions.
>
> Added exposure to various programs would help the outlining algorithm mature further. This, in turn, will help the overall outlining project. For example, there have been a few discussions on implementing an IR-level outlining pass [3, 4]. Ultimately, the goal is to create a shared outlining interface. This interface would allow the outliner to exist at any level of representation [4]. The general outlining algorithm will be part of the shared interface. Thus, in the spirit of incremental improvement, it makes sense to begin "stress-testing" it sooner than later.

I just tried some tests, and I'm seeing a bunch of failures on SPEC at -O3; looks like mostly crashes at runtime. I can try to reduce a testcase if you need it.

> There are a few patches necessary to facilitate this. They are available in the patches section of this email. I’ll summarize what they do here for the sake of discussion though.
>
> The first patch is one that teaches the backend about size optimization levels. This is comparable to what's done in the inliner. Today, the only way to tell if something is optimizing for size is by looking at function attributes. This is fine for function passes, but insufficient for module passes like the MachineOutliner. The function attribute approach forces the outliner to iterate over every function in the module before deciding to take action. If -Oz isn't passed in, then the outliner will not find any functions worth outlining from. This would incur unnecessary compile-time overhead. Thus, we decided the best course of action is to teach the backend about size options.

I don't think this is really the right approach. With LTO, you can have a mix of functions, some of which are minsize, and some of which are not. Or with profile info, we might want to outline only cold code (I guess this isn't implemented yet, but potentially future work). Tying whether we run the outliner to a command-line flag restricts the possible uses; either the entire module gets outlining, or none of it does.

In general, we've been moving away from global settings so we can optimize more effectively in this sort of scenario.

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
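
The per-function alternative raised in this reply can be sketched in C as follows. This is only an illustration under stated assumptions: the `Function` type and both function names are hypothetical, and the `cold` field models the profile-guided case that the reply itself describes as not yet implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical function record (not an LLVM type). */
typedef struct {
    const char *name;
    bool minsize;   /* per-function minsize attribute */
    bool cold;      /* assumed: profile data marks the function as cold */
} Function;

/* The decision is made per function, so a module produced by LTO can mix
 * functions that are outlined from with functions that are left alone. */
bool is_outlining_candidate(const Function *f) {
    assert(f != NULL);
    return f->minsize || f->cold;
}

/* Stores the indices of candidate functions in out (capacity >= count)
 * and returns how many were found. */
size_t collect_candidates(const Function *fs, size_t count, size_t *out) {
    size_t n = 0;
    for (size_t i = 0; i < count; ++i)
        if (is_outlining_candidate(&fs[i]))
            out[n++] = i;
    return n;
}
```

Unlike a single command-line switch, this selection can return any subset of the module, including the empty set.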

Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
Hi Eli,

> I just tried some tests, and I'm seeing a bunch of failures on SPEC at -O3; looks like mostly crashes at runtime. I can try to reduce a testcase if you need it.

If you could do that, that would be great. Our testing has been primarily for -Oz and -O2, so I haven’t looked at -O3 at all.

> I don't think this is really the right approach. With LTO, you can have a mix of functions, some of which are minsize, and some of which are not. Or with profile info, we might want to outline only cold code (I guess this isn't implemented yet, but potentially future work). Tying whether we run the outliner to a command-line flag restricts the possible uses; either the entire module gets outlining, or none of it does.

I’m worried that walking the entire list of functions in the module when nothing has the minsize attribute would incur unnecessary compile-time overhead. If that’s a reasonable thing to do though, I’m fine with that approach. It’d be a less invasive change, and would give us the desired LTO behaviour for free.

- Jessica


Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
On 4/23/2018 1:41 PM, Jessica Paquette wrote:
> Hi Eli,
>
>> I just tried some tests, and I'm seeing a bunch of failures on SPEC
>> at -O3; looks like mostly crashes at runtime.   I can try to reduce a
>> testcase if you need it.
> If you could do that, that would be great. Our testing has been
> primarily for -Oz and -O2, so I haven’t looked at -O3 at all.

Okay, I'll try to come up with something soon.

>
>> I don't think this is really the right approach.  With LTO, you can
>> have a mix of functions, some of which are minsize, and some of which
>> are not.  Or with profile info, we might want to outline only cold
>> code (I guess this isn't implemented yet, but potentially future
>> work).  Tying whether we run the outliner to a command-line flag
>> restricts the possible uses; either the entire module gets outlining,
>> or none of it does.
> I’m worried that walking the entire list of functions in the module
> when nothing has the minsize attribute would incur unnecessary
> compile-time overhead. If that’s a reasonable thing to do though, I’m
> fine with that approach. It’d be a less invasive change, and would
> give us the desired LTO behaviour for free.

Walking the list of functions is very cheap, relatively speaking; I'm
not concerned about the cost of that.  The cost I'd be concerned about
is the cost of running a ModulePass at that point in the pipeline; IIRC
the last time someone tried it, there were bug reports about memory
usage (see https://bugs.llvm.org/show_bug.cgi?id=36123).

-Eli

Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
I just ran SPEC at -O3 with the outliner enabled for AArch64 and didn’t get any failures on my end. Which flags did you use? I’m curious about what’s going on here...

I used -O3 -mllvm -enable-machine-outliner -arch arm64.

- Jessica

Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
Sorry, I was using a modified compiler, which by coincidence made the bug much easier to reproduce.

In some rare cases, the compiler will use x30 as a general-purpose register; in that case, outlining breaks because the "ret" branches to the wrong address.  Testcase (reproduce with "clang -O3 --target=aarch64-pc-linux-gnu -mllvm -enable-machine-outliner"):

extern long g1;
extern long g2;
void foo() {
  register long *x asm("x27") = &g1;
  register long *y asm("x29") = &g1;
  register long *z asm("x30") = &g2;
  asm(""::"r"(x),"r"(y),"r"(z));
}
void foo2() {
  register long *x asm("x27") = &g1;
  register long *y asm("x29") = &g1;
  register long *z asm("x30") = &g2;
  asm(""::"r"(x),"r"(y),"r"(z));
}
void foo3() {
  register long *x asm("x27") = &g1;
  register long *y asm("x29") = &g1;
  register long *z asm("x30") = &g2;
  asm(""::"r"(x),"r"(y),"r"(z));
}

-Eli
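
The failure mode described above can be reproduced in a toy model. The C sketch below is not AArch64 and not the outliner: it is a hypothetical four-instruction machine in which `OP_BL` stores the return address in register 30 and `OP_RET` jumps to whatever register 30 holds. All names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum Op { OP_MOVI, OP_BL, OP_RET, OP_HALT };

typedef struct {
    enum Op op;
    int reg;     /* destination register for OP_MOVI */
    long imm;    /* immediate for OP_MOVI, target index for OP_BL */
} Instr;

enum { LR = 30, RUN_OK = 0, RUN_BAD_PC = -1 };

/* Runs prog from index 0. Returns RUN_OK on OP_HALT, or RUN_BAD_PC if
 * control leaves the program or it fails to halt within 1000 steps. */
int run(const Instr *prog, size_t len, long regs[31]) {
    assert(prog != NULL && regs != NULL);
    long pc = 0;
    for (int steps = 0; steps < 1000; ++steps) {
        if (pc < 0 || (size_t)pc >= len)
            return RUN_BAD_PC;
        const Instr *i = &prog[pc];
        switch (i->op) {
        case OP_MOVI:                 /* write an immediate to a register */
            assert(i->reg >= 0 && i->reg <= 30);
            regs[i->reg] = i->imm;
            pc++;
            break;
        case OP_BL:                   /* call: save return address in LR */
            regs[LR] = pc + 1;
            pc = i->imm;
            break;
        case OP_RET:                  /* return: jump to whatever LR holds */
            pc = regs[LR];
            break;
        case OP_HALT:
            return RUN_OK;
        }
    }
    return RUN_BAD_PC;
}
```

If a sequence that writes register 30 is executed inline, the program halts normally. If the same sequence is moved behind `OP_BL` and terminated with `OP_RET`, the write overwrites the saved return address and `OP_RET` jumps to the written value instead of back to the caller, which mirrors the runtime crashes reported here.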

Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
Thanks for reducing that for me!

The outliner pulls out the following:

OUTLINED_FUNCTION_0:                    // @OUTLINED_FUNCTION_0
    .cfi_sections .debug_frame
    .cfi_startproc
// %bb.0:
    adrp x29, g1
    add x29, x29, :lo12:g1
    adrp x30, g2                        // This adrp shouldn’t have been outlined.
    ret

It shouldn’t be pulling out that adrp. There’s a special case for adrps in the outliner that hinged on the assumption that x30 wouldn’t be used in that way. I just finished writing a fix, which I’ll have up shortly.

- Jessica
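
The actual fix is not shown in this thread. As a conservative sketch of the kind of check involved, the C code below rejects any candidate sequence that contains an instruction writing x30, since the `ret` appended to an outlined function depends on x30 still holding the return address. The `Instr` type and both function names are hypothetical and do not correspond to LLVM's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { LR = 30 };   /* x30, the AArch64 link register */

/* Hypothetical, simplified instruction record. */
typedef struct {
    const char *mnemonic;
    int def_reg;    /* register written by the instruction, or -1 if none */
} Instr;

bool defines_link_register(const Instr *i) {
    assert(i != NULL);
    return i->def_reg == LR;
}

/* A sequence is safe to outline in this model only if none of its
 * instructions writes the link register. */
bool is_safe_to_outline(const Instr *seq, size_t len) {
    assert(seq != NULL);
    for (size_t i = 0; i < len; ++i)
        if (defines_link_register(&seq[i]))
            return false;
    return true;
}
```

Applied to the listing above, the first two instructions (which write x29) pass this check, while the full three-instruction sequence including `adrp x30, g2` is rejected.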

Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
Hello

A 4.4% geomean codesize improvement is really impressive. That stuff is hard to come by; you usually have to nibble away at it a bit at a time. I ran some codesize benchmarks we have and they were in the same ballpark. Some of these are quite small, so they had less opportunity for outlining, but the average was still over 3%, with some as high as 9-10%.

All the tests I ran were fine, although we don't have a lot of -Oz AArch64 testing.

Thanks for working on this, we'll have to see about getting it working for Arm code too!
Dave
Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
Hi,

On 25 April 2018 at 14:02, David Green via llvm-dev
<[hidden email]> wrote:
> Hello
>
> A 4.4% geomean code size improvement is really impressive. That kind of gain is hard to come by; you usually have to nibble away at it a bit at a time. I ran some code size benchmarks we have and the results were in the same ballpark. Some of the benchmarks are quite small, so they had less opportunity for outlining, but the average was still over 3%, with some as high as 9-10%.
>
> All the tests I ran were fine, although we don't have a lot of -Oz AArch64 testing.

I ran the same experiments over the last few weeks at Linaro and
got similar figures.

> Thanks for working on this, we'll have to see about getting it working for Arm code too!

Porting the outliner to ARM is in my plans for this year (as discussed
with other ARM folks at EuroLLVM last week). To avoid duplicated effort,
is it OK with you if I work on it, David, Jessica?

Cheers,
Yvan

> Dave

Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
> I don't think this is really the right approach. With LTO, you can have a mix of functions, some of which are minsize, and some of which are not. Or with profile info, we might want to outline only cold code (I guess this isn't implemented yet, but potentially future work). Tying whether we run the outliner to a command-line flag restricts the possible uses; either the entire module gets outlining, or none of it does.

I’ve updated the main patch (https://reviews.llvm.org/D45916) to check function attributes instead of relying on a global flag. It’s a lot cleaner and keeps the changes far more self-contained. This should make it easier to define custom outlining behaviour based on function attributes, target-specific requirements, etc. The other patches have been abandoned because they are no longer required.

After this patch, the compile-time overhead should appear only on AArch64, and the roughly 1% outlining overhead should be incurred only when -Oz is passed in. Otherwise, there will be a very small overhead from looping over the functions in the module and checking for the minsize attribute.
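
As a rough illustration of why that residual cost is small, the scan can be modelled as a single pass that stops at the first minsize function. This is a toy sketch, not the code in the patch: `ScannedFunction`, `anyFunctionWantsMinSize`, and `shouldRunOutliner` are hypothetical names, and the assumption that the check acts as a module-level gate is mine.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for a function in a module; this is not an LLVM type.
struct ScannedFunction {
  bool MinSize; // models the per-function minsize attribute
};

// Returns true if at least one function asks for minimum size.
// std::any_of stops at the first match, so the scan is at most one
// pass over the module and does no outlining work itself.
bool anyFunctionWantsMinSize(const std::vector<ScannedFunction> &Module) {
  return std::any_of(Module.begin(), Module.end(),
                     [](const ScannedFunction &F) { return F.MinSize; });
}

// Hypothetical module-level gate: the expensive outlining work is only
// worth starting when the target enables the outliner by default and
// some function in the module is marked minsize.
bool shouldRunOutliner(const std::vector<ScannedFunction> &Module,
                       bool TargetEnablesByDefault) {
  return TargetEnablesByDefault && anyFunctionWantsMinSize(Module);
}
```

In this model, a module with no minsize functions pays only for the scan, which is linear in the number of functions.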

I also fixed the -O3 SPEC failure, so I don’t think there’s anything left outstanding.

- Jessica



Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
> Porting the outliner to ARM is in my plans for this year (as discussed
> with other ARM folks at EuroLLVM last week). To avoid duplicated effort,
> is it OK with you if I work on it, David, Jessica?


Sounds good to me; an ARM target would be great!

- Jessica

> On Apr 26, 2018, at 2:17 AM, Yvan Roux <[hidden email]> wrote:
>
> Hi,
>
> On 25 April 2018 at 14:02, David Green via llvm-dev
> <[hidden email]> wrote:
>> Hello
>>
>> A 4.4% geomean code size improvement is really impressive. That kind of gain is hard to come by; you usually have to nibble away at it a bit at a time. I ran some code size benchmarks we have and the results were in the same ballpark. Some of the benchmarks are quite small, so they had less opportunity for outlining, but the average was still over 3%, with some as high as 9-10%.
>>
>> All the tests I ran were fine, although we don't have a lot of -Oz AArch64 testing.
>
> I ran the same experiments over the last few weeks at Linaro and
> got similar figures.
>
>> Thanks for working on this, we'll have to see about getting it working for Arm code too!
>
> Porting the outliner to ARM is in my plans for this year (as discussed
> with other ARM folks at EuroLLVM last week). To avoid duplicated effort,
> is it OK with you if I work on it, David, Jessica?
>
> Cheers,
> Yvan
>
>> Dave

Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
> Porting the outliner to ARM is in my plans for this year (as discussed
> with other ARM folks at EuroLLVM last week). To avoid duplicated effort,
> is it OK with you if I work on it, David, Jessica?

Yeah, sounds great to me. I had merely got as far as looking at the AArch64 code to see how easy it would be to copy, without any honest expectation of being able to look into it properly any time soon. I imagine there are plenty of pitfalls on Arm/Thumb that could make this difficult to get correct.

Thanks for working on it! Let us know how it goes and if we can do anything to help.
Dave

Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
On 26 April 2018 at 23:27, David Green <[hidden email]> wrote:
>> Porting the outliner to ARM is in my plans for this year (as discussed
>> with other ARM folks at EuroLLVM last week). To avoid duplicated effort,
>> is it OK with you if I work on it, David, Jessica?
>
> Yeah, sounds great to me. I had merely got as far as looking at the AArch64 code to see how easy it would be to copy, without any honest expectation of being able to look into it properly any time soon. I imagine there are plenty of pitfalls on Arm/Thumb that could make this difficult to get correct.

Yes, I have the same feeling.

> Thanks for working on it! Let us know how it goes and if we can do anything to help.

OK, great. Sure, I'll let you know.

Thanks
Yvan

Re: [llvm-dev] [RFC] Turn the MachineOutliner on by default in AArch64 under -Oz

Dean Michael Berris via llvm-dev
Ping!

Any objections to this?

Eli, you’ve been submitting a few patches to the outliner lately. Since I think Matthias is a little too busy to review the patch right now, do you think you could take up the review for it? If you have no objections, I’d like to push this forward.

- Jessica

On Apr 20, 2018, at 7:06 PM, Jessica Paquette <[hidden email]> wrote:

Hi all,

The MachineOutliner has come a long way since the original incarnation presented at the 2016 LLVM Developer's Meeting [1]. In particular, we've been pushing a lot on the AArch64 target for the MachineOutliner. It's mature enough at this point that we'd like to take things a step further and turn it on by default in AArch64 under -Oz. Since the primary goal of -Oz is "make it as small as possible", the outliner is a good addition to the -Oz pass pipeline.

For a detailed description of the MachineOutliner, see the original RFC [2].

We've observed, comparing -Oz to -Oz + outlining on the latest trunk compiler,

* A geomean ~4.4% text size reduction of the CTMark tests (min = 0.3% on tramp3d-v4, max = 15.4% on kc)

* A geomean compile-time overhead of ~1.1% (min = 0.2% on 7zip, max = 2.2% on sqlite3)

We perform regular testing to ensure the outliner produces correct AArch64 code at -Oz. Tests include the LLVM test suite and standard external test suites such as SPEC. All tests compile and execute. We've also been making sure that the outliner produces debuggable code. Users are still guaranteed to have sane backtraces in the presence of outlined functions.

Added exposure to various programs would help the outlining algorithm mature further. This, in turn, will help the overall outlining project. For example, there have been a few discussions on implementing an IR-level outlining pass [3, 4]. Ultimately, the goal is to create a shared outlining interface. This interface would allow the outliner to exist at any level of representation [4]. The general outlining algorithm will be part of the shared interface. Thus, in the spirit of incremental improvement, it makes sense to begin "stress-testing" it sooner rather than later.

There are a few patches necessary to facilitate this. They are available in the patches section of this email. I’ll summarize what they do here for the sake of discussion though.

The first patch is one that teaches the backend about size optimization levels. This is comparable to what's done in the inliner. Today, the only way to tell if something is optimizing for size is by looking at function attributes. This is fine for function passes, but insufficient for module passes like the MachineOutliner. The function attribute approach forces the outliner to iterate over every function in the module before deciding to take action. If -Oz isn't passed in, then the outliner will not find any functions worth outlining from. This would incur unnecessary compile-time overhead. Thus, we decided the best course of action is to teach the backend about size options.

The second patch teaches llc to handle -Oz and -Os.

The third patch teaches targets about the outliner. A target will be able to specify if and when it wants outlining on by default. It also adds a flag to disable the MachineOutliner for users who don’t want outlining behaviour when it is enabled by default.

The final patch teaches clang to pass the new size information down to the backend. This allows us to do things like clang -Oz … foo.c and have the outliner run.

Thanks for taking the time to read this!
Jessica

*** Patches ***

1. Teaching the backend about -Oz/-Os: https://reviews.llvm.org/D45914
2. Teach llc about -Oz/-Os: https://reviews.llvm.org/D45915
3. Teaching the target about the outliner and enabling it by default under AArch64: https://reviews.llvm.org/D45916
4. Teaching clang to pass -Oz/-Os down to the backend: https://reviews.llvm.org/D45917


*** References ***
[1] Reducing Code Size Using Outlining (https://www.youtube.com/watch?v=yorld-WSOeU)

[2] Original RFC (http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html)

[3] [RFC] Add IR level interprocedural outliner for code size. (http://lists.llvm.org/pipermail/llvm-dev/2017-July/115666.html)

[4] [RFC] PT.2 Add IR level interprocedural outliner for code size. (http://lists.llvm.org/pipermail/llvm-dev/2017-September/117153.html)

